R - UCI Adult Dataset Classification

This project follows the CRISP-DM methodology to apply classification models to the problem of identifying individuals whose salary exceeds a specified value, based on demographic information such as age, level of education and current employment type. The processes involved in the exploration, preparation, modelling and evaluation of the datasets are described. Topics such as the use of statistical analysis to suggest attribute usefulness, feature reduction, outlier detection, missing value management, data bias and data transformation are discussed. The relative performance of the proposed classifiers is analysed. The project also reviews how the predictive capabilities of the models support a business objective of targeting customers, including the use of lift analysis to indicate the likely return on investment and overall profitability.
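As a rough illustration of the lift analysis mentioned above, the sketch below computes lift at a given targeting depth in base R. The scores and outcomes are made up for illustration, and `lift_at` is a hypothetical helper, not a function from the project code.

```r
# Made-up model scores and true outcomes for ten customers (illustration only).
score  <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05)
actual <- c(1,    1,    0,    1,    0,    0,    1,    0,    0,    0)

# Rank customers from most to least likely to be a positive.
ranked <- actual[order(score, decreasing = TRUE)]

# Lift at a targeting depth: response rate among the top-ranked fraction
# divided by the overall response rate. (Hypothetical helper.)
lift_at <- function(ranked, depth) {
  n_top <- ceiling(depth * length(ranked))
  mean(ranked[seq_len(n_top)]) / mean(ranked)
}

lift_top20 <- lift_at(ranked, 0.2)
lift_top50 <- lift_at(ranked, 0.5)
print(c(top20 = lift_top20, top50 = lift_top50))
```

A lift above 1 at a given depth suggests that contacting only the top-ranked customers would reach more positives per contact than contacting customers at random, which is the basis for estimating return on a targeted campaign.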

Problem statement:

Predict whether a person's income exceeds $50K per year based on census data. The dataset is also known as the "Census Income" dataset.
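The prediction task can be framed as binary classification. The sketch below uses a few made-up labels and assumes the class column takes the values `<=50K` and `>50K`; the variable names are illustrative only. It also computes the accuracy of always predicting the majority class, which is a simple baseline for any classifier to beat.

```r
# Made-up labels, assuming the two class values "<=50K" and ">50K".
income <- c("<=50K", ">50K", "<=50K", "<=50K", ">50K")

# Binary target: TRUE when income exceeds $50K/yr.
high_income <- income == ">50K"

# Share of positives, and the accuracy of always predicting the majority class.
positive_rate <- mean(high_income)
baseline_accuracy <- max(positive_rate, 1 - positive_rate)

print(c(positive_rate = positive_rate, baseline_accuracy = baseline_accuracy))
```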

Data set attribute information:

  • age: continuous.

  • workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.

  • fnlwgt: continuous.

  • education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.

  • education-num: continuous.

  • marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.

  • occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.

  • relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.

  • race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.

  • sex: Female, Male.

  • capital-gain: continuous.

  • capital-loss: continuous.

  • hours-per-week: continuous.

  • native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, Frane, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
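The attributes above can be loaded into a data frame along the following lines. This is a minimal sketch under stated assumptions: the raw file is comma-separated with no header row, missing values are written as `?`, and the class label is stored in a final column, named `income` here for illustration. The two sample rows are made up so that the sketch runs without downloading anything.

```r
# Column names follow the attribute list above, with hyphens replaced by
# underscores. "income" is an assumed name for the class label column.
adult_cols <- c("age", "workclass", "fnlwgt", "education", "education_num",
                "marital_status", "occupation", "relationship", "race", "sex",
                "capital_gain", "capital_loss", "hours_per_week",
                "native_country", "income")

# Two made-up rows in the assumed raw layout (illustration only).
# For the real data, replace `text = sample_rows` with `file = <path to the data file>`.
sample_rows <- c(
  "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
  "54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K"
)

adult <- read.csv(text = sample_rows, header = FALSE, col.names = adult_cols,
                  strip.white = TRUE,      # drop the space after each comma
                  na.strings = "?",        # treat "?" as a missing value
                  stringsAsFactors = TRUE) # categorical attributes become factors

# Count missing values per attribute as a first step in missing value management.
missing_per_column <- colSums(is.na(adult))
print(missing_per_column[missing_per_column > 0])
```

Reading the categorical attributes as factors keeps their level sets explicit, which makes it easier to spot sparse or unexpected categories during data exploration.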


Project Report & Links

I have uploaded the code and project report to my GitHub account.