Summary: Used feature engineering with the Ames Housing Dataset to predict the expected value of houses in Ames, Iowa.
The Ames Housing Dataset is an exceptionally detailed and robust dataset with over 70 columns of different features relating to houses. I was tasked with creating two models to predict different targets, being a regression problem to predict the price of a house at sale, and a classification problem predicting whether a house sale was abnormal or not.
During my Data Analysis I determined that were 3 types of data. Nominal columns, representing categorical information, Ordinal columns representing rankings, and then purely numeric columns. I used those grouping to most efficiently transform the data into the most useful form.
With my data cleaned, I then proceeded to approach each problem by trying out many models using sci-kit learn’s GridSearch to find the best hyperparamters for each model. After determining which models worked best, which ended up being a GradientBoost for predicting house price, and a RandomForrest for determinging if the sale was abnormal.