Introduction

In this notebook, we'll work with the Ames Housing dataset, available on Kaggle as an educational competition. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, the competition challenges us to predict the final price of each home. The dataset was compiled by Dean De Cock as a modernized and expanded alternative to the often-cited Boston Housing dataset.

Our task is to predict the SalePrice of the houses in the dataset. We will train ML algorithms on the train dataset provided by the competition and then submit SalePrice predictions for the houses in the test dataset. Submissions are evaluated by RMSLE (root mean squared logarithmic error), and we'll try to improve on this metric with each submission.
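For reference, RMSLE here is just RMSE computed on log-transformed prices; a minimal sketch (y_true and y_pred are placeholder names, not part of the competition code):

import numpy as np
from sklearn.metrics import mean_squared_error

def rmsle(y_true, y_pred):
    # root mean squared error between log(1 + actual price) and log(1 + predicted price)
    return np.sqrt(mean_squared_error(np.log1p(y_true), np.log1p(y_pred)))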

Setup

First, we'll import the required libraries and get the file paths.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization
from sklearn.preprocessing import OneHotEncoder, Normalizer, RobustScaler # data preparation
from sklearn.impute import SimpleImputer # missing value handling
from sklearn.model_selection import KFold, cross_val_score # model selection
from sklearn.metrics import mean_squared_error # metrics
from scipy.stats import norm, skew # statistics
import psutil # get cpu core count
from bayes_opt import BayesianOptimization # hyperparameter tuning

# pipelines
from sklearn.compose import make_column_transformer 
from sklearn.pipeline import make_pipeline 

# ML models
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.ensemble import RandomForestRegressor
import lightgbm as lgb
import xgboost as xgb

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

# get file paths
import os
data_dir = os.getcwd()
for dirname, _, filenames in os.walk(data_dir):
    for filename in filenames:
        if filename.endswith('.csv'):
            print(os.path.join(dirname, filename))
C:\Users\ncits\Downloads\Ames Housing\sample_submission.csv
C:\Users\ncits\Downloads\Ames Housing\test.csv
C:\Users\ncits\Downloads\Ames Housing\train.csv

All data files are under 1 MB, so we can safely load them in full.

# import files
train = pd.read_csv(data_dir + "/train.csv", index_col = ["Id"])
test = pd.read_csv(data_dir + "/test.csv", index_col = ["Id"])
submission_df = pd.read_csv(data_dir + "/sample_submission.csv")
train.head()
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
Id
1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside ... 0 NaN NaN NaN 0 2 2008 WD Normal 208500
2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 ... 0 NaN NaN NaN 0 5 2007 WD Normal 181500
3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside ... 0 NaN NaN NaN 0 9 2008 WD Normal 223500
4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner ... 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 ... 0 NaN NaN NaN 0 12 2008 WD Normal 250000

5 rows × 80 columns

Observations

  • The data contains both numeric and categorical features.
  • SalePrice is the target column.
# random state seed
seed = 42
# Separate target from features
X_train = train.copy()
y_train = X_train.pop("SalePrice")

EDA and Data Preparation

Now, we'll perform some basic data analysis and we'll use the insights we'll get to prepare the data for ML training.

Preliminary Analysis

pd.set_option("display.max_columns", None)
X_train.head(15)
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
Id
1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal
2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal
3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal
4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml
5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal
6 50 RL 85.0 14115 Pave NaN IR1 Lvl AllPub Inside Gtl Mitchel Norm Norm 1Fam 1.5Fin 5 5 1993 1995 Gable CompShg VinylSd VinylSd None 0.0 TA TA Wood Gd TA No GLQ 732 Unf 0 64 796 GasA Ex Y SBrkr 796 566 0 1362 1 0 1 1 1 1 TA 5 Typ 0 NaN Attchd 1993.0 Unf 2 480 TA TA Y 40 30 0 320 0 0 NaN MnPrv Shed 700 10 2009 WD Normal
7 20 RL 75.0 10084 Pave NaN Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2004 2005 Gable CompShg VinylSd VinylSd Stone 186.0 Gd TA PConc Ex TA Av GLQ 1369 Unf 0 317 1686 GasA Ex Y SBrkr 1694 0 0 1694 1 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2004.0 RFn 2 636 TA TA Y 255 57 0 0 0 0 NaN NaN NaN 0 8 2007 WD Normal
8 60 RL NaN 10382 Pave NaN IR1 Lvl AllPub Corner Gtl NWAmes PosN Norm 1Fam 2Story 7 6 1973 1973 Gable CompShg HdBoard HdBoard Stone 240.0 TA TA CBlock Gd TA Mn ALQ 859 BLQ 32 216 1107 GasA Ex Y SBrkr 1107 983 0 2090 1 0 2 1 3 1 TA 7 Typ 2 TA Attchd 1973.0 RFn 2 484 TA TA Y 235 204 228 0 0 0 NaN NaN Shed 350 11 2009 WD Normal
9 50 RM 51.0 6120 Pave NaN Reg Lvl AllPub Inside Gtl OldTown Artery Norm 1Fam 1.5Fin 7 5 1931 1950 Gable CompShg BrkFace Wd Shng None 0.0 TA TA BrkTil TA TA No Unf 0 Unf 0 952 952 GasA Gd Y FuseF 1022 752 0 1774 0 0 2 0 2 2 TA 8 Min1 2 TA Detchd 1931.0 Unf 2 468 Fa TA Y 90 0 205 0 0 0 NaN NaN NaN 0 4 2008 WD Abnorml
10 190 RL 50.0 7420 Pave NaN Reg Lvl AllPub Corner Gtl BrkSide Artery Artery 2fmCon 1.5Unf 5 6 1939 1950 Gable CompShg MetalSd MetalSd None 0.0 TA TA BrkTil TA TA No GLQ 851 Unf 0 140 991 GasA Ex Y SBrkr 1077 0 0 1077 1 0 1 0 2 2 TA 5 Typ 2 TA Attchd 1939.0 RFn 1 205 Gd TA Y 0 4 0 0 0 0 NaN NaN NaN 0 1 2008 WD Normal
11 20 RL 70.0 11200 Pave NaN Reg Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 5 1965 1965 Hip CompShg HdBoard HdBoard None 0.0 TA TA CBlock TA TA No Rec 906 Unf 0 134 1040 GasA Ex Y SBrkr 1040 0 0 1040 1 0 1 0 3 1 TA 5 Typ 0 NaN Detchd 1965.0 Unf 1 384 TA TA Y 0 0 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal
12 60 RL 85.0 11924 Pave NaN IR1 Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 2Story 9 5 2005 2006 Hip CompShg WdShing Wd Shng Stone 286.0 Ex TA PConc Ex TA No GLQ 998 Unf 0 177 1175 GasA Ex Y SBrkr 1182 1142 0 2324 1 0 3 0 4 1 Ex 11 Typ 2 Gd BuiltIn 2005.0 Fin 3 736 TA TA Y 147 21 0 0 0 0 NaN NaN NaN 0 7 2006 New Partial
13 20 RL NaN 12968 Pave NaN IR2 Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 6 1962 1962 Hip CompShg HdBoard Plywood None 0.0 TA TA CBlock TA TA No ALQ 737 Unf 0 175 912 GasA TA Y SBrkr 912 0 0 912 1 0 1 0 2 1 TA 4 Typ 0 NaN Detchd 1962.0 Unf 1 352 TA TA Y 140 0 0 0 176 0 NaN NaN NaN 0 9 2008 WD Normal
14 20 RL 91.0 10652 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 2006 2007 Gable CompShg VinylSd VinylSd Stone 306.0 Gd TA PConc Gd TA Av Unf 0 Unf 0 1494 1494 GasA Ex Y SBrkr 1494 0 0 1494 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2006.0 RFn 3 840 TA TA Y 160 33 0 0 0 0 NaN NaN NaN 0 8 2007 New Partial
15 20 RL NaN 10920 Pave NaN IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 5 1960 1960 Hip CompShg MetalSd MetalSd BrkFace 212.0 TA TA CBlock TA TA No BLQ 733 Unf 0 520 1253 GasA TA Y SBrkr 1253 0 0 1253 1 0 1 1 2 1 TA 5 Typ 1 Fa Attchd 1960.0 RFn 1 352 TA TA Y 0 213 176 0 0 0 NaN GdWo NaN 0 5 2008 WD Normal
X_train.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 1 to 1460
Data columns (total 79 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   MSSubClass     1460 non-null   int64  
 1   MSZoning       1460 non-null   object 
 2   LotFrontage    1201 non-null   float64
 3   LotArea        1460 non-null   int64  
 4   Street         1460 non-null   object 
 5   Alley          91 non-null     object 
 6   LotShape       1460 non-null   object 
 7   LandContour    1460 non-null   object 
 8   Utilities      1460 non-null   object 
 9   LotConfig      1460 non-null   object 
 10  LandSlope      1460 non-null   object 
 11  Neighborhood   1460 non-null   object 
 12  Condition1     1460 non-null   object 
 13  Condition2     1460 non-null   object 
 14  BldgType       1460 non-null   object 
 15  HouseStyle     1460 non-null   object 
 16  OverallQual    1460 non-null   int64  
 17  OverallCond    1460 non-null   int64  
 18  YearBuilt      1460 non-null   int64  
 19  YearRemodAdd   1460 non-null   int64  
 20  RoofStyle      1460 non-null   object 
 21  RoofMatl       1460 non-null   object 
 22  Exterior1st    1460 non-null   object 
 23  Exterior2nd    1460 non-null   object 
 24  MasVnrType     1452 non-null   object 
 25  MasVnrArea     1452 non-null   float64
 26  ExterQual      1460 non-null   object 
 27  ExterCond      1460 non-null   object 
 28  Foundation     1460 non-null   object 
 29  BsmtQual       1423 non-null   object 
 30  BsmtCond       1423 non-null   object 
 31  BsmtExposure   1422 non-null   object 
 32  BsmtFinType1   1423 non-null   object 
 33  BsmtFinSF1     1460 non-null   int64  
 34  BsmtFinType2   1422 non-null   object 
 35  BsmtFinSF2     1460 non-null   int64  
 36  BsmtUnfSF      1460 non-null   int64  
 37  TotalBsmtSF    1460 non-null   int64  
 38  Heating        1460 non-null   object 
 39  HeatingQC      1460 non-null   object 
 40  CentralAir     1460 non-null   object 
 41  Electrical     1459 non-null   object 
 42  1stFlrSF       1460 non-null   int64  
 43  2ndFlrSF       1460 non-null   int64  
 44  LowQualFinSF   1460 non-null   int64  
 45  GrLivArea      1460 non-null   int64  
 46  BsmtFullBath   1460 non-null   int64  
 47  BsmtHalfBath   1460 non-null   int64  
 48  FullBath       1460 non-null   int64  
 49  HalfBath       1460 non-null   int64  
 50  BedroomAbvGr   1460 non-null   int64  
 51  KitchenAbvGr   1460 non-null   int64  
 52  KitchenQual    1460 non-null   object 
 53  TotRmsAbvGrd   1460 non-null   int64  
 54  Functional     1460 non-null   object 
 55  Fireplaces     1460 non-null   int64  
 56  FireplaceQu    770 non-null    object 
 57  GarageType     1379 non-null   object 
 58  GarageYrBlt    1379 non-null   float64
 59  GarageFinish   1379 non-null   object 
 60  GarageCars     1460 non-null   int64  
 61  GarageArea     1460 non-null   int64  
 62  GarageQual     1379 non-null   object 
 63  GarageCond     1379 non-null   object 
 64  PavedDrive     1460 non-null   object 
 65  WoodDeckSF     1460 non-null   int64  
 66  OpenPorchSF    1460 non-null   int64  
 67  EnclosedPorch  1460 non-null   int64  
 68  3SsnPorch      1460 non-null   int64  
 69  ScreenPorch    1460 non-null   int64  
 70  PoolArea       1460 non-null   int64  
 71  PoolQC         7 non-null      object 
 72  Fence          281 non-null    object 
 73  MiscFeature    54 non-null     object 
 74  MiscVal        1460 non-null   int64  
 75  MoSold         1460 non-null   int64  
 76  YrSold         1460 non-null   int64  
 77  SaleType       1460 non-null   object 
 78  SaleCondition  1460 non-null   object 
dtypes: float64(3), int64(33), object(43)
memory usage: 912.5+ KB

Observations

  • There are 1460 entries with 79 features.
  • Some features have null values, and thus will need further inspection.
  • Some features can be combined to create new features, e.g. totalling the different room counts to give a total_rooms feature.

Null Values

# distribution of null values
plt.figure(figsize = (25, 10))
sns.heatmap(X_train.isnull(), yticklabels = "")
plt.title("Distribution of null values")
plt.show()
# count of null values
null_count = X_train.isnull().sum()
null_count = null_count.to_frame(name = "null_values")[null_count > 0]
null_count["null_percentage"] = null_count["null_values"]/len(X_train)*100
null_vals = null_count.sort_values(by = ["null_values"], ascending = False)

null_vals
null_values null_percentage
PoolQC 1453 99.520548
MiscFeature 1406 96.301370
Alley 1369 93.767123
Fence 1179 80.753425
FireplaceQu 690 47.260274
LotFrontage 259 17.739726
GarageType 81 5.547945
GarageYrBlt 81 5.547945
GarageFinish 81 5.547945
GarageQual 81 5.547945
GarageCond 81 5.547945
BsmtExposure 38 2.602740
BsmtFinType2 38 2.602740
BsmtFinType1 37 2.534247
BsmtCond 37 2.534247
BsmtQual 37 2.534247
MasVnrArea 8 0.547945
MasVnrType 8 0.547945
Electrical 1 0.068493

Observations

  • PoolQC, MiscFeature, Alley and Fence have a lot of null values.
  • Fortunately, the data description tells us that in all but one of the categorical feature columns, a null value indicates the absence of that feature in the house. Therefore, we'll impute null values in these feature columns with 'None'.
  • Electrical is the only categorical column where a null value doesn't indicate the absence of a feature. Since there is only 1 null entry, we can simply drop it.
  • As for the numerical columns, MasVnrArea and LotFrontage can be filled with 0 because, here too, null values indicate absence of the feature. GarageYrBlt will be filled with the median value: although it is linked to the garage columns, a missing garage can't sensibly be encoded as year 0.
  • For any columns that have null values only in the test dataset, we'll impute with the most frequent value (categorical) or the median (numeric).
# column dtypes
X_train.dtypes.value_counts()
object     43
int64      33
float64     3
dtype: int64

Handle Null values

null_cols = null_vals.index
null_cols = null_cols.drop("Electrical")


# drop null from `Electrical`
def drop_electrical_null(X, y):
    drop_idx = X[X["Electrical"].isnull()].index
    X = X.drop(drop_idx)
    y = y.drop(drop_idx)
    
    return X, y


# categorical columns
X_train['MSSubClass'] = X_train['MSSubClass'].apply(str)

cat_cols = X_train.select_dtypes(include = ["object"]).columns

cat_null_cols = null_cols.intersection(cat_cols)
cat_null_cols_imp = SimpleImputer(strategy = "constant", fill_value = "None")

cat_not_null_cols = cat_cols.difference(cat_null_cols)
cat_not_null_coll_imp = SimpleImputer(strategy = "most_frequent")

cat_null_ct = make_column_transformer((cat_null_cols_imp, cat_null_cols), (cat_not_null_coll_imp, cat_not_null_cols))


# numeric columns
num_cols = X_train.select_dtypes(exclude = ["object"]).columns

num_null0_cols = pd.Index(["MasVnrArea", "LotFrontage"])
num_null0_cols_imp = SimpleImputer(strategy = "constant", fill_value = 0)

num_not_null_cols = num_cols.difference(num_null0_cols)
num_not_null_cols_imp = SimpleImputer(strategy = "median")

num_null_ct = make_column_transformer((num_null0_cols_imp, num_null0_cols), (num_not_null_cols_imp, num_not_null_cols))


# combine both into a common transformer
null_ct = make_column_transformer((cat_null_ct, cat_cols), (num_null_ct, num_cols))
null_ct_features_in = cat_null_cols.append(cat_not_null_cols).append(num_null0_cols).append(num_not_null_cols)

Distribution and Outliers

# SalePrice (target)
sns.distplot(y_train, kde = True)
plt.title("Distribution of Sale Price")
plt.show()

The target looks right-skewed. We'll normalize it with a log transform.

# normalized SalePrice (target)
y_train = np.log1p(y_train)
sns.distplot(y_train, fit = norm, kde = True)
plt.title("Distribution of Sale Price")
plt.show()

The target is normalized. Now, we can look at the numerical features.

plt.figure(figsize = (12,10))
fig = sns.boxplot(data = X_train[num_cols], orient = 'h')
fig.set_xscale("log")

There are quite a few outliers in the numeric columns, so we'll scale the numeric data with RobustScaler when building the preprocessor. Also, many of the numeric features are skewed and will need to be normalized, since some of the models we'll train (e.g. linear models) work better with roughly normal feature distributions.

# features which deviate a lot from the normal curve
imputed_features = pd.DataFrame(num_null_ct.fit_transform(X_train[num_cols]))
num_features_in = num_null0_cols.append(num_not_null_cols)
imputed_features.columns = num_features_in

skew_features = imputed_features.apply(lambda x: skew(x)).sort_values(ascending = False)

high_skew = skew_features.loc[skew_features > 0.5]
skewed_index = high_skew.index
high_skew
MiscVal          24.451640
PoolArea         14.813135
LotArea          12.195142
3SsnPorch        10.293752
LowQualFinSF      9.002080
KitchenAbvGr      4.483784
BsmtFinSF2        4.250888
ScreenPorch       4.117977
BsmtHalfBath      4.099186
EnclosedPorch     3.086696
MasVnrArea        2.674865
OpenPorchSF       2.361912
BsmtFinSF1        1.683771
WoodDeckSF        1.539792
TotalBsmtSF       1.522688
1stFlrSF          1.375342
GrLivArea         1.365156
BsmtUnfSF         0.919323
2ndFlrSF          0.812194
OverallCond       0.692355
TotRmsAbvGrd      0.675646
HalfBath          0.675203
Fireplaces        0.648898
BsmtFullBath      0.595454
dtype: float64
# normalize the skewed features
norma = Normalizer()

normalized_cols = pd.DataFrame(norma.fit_transform(imputed_features[skewed_index]))
normalized_cols.columns = skewed_index
normalized_cols.head()
MiscVal PoolArea LotArea 3SsnPorch LowQualFinSF KitchenAbvGr BsmtFinSF2 ScreenPorch BsmtHalfBath EnclosedPorch MasVnrArea OpenPorchSF BsmtFinSF1 WoodDeckSF TotalBsmtSF 1stFlrSF GrLivArea BsmtUnfSF 2ndFlrSF OverallCond TotRmsAbvGrd HalfBath Fireplaces BsmtFullBath
0 0.0 0.0 0.962439 0.0 0.0 0.000114 0.0 0.0 0.000000 0.00000 0.022324 0.006948 0.080412 0.000000 0.097497 0.097497 0.194766 0.017085 0.097269 0.000569 0.000911 0.000114 0.000000 0.000114
1 0.0 0.0 0.969430 0.0 0.0 0.000101 0.0 0.0 0.000101 0.00000 0.000000 0.000000 0.098761 0.030093 0.127440 0.127440 0.127440 0.028679 0.000000 0.000808 0.000606 0.000000 0.000101 0.000000
2 0.0 0.0 0.976793 0.0 0.0 0.000087 0.0 0.0 0.000000 0.00000 0.014066 0.003647 0.042197 0.000000 0.079880 0.079880 0.155071 0.037683 0.075191 0.000434 0.000521 0.000087 0.000087 0.000087
3 0.0 0.0 0.971507 0.0 0.0 0.000102 0.0 0.0 0.000000 0.02767 0.000000 0.003560 0.021973 0.000000 0.076907 0.097761 0.174668 0.054933 0.076907 0.000509 0.000712 0.000000 0.000102 0.000102
4 0.0 0.0 0.977664 0.0 0.0 0.000069 0.0 0.0 0.000000 0.00000 0.023996 0.005759 0.044907 0.013163 0.078501 0.078501 0.150695 0.033594 0.072194 0.000343 0.000617 0.000069 0.000069 0.000069
plt.figure(figsize = (12, 10))
fig = sns.boxplot(data = normalized_cols, orient = "h")
fig.set_xscale("log")

Some features like MasVnrArea, OpenPorchSF, BsmtFinSF1, WoodDeckSF and 2ndFlrSF couldn't be normalized. These columns will therefore be dropped after the important information has been extracted from them, which we'll do in the feature engineering section.

Feature Engineering

Area-related features are important in predicting the sale price of houses, so we can engineer some aggregate area features. We can also add binary features indicating the presence of a swimming pool, garage, basement or fireplace, which are also important determinants of real estate value.

def add_cols(df):
    
    # Add area related features
    df['TotalSF'] = df['TotalBsmtSF'] + df['1stFlrSF'] + df['2ndFlrSF']
    df['Total_Bathrooms'] = (df['FullBath'] + (0.5 * df['HalfBath']) + df['BsmtFullBath'] + (0.5 * df['BsmtHalfBath']))
    df['Total_porch_sf'] = (df['OpenPorchSF'] + df['3SsnPorch'] +
                                  df['EnclosedPorch'] + df['ScreenPorch'] +
                                  df['WoodDeckSF'])
    
    # Add simplified categorical features
    df['haspool'] = df['PoolArea'].apply(lambda x: 1 if x > 0 else 0)
    df['hasgarage'] = df['GarageArea'].apply(lambda x: 1 if x > 0 else 0)
    df['hasbsmt'] = df['TotalBsmtSF'].apply(lambda x: 1 if x > 0 else 0)
    df['hasfireplace'] = df['Fireplaces'].apply(lambda x: 1 if x > 0 else 0)
    
    df[["haspool", "hasgarage", "hasbsmt", "hasfireplace"]] = df[["haspool", "hasgarage", "hasbsmt", "hasfireplace"]].astype("object")
    
    return df

Prepare the data

Now, we'll combine everything we have learnt and done in the previous sections to prepare data on which an ML algorithm can be trained. We'll:

  • normalize target
  • impute null values
  • add features
  • drop features
  • onehotencode categorical features
  • normalize skewed numeric features
  • scale numeric features
# function to preprocess the data
def get_prepared_data(transform_numeric = True):
    X_trn = train.copy()
    y_trn = X_trn.pop("SalePrice")
    X_tst = test.copy()
    
    X_trn['MSSubClass'] = X_trn['MSSubClass'].astype("object")
    X_tst['MSSubClass'] = X_tst['MSSubClass'].astype("object")

    # normalize target
    y_trn = np.log1p(y_trn)
    
    # handle null values
    X_trn, y_trn = drop_electrical_null(X_trn, y_trn)
    X_trn = pd.DataFrame(null_ct.fit_transform(X_trn))
    X_tst = pd.DataFrame(null_ct.transform(X_tst))
    
    X_trn = X_trn.infer_objects()
    X_tst = X_tst.infer_objects()
    
    # re add column names
    X_trn.columns = null_ct_features_in
    X_tst.columns = null_ct_features_in
    
    X_trn['MSSubClass'] = X_trn['MSSubClass'].astype("object")
    X_tst['MSSubClass'] = X_tst['MSSubClass'].astype("object")
    
    # add features
    X_trn = add_cols(X_trn)
    X_tst = add_cols(X_tst)

    # drop features
    X_trn.drop(columns = ["MasVnrArea", "OpenPorchSF", "BsmtFinSF1", "WoodDeckSF", "2ndFlrSF"], inplace = True)
    X_tst.drop(columns = ["MasVnrArea", "OpenPorchSF", "BsmtFinSF1", "WoodDeckSF", "2ndFlrSF"], inplace = True)
    
    # categorical features
    cat_cols = X_trn.select_dtypes(include = ["object"]).columns
    cat_ohe = OneHotEncoder(drop = 'first', handle_unknown = 'ignore', sparse = False, dtype = 'uint8')

    # normalize numeric features
    num_cols = X_trn.select_dtypes(exclude = ["object"]).columns
    num_pipe = make_pipeline(Normalizer(), RobustScaler())
    
    # column transformer
    if transform_numeric:
        ct = make_column_transformer((cat_ohe, cat_cols), (num_pipe, num_cols))
    else:
        ct = make_column_transformer((cat_ohe, cat_cols), remainder = "passthrough")
        
    X_trn = pd.DataFrame(ct.fit_transform(X_trn))
    X_tst = pd.DataFrame(ct.transform(X_tst))

    return X_trn, y_trn, X_tst
# get the processed data
X_trn, y_trn, X_tst = get_prepared_data(True)

Now, we can move on to ML model training.

Train a Dummy model and Evaluate Performance

We'll train a dummy regressor to establish a baseline score. Future models should at least beat this score; the baseline also helps us catch errors in training.
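As a quick sanity check on what this baseline represents (a hypothetical aside, not part of the original workflow, assuming the y_trn obtained above): DummyRegressor's default strategy predicts the mean of the training target, so its RMSE should come out close to the standard deviation of the log prices.

# standard deviation of the log target -- the RMSE of always predicting the mean
print(f"std of log SalePrice: {y_trn.std():.5f}")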

Dummy Model

# define model
dummy_model = DummyRegressor()
# model evaluation
def evaluate_model(model, X_trn = X_trn):
    cvs = cross_val_score(model, X_trn, y_trn, scoring = "neg_root_mean_squared_error")

    rmsle = -cvs.mean()
    
    print(f"Model RMSLE: {rmsle:.5f}")
    
evaluate_model(dummy_model)
Model RMSLE: 0.39936

Make submission

# train model
dummy_model.fit(X_trn, y_trn)

# make predictions
dummy_preds = dummy_model.predict(X_tst)
-cross_val_score(dummy_model, X_trn, y_trn, scoring = "neg_root_mean_squared_error").mean()
0.3993601707158806
# create submission file
submission_df["SalePrice"] = dummy_preds
submission_df.to_csv("dummy_model.csv", index = None)

The submission gives us a score of 0.42578. Future ML models should at least beat the CV RMSLE of 0.39936 and the submission RMSLE of 0.42578.

Linear Models

First, we'll train some linear models, compare their performance, and then decide on a final one.

# cv splitter
k_folds = KFold(5, shuffle = True, random_state = seed)

# parameters for cv
e_alphas = np.arange(0.0001, 0.0007, 0.0001)
e_l1ratio = np.arange(0.8, 1, 0.05)
alphas_ridge = np.arange(10, 16, 0.5)
alphas_lasso = np.arange(0.0001, 0.0008, 0.0001)
# linear models
ridge = RidgeCV(alphas = alphas_ridge, scoring = "neg_mean_squared_error", cv = k_folds)
lasso = LassoCV(alphas = alphas_lasso, max_iter = 1e6, cv = k_folds, n_jobs = -1, random_state = seed)
elastic_net = ElasticNetCV(l1_ratio = e_l1ratio, alphas = e_alphas, max_iter = 1e6, cv = k_folds, n_jobs = -1, random_state = seed)

models = {"Ridge": ridge,
         "Lasso": lasso, 
         "ElasticNet": elastic_net}
# compare linear models
scores = {}

for model_name, model in models.items():
    
    print(f"{model_name}:")   
    score = np.sqrt(-cross_val_score(model, X_trn, y_trn, scoring = "neg_mean_squared_error", cv = k_folds))
    print(score)
    print(f"RMSLE mean: {score.mean():.5f} \nRMSLE std: {score.std():.5f}")
    print("-" * 50)
    
    scores[model_name] = (score.mean(), score.std())
    
Ridge:
[0.13595902 0.15096778 0.13826833 0.13605681 0.12490943]
RMSLE mean: 0.13723 
RMSLE std: 0.00830
--------------------------------------------------
Lasso:
[0.13737072 0.15284838 0.13888969 0.13586582 0.12556263]
RMSLE mean: 0.13811 
RMSLE std: 0.00873
--------------------------------------------------
ElasticNet:
[0.13721347 0.15238829 0.13744699 0.13594653 0.12550764]
RMSLE mean: 0.13770 
RMSLE std: 0.00858
--------------------------------------------------

All of these models improve on the baseline, achieving RMSLE scores in the range of 0.12-0.15. We can get a better score by blending them: blending lets the models complement each other and reduces their individual overfitting.
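Before submitting, we could also estimate how the blend itself cross-validates by averaging out-of-fold predictions; a minimal sketch (cross_val_predict is not part of the original workflow, and it assumes the models, X_trn, y_trn and k_folds defined above):

from sklearn.model_selection import cross_val_predict

# average the out-of-fold predictions of the three linear models and score the blend
oof_blend = sum(cross_val_predict(m, X_trn, y_trn, cv = k_folds) for m in (ridge, lasso, elastic_net)) / 3
print(f"Blended CV RMSLE: {np.sqrt(mean_squared_error(y_trn, oof_blend)):.5f}")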

Blended Model

%%time

print("training started...")

# train all models
ridge.fit(X_trn, y_trn)
lasso.fit(X_trn, y_trn)
elastic_net.fit(X_trn, y_trn)

print("training complete")
training started...
training complete
CPU times: total: 8.7 s
Wall time: 2.62 s

Make Submission

# make predictions
blended_preds = (ridge.predict(X_tst) + lasso.predict(X_tst) + elastic_net.predict(X_tst))/3
blended_preds = np.expm1(blended_preds)
# create submission file
submission_df["SalePrice"] = blended_preds
submission_df.to_csv("blended_linear.csv", index = None)

This submission from the blended linear models gives us an RMSLE score of 0.13819, which is similar to the performance of a single lasso model. We can improve further by using gradient-boosted trees, which we'll do in the next section.

Gradient Boosting Models

In this section, we'll first train a LightGBM model and then an XGBoost model. The normalizing and scaling we applied to the numeric features earlier hurts the performance of gradient-boosted trees, which can make use of the information lost in those transformations. Therefore, to train these models, we'll reload the processed data, this time without transforming the numeric features.

# load processed data
X_trn, y_trn, X_tst = get_prepared_data(False)

LightGBM - Train and Evaluate

# get cpu core count
core_count = psutil.cpu_count(logical = False)
core_count
4
# lightgbm parameters
param = {"bagging_fraction": 0.8,
        "bagging_freq": 2, 
        "learning_rate": 0.01,
        "num_leaves": 10,
        "max_depth": 5,
        "min_data_in_leaf": 10,
        "metric": "rmse",
        "num_threads": core_count,
        "verbosity": -1}
# train and evaluate lightgbm
val_scores = []
i = 1

for trn_idx, val_idx in k_folds.split(X_trn, y_trn):

    print(f"Split {i}:")
    
    trn = lgb.Dataset(X_trn.iloc[trn_idx], y_trn.iloc[trn_idx])
    val = lgb.Dataset(X_trn.iloc[val_idx], y_trn.iloc[val_idx])
    
    bst = lgb.train(param, trn, num_boost_round = 3000, valid_sets = [trn, val], early_stopping_rounds = 10, verbose_eval = 50)
    
    score = bst.best_score["valid_1"]["rmse"]
    val_scores.append(score)
    
    print(f"RMSLE: {score:.5f}")
    print("-" * 65)
    
    i += 1

Split 1:
Training until validation scores don't improve for 10 rounds
[50]	training's rmse: 0.2779	valid_1's rmse: 0.294501
[100]	training's rmse: 0.208419	valid_1's rmse: 0.22513
[150]	training's rmse: 0.168045	valid_1's rmse: 0.186805
[200]	training's rmse: 0.143163	valid_1's rmse: 0.164704
[250]	training's rmse: 0.126932	valid_1's rmse: 0.151274
[300]	training's rmse: 0.115682	valid_1's rmse: 0.14419
[350]	training's rmse: 0.10737	valid_1's rmse: 0.139166
[400]	training's rmse: 0.101333	valid_1's rmse: 0.135577
[450]	training's rmse: 0.0965005	valid_1's rmse: 0.133299
[500]	training's rmse: 0.0926217	valid_1's rmse: 0.131744
[550]	training's rmse: 0.0891204	valid_1's rmse: 0.130774
[600]	training's rmse: 0.0863924	valid_1's rmse: 0.12962
[650]	training's rmse: 0.0838749	valid_1's rmse: 0.129178
Early stopping, best iteration is:
[644]	training's rmse: 0.0840888	valid_1's rmse: 0.129117
RMSLE: 0.12912
-----------------------------------------------------------------
Split 2:
Training until validation scores don't improve for 10 rounds
[50]	training's rmse: 0.273782	valid_1's rmse: 0.313671
[100]	training's rmse: 0.205466	valid_1's rmse: 0.244458
[150]	training's rmse: 0.165209	valid_1's rmse: 0.204575
[200]	training's rmse: 0.140476	valid_1's rmse: 0.181242
[250]	training's rmse: 0.124644	valid_1's rmse: 0.166237
[300]	training's rmse: 0.113833	valid_1's rmse: 0.157775
[350]	training's rmse: 0.105639	valid_1's rmse: 0.151577
[400]	training's rmse: 0.0995413	valid_1's rmse: 0.148238
[450]	training's rmse: 0.0946096	valid_1's rmse: 0.145141
Early stopping, best iteration is:
[484]	training's rmse: 0.0917289	valid_1's rmse: 0.144245
RMSLE: 0.14425
-----------------------------------------------------------------
Split 3:
Training until validation scores don't improve for 10 rounds
[50]	training's rmse: 0.27963	valid_1's rmse: 0.281509
[100]	training's rmse: 0.208148	valid_1's rmse: 0.220384
[150]	training's rmse: 0.16596	valid_1's rmse: 0.188637
[200]	training's rmse: 0.139878	valid_1's rmse: 0.171665
[250]	training's rmse: 0.123095	valid_1's rmse: 0.162633
[300]	training's rmse: 0.11184	valid_1's rmse: 0.158664
[350]	training's rmse: 0.103994	valid_1's rmse: 0.156169
[400]	training's rmse: 0.0980702	valid_1's rmse: 0.154727
[450]	training's rmse: 0.0935271	valid_1's rmse: 0.153839
Early stopping, best iteration is:
[466]	training's rmse: 0.0922914	valid_1's rmse: 0.15353
RMSLE: 0.15353
-----------------------------------------------------------------
Split 4:
Training until validation scores don't improve for 10 rounds
[50]	training's rmse: 0.279398	valid_1's rmse: 0.284018
[100]	training's rmse: 0.208896	valid_1's rmse: 0.220487
[150]	training's rmse: 0.16791	valid_1's rmse: 0.185005
[200]	training's rmse: 0.14287	valid_1's rmse: 0.165001
[250]	training's rmse: 0.126724	valid_1's rmse: 0.153032
[300]	training's rmse: 0.115538	valid_1's rmse: 0.145386
[350]	training's rmse: 0.107436	valid_1's rmse: 0.140329
[400]	training's rmse: 0.10126	valid_1's rmse: 0.136382
[450]	training's rmse: 0.096186	valid_1's rmse: 0.133607
[500]	training's rmse: 0.0921356	valid_1's rmse: 0.131694
[550]	training's rmse: 0.0886764	valid_1's rmse: 0.12993
[600]	training's rmse: 0.0858351	valid_1's rmse: 0.129052
Early stopping, best iteration is:
[593]	training's rmse: 0.0862509	valid_1's rmse: 0.129002
RMSLE: 0.12900
-----------------------------------------------------------------
Split 5:
Training until validation scores don't improve for 10 rounds
[50]	training's rmse: 0.286016	valid_1's rmse: 0.2487
[100]	training's rmse: 0.21402	valid_1's rmse: 0.188862
[150]	training's rmse: 0.171467	valid_1's rmse: 0.156549
[200]	training's rmse: 0.145601	valid_1's rmse: 0.139452
[250]	training's rmse: 0.129239	valid_1's rmse: 0.130076
[300]	training's rmse: 0.117925	valid_1's rmse: 0.124245
[350]	training's rmse: 0.109824	valid_1's rmse: 0.120353
[400]	training's rmse: 0.103684	valid_1's rmse: 0.118195
[450]	training's rmse: 0.0986858	valid_1's rmse: 0.116684
[500]	training's rmse: 0.0947134	valid_1's rmse: 0.115513
[550]	training's rmse: 0.0912204	valid_1's rmse: 0.114181
[600]	training's rmse: 0.0884721	valid_1's rmse: 0.113515
Early stopping, best iteration is:
[632]	training's rmse: 0.0869814	valid_1's rmse: 0.113224
RMSLE: 0.11322
-----------------------------------------------------------------
# Avg RMSLE
np.mean(val_scores)
0.13382349293051415

Even without hyperparameter tuning, the validation scores are better than those of the linear models. Now, we'll train on the whole dataset and make a submission.

trn = lgb.Dataset(X_trn, y_trn)

lgb_cv = lgb.cv(param, trn, num_boost_round = 3000, folds = k_folds, early_stopping_rounds = 10)

lgb_cv["rmse-mean"][-1]
0.13197370164429242
# train on full data
bst = lgb.train(param, trn, num_boost_round = len(lgb_cv["rmse-mean"]))

# make predictions
lgb_preds = np.expm1(bst.predict(X_tst))

Make submission

submission_df["SalePrice"] = lgb_preds

submission_df.to_csv("lgb.csv", index = None)

This submission gives us a score of 0.12901, which is an improvement over the last submission. Now, we can further optimize it with hyperparameter tuning.

LightGBM - Hyperparameter tuning

In this section, we'll use Bayesian Optimization to tune the hyperparameters and then make a submission.
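Since BayesianOptimization maximizes its objective, the black-box function below returns the reciprocal of the cross-validated RMSE, so a lower error maps to a higher target. A toy sketch of the API on a made-up one-dimensional function (the function and its bounds are arbitrary, purely for illustration):

# toy illustration of the bayes_opt API: maximize a simple function of x
def toy_objective(x):
    return -(x - 2) ** 2  # maximum at x = 2

toy_bo = BayesianOptimization(toy_objective, {"x": (-5, 5)}, random_state = seed)
toy_bo.maximize(init_points = 3, n_iter = 10)
print(toy_bo.max)  # best target found and the x that produced it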

# black box function for Bayesian Optimization
def LGB_bayesian(bagging_fraction,
                bagging_freq,
                lambda_l1,
                lambda_l2,
                learning_rate,
                max_depth,
                min_data_in_leaf,
                min_gain_to_split,
                min_sum_hessian_in_leaf,  
                num_leaves,
                feature_fraction):
    
    # LightGBM expects these parameters to be integers, so we cast them to int
    bagging_freq = int(bagging_freq)
    num_leaves = int(num_leaves)
    min_data_in_leaf = int(min_data_in_leaf)
    max_depth = int(max_depth)
    
    # parameters
    param = {'bagging_fraction': bagging_fraction,
            'bagging_freq': bagging_freq,
            'lambda_l1': lambda_l1,
            'lambda_l2': lambda_l2,
            'learning_rate': learning_rate,
            'max_depth': max_depth,
            'min_data_in_leaf': min_data_in_leaf,
            'min_gain_to_split': min_gain_to_split,
            'min_sum_hessian_in_leaf': min_sum_hessian_in_leaf,
            'num_leaves': num_leaves,
            'feature_fraction': feature_fraction,
            'seed': seed,
            'feature_fraction_seed': seed,
            'bagging_seed': seed,
            'drop_seed': seed,
            'boosting_type': 'gbdt',
            'metric': 'rmse',
            'verbosity': -1,
            'num_threads': core_count}
    
    trn = lgb.Dataset(X_trn, y_trn)
    
    lgb_cv = lgb.cv(param, trn, num_boost_round = 1500, folds = k_folds, stratified = False, early_stopping_rounds = 10, seed = seed)
    score = lgb_cv["rmse-mean"][-1]
    
    return 1/score
# parameter bounds
bounds_LGB = {
    'bagging_fraction': (0.5, 1),
    'bagging_freq': (1, 4),
    'lambda_l1': (0, 3.0), 
    'lambda_l2': (0, 3.0), 
    'learning_rate': (0.005, 0.3),
    'max_depth':(3,8),
    'min_data_in_leaf': (5, 20),  
    'min_gain_to_split': (0, 1),
    'min_sum_hessian_in_leaf': (0.01, 20),    
    'num_leaves': (5, 20),
    'feature_fraction': (0.05, 1)
}
# optimizer
LG_BO = BayesianOptimization(LGB_bayesian, bounds_LGB, random_state = seed)
# find the best hyperparameters
LG_BO.maximize(init_points = 10, n_iter = 200)

|   iter    |  target   | baggin... | baggin... | featur... | lambda_l1 | lambda_l2 | learni... | max_depth | min_da... | min_ga... | min_su... | num_le... |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|  1        |  6.338    |  0.6873   |  3.852    |  0.7454   |  1.796    |  0.4681   |  0.05102  |  3.29     |  17.99    |  0.6011   |  14.16    |  5.309    |
|  2        |  6.913    |  0.985    |  3.497    |  0.2517   |  0.5455   |  0.5502   |  0.09475  |  5.624    |  11.48    |  0.2912   |  12.24    |  7.092    |
|  3        |  6.262    |  0.6461   |  2.099    |  0.4833   |  2.356    |  0.599    |  0.1567   |  5.962    |  5.697    |  0.6075   |  3.419    |  5.976    |
|  4        |  6.481    |  0.9744   |  3.897    |  0.818    |  0.9138   |  0.293    |  0.2068   |  5.201    |  6.831    |  0.4952   |  0.6974   |  18.64    |
|  5        |  6.091    |  0.6294   |  2.988    |  0.3461   |  1.56     |  1.64     |  0.05953  |  7.848    |  16.63    |  0.9395   |  17.9     |  13.97    |
|  6        |  7.011    |  0.9609   |  1.265    |  0.2362   |  0.1357   |  0.976    |  0.1197   |  4.357    |  17.43    |  0.3568   |  5.626    |  13.14    |
|  7        |  5.948    |  0.5705   |  3.407    |  0.1208   |  2.961    |  2.317    |  0.06362  |  3.028    |  17.23    |  0.7069   |  14.58    |  16.57    |
|  8        |  6.32     |  0.537    |  2.075    |  0.1601   |  2.589    |  1.87     |  0.1026   |  3.318    |  9.665    |  0.3252   |  14.59    |  14.56    |
|  9        |  6.456    |  0.9436   |  2.417    |  0.1636   |  2.14     |  2.282    |  0.1706   |  6.855    |  12.41    |  0.5227   |  8.557    |  5.381    |
|  10       |  6.361    |  0.5539   |  1.094    |  0.6546   |  0.9431   |  1.526    |  0.2727   |  4.246    |  11.16    |  0.7556   |  4.584    |  6.155    |
|  11       |  7.721    |  1.0      |  2.125    |  0.1412   |  0.0      |  0.0      |  0.06735  |  5.043    |  15.68    |  0.0      |  7.024    |  11.51    |
|  12       |  7.152    |  1.0      |  3.976    |  0.05     |  0.0      |  0.0      |  0.005    |  6.074    |  15.12    |  0.0      |  7.333    |  11.65    |
|  13       |  7.117    |  1.0      |  1.0      |  0.05     |  0.0      |  0.0      |  0.005    |  4.35     |  15.28    |  0.0      |  8.484    |  10.99    |
|  14       |  7.192    |  1.0      |  1.735    |  0.3443   |  0.0      |  0.0      |  0.3      |  5.697    |  15.98    |  0.0      |  5.747    |  10.45    |
|  15       |  7.484    |  1.0      |  2.551    |  0.1004   |  0.0      |  0.0      |  0.005    |  3.799    |  14.85    |  0.0      |  6.269    |  12.16    |
|  16       |  7.229    |  1.0      |  1.369    |  0.05     |  0.0      |  0.0      |  0.005    |  5.841    |  14.82    |  0.0      |  6.828    |  13.1     |
|  17       |  7.598    |  0.5      |  3.165    |  1.0      |  0.0      |  0.0      |  0.005    |  3.953    |  16.9     |  0.0      |  7.403    |  11.63    |
|  18       |  6.571    |  1.0      |  2.755    |  0.05     |  2.317    |  0.0      |  0.005    |  4.365    |  16.39    |  0.0      |  7.109    |  11.7     |
|  19       |  7.101    |  0.5      |  2.554    |  1.0      |  0.0      |  1.657    |  0.3      |  4.573    |  15.56    |  0.0      |  7.148    |  11.65    |
|  20       |  7.5      |  1.0      |  2.199    |  1.0      |  0.0      |  0.0      |  0.005    |  5.806    |  17.47    |  0.0      |  7.955    |  12.02    |
|  21       |  7.663    |  0.5      |  3.658    |  1.0      |  0.0      |  0.0      |  0.005    |  4.533    |  19.86    |  0.0      |  7.692    |  10.71    |
|  22       |  6.163    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.511    |  20.0     |  1.0      |  8.371    |  12.92    |
|  23       |  7.701    |  0.5      |  3.072    |  1.0      |  0.0      |  0.0      |  0.005    |  5.102    |  18.32    |  0.0      |  7.531    |  9.631    |
|  24       |  7.628    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  6.035    |  20.0     |  0.0      |  6.098    |  9.744    |
|  25       |  7.427    |  1.0      |  3.651    |  1.0      |  0.0      |  0.0      |  0.005    |  3.433    |  20.0     |  0.0      |  6.478    |  8.486    |
|  26       |  7.494    |  1.0      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  6.458    |  20.0     |  0.0      |  8.892    |  8.96     |
|  27       |  7.722    |  0.5      |  1.094    |  1.0      |  0.0      |  0.0      |  0.005    |  5.619    |  20.0     |  0.0      |  7.365    |  9.731    |
|  28       |  7.016    |  0.5      |  2.615    |  0.05     |  0.0      |  2.514    |  0.005    |  5.673    |  20.0     |  0.0      |  7.401    |  9.38     |
|  29       |  7.696    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  6.844    |  6.929    |
|  30       |  7.641    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  3.315    |  5.912    |
|  31       |  6.112    |  0.5      |  3.916    |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  1.0      |  5.31     |  5.0      |
|  32       |  7.713    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  4.486    |  9.099    |
|  33       |  7.736    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.4313   |  8.136    |
|  34       |  7.696    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  4.89     |  20.0     |  0.0      |  2.015    |  7.902    |
|  35       |  6.966    |  0.5      |  1.0      |  1.0      |  3.0      |  0.0      |  0.005    |  7.259    |  20.0     |  0.0      |  1.833    |  7.665    |
|  36       |  7.658    |  0.5      |  1.0      |  1.0      |  0.0      |  2.869    |  0.005    |  6.494    |  20.0     |  0.0      |  0.01     |  6.427    |
|  37       |  7.51     |  1.0      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  5.559    |  20.0     |  0.0      |  0.01     |  5.0      |
|  38       |  7.705    |  0.5      |  1.0      |  1.0      |  0.0      |  2.56     |  0.005    |  6.076    |  20.0     |  0.0      |  0.01     |  10.63    |
|  39       |  7.714    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  12.97    |
|  40       |  7.618    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  5.622    |  20.0     |  0.0      |  0.01     |  11.08    |
|  41       |  7.543    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  13.77    |
|  42       |  7.713    |  0.5      |  1.0      |  1.0      |  0.0      |  0.647    |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  17.94    |
|  43       |  6.926    |  0.5      |  3.143    |  1.0      |  3.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  19.14    |
|  44       |  7.717    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  16.55    |  0.0      |  0.01     |  15.91    |
|  45       |  7.703    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  4.411    |  19.77    |  0.0      |  0.01     |  15.51    |
|  46       |  6.15     |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  6.572    |  18.47    |  1.0      |  0.01     |  16.1     |
|  47       |  7.585    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  20.0     |  0.0      |  0.01     |  13.47    |
|  48       |  7.685    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  15.03    |
|  49       |  7.52     |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  15.19    |  0.0      |  0.01     |  18.18    |
|  50       |  7.044    |  0.5      |  1.0      |  0.05     |  0.0      |  0.0      |  0.005    |  3.0      |  20.0     |  0.0      |  0.01     |  11.03    |
|  51       |  7.751    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  16.68    |  0.0      |  0.01     |  12.46    |
|  52       |  7.68     |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  9.493    |
|  53       |  7.652    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  2.526    |  12.02    |
|  54       |  7.379    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  4.384    |  15.97    |  0.0      |  0.01     |  15.01    |
|  55       |  6.92     |  0.5      |  1.0      |  1.0      |  3.0      |  3.0      |  0.005    |  8.0      |  19.11    |  0.0      |  0.01     |  12.79    |
|  56       |  7.122    |  0.5      |  1.0      |  0.05     |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  3.681    |  20.0     |
|  57       |  6.481    |  0.9844   |  3.004    |  0.461    |  0.1131   |  0.5129   |  0.09865  |  7.624    |  17.4     |  0.9014   |  0.1111   |  9.671    |
|  58       |  7.112    |  1.0      |  1.0      |  1.0      |  3.0      |  0.0      |  0.3      |  8.0      |  5.0      |  0.0      |  20.0     |  5.0      |
|  59       |  7.555    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.979    |  20.0     |  0.0      |  0.01     |  8.798    |
|  60       |  7.585    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  20.0     |  0.0      |  0.01     |  18.13    |
|  61       |  6.396    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  5.0      |  1.0      |  10.39    |  20.0     |
|  62       |  7.206    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.3      |  3.0      |  20.0     |  0.0      |  2.043    |  5.0      |
|  63       |  7.122    |  0.5      |  1.0      |  0.05     |  0.0      |  3.0      |  0.3      |  3.0      |  15.63    |  0.0      |  0.01     |  20.0     |
|  64       |  7.759    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.28    |  0.0      |  0.01     |  14.36    |
|  65       |  6.188    |  0.5      |  1.0      |  0.05     |  0.0      |  0.0      |  0.3      |  8.0      |  12.68    |  0.0      |  0.01     |  16.17    |
|  66       |  7.683    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  16.53    |  0.0      |  1.6      |  15.2     |
|  67       |  6.994    |  1.0      |  1.0      |  1.0      |  3.0      |  0.0      |  0.3      |  3.0      |  20.0     |  0.0      |  0.01     |  18.09    |
|  68       |  7.713    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  9.356    |  9.508    |
|  69       |  6.969    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.3      |  5.246    |  20.0     |  0.0      |  1.954    |  15.77    |
|  70       |  6.964    |  0.5      |  1.0      |  1.0      |  3.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  8.347    |  8.476    |
|  71       |  7.497    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  12.7     |  0.0      |  0.01     |  11.39    |
|  72       |  7.27     |  0.7003   |  2.9      |  0.9973   |  0.1094   |  2.975    |  0.09622  |  3.103    |  5.03     |  0.1283   |  19.92    |  5.92     |
|  73       |  6.659    |  0.5      |  4.0      |  0.05     |  0.0      |  0.0      |  0.3      |  3.0      |  5.0      |  0.0      |  14.83    |  5.0      |
|  74       |  7.147    |  0.5      |  1.0      |  1.0      |  2.124    |  3.0      |  0.005    |  8.0      |  14.89    |  0.0      |  0.01     |  14.45    |
|  75       |  7.766    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  14.1     |  0.0      |  2.427    |  13.15    |
|  76       |  7.597    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  5.19     |  20.0     |  0.0      |  0.01     |  11.81    |
|  77       |  7.691    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  2.412    |  8.24     |
|  78       |  5.994    |  1.0      |  4.0      |  1.0      |  3.0      |  3.0      |  0.3      |  8.0      |  5.0      |  1.0      |  20.0     |  20.0     |
|  79       |  6.312    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.3      |  8.0      |  11.0     |  1.0      |  20.0     |  5.0      |
|  80       |  7.384    |  1.0      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  5.0      |  0.0      |  20.0     |  11.43    |
|  81       |  6.07     |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  5.0      |  1.0      |  20.0     |  10.16    |
|  82       |  7.696    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  11.45    |  6.089    |
|  83       |  6.986    |  1.0      |  1.0      |  1.0      |  3.0      |  0.0      |  0.005    |  3.0      |  5.0      |  0.0      |  20.0     |  9.878    |
|  84       |  6.952    |  0.5      |  1.0      |  0.05     |  3.0      |  0.0      |  0.3      |  8.0      |  20.0     |  0.0      |  20.0     |  5.0      |
|  85       |  6.961    |  1.0      |  1.0      |  0.05     |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  13.09    |  5.0      |
|  86       |  7.093    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.3      |  8.0      |  20.0     |  0.0      |  13.16    |  9.104    |
|  87       |  7.46     |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  5.0      |  0.0      |  0.01     |  11.73    |
|  88       |  7.469    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  5.0      |  0.0      |  0.01     |  12.26    |
|  89       |  6.317    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  5.0      |  1.0      |  0.01     |  13.96    |
|  90       |  6.38     |  0.5      |  4.0      |  0.05542  |  2.958    |  3.0      |  0.005    |  8.0      |  5.0      |  0.0      |  0.01     |  11.45    |
|  91       |  7.706    |  0.5      |  3.738    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.74    |  0.0      |  0.9225   |  13.42    |
|  92       |  7.185    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.3      |  5.567    |  20.0     |  0.0      |  9.453    |  6.733    |
|  93       |  7.481    |  1.0      |  2.642    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  10.85    |  0.0      |  2.701    |  13.31    |
|  94       |  7.577    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.931    |  11.69    |  0.0      |  0.01     |  12.14    |
|  95       |  6.869    |  0.5      |  4.0      |  0.05     |  0.0      |  3.0      |  0.005    |  3.0      |  5.0      |  0.0      |  6.951    |  12.17    |
|  96       |  6.988    |  1.0      |  4.0      |  1.0      |  3.0      |  3.0      |  0.3      |  3.0      |  10.07    |  0.0      |  0.01     |  10.4     |
|  97       |  7.409    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  5.0      |  0.0      |  0.01     |  9.278    |
|  98       |  6.397    |  0.5      |  1.611    |  1.0      |  0.0      |  3.0      |  0.005    |  5.695    |  12.98    |  1.0      |  1.15     |  13.16    |
|  99       |  7.153    |  0.5      |  2.703    |  1.0      |  0.0      |  1.932    |  0.3      |  6.105    |  20.0     |  0.0      |  2.283    |  9.836    |
|  100      |  7.564    |  1.0      |  1.0      |  1.0      |  0.0      |  1.399    |  0.005    |  8.0      |  18.0     |  0.0      |  0.01     |  14.38    |
|  101      |  7.541    |  1.0      |  1.0      |  1.0      |  0.0      |  2.267    |  0.005    |  8.0      |  17.94    |  0.0      |  0.01     |  17.56    |
|  102      |  7.714    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  3.316    |  12.91    |
|  103      |  7.532    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  14.19    |  0.0      |  3.933    |  16.08    |
|  104      |  7.661    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  10.81    |  0.0      |  0.01     |  11.59    |
|  105      |  7.555    |  1.0      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  6.718    |  11.51    |
|  106      |  7.6      |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  20.0     |  0.0      |  0.01     |  6.529    |
|  107      |  6.931    |  0.5      |  4.0      |  1.0      |  3.0      |  0.0      |  0.005    |  3.0      |  20.0     |  0.0      |  0.01     |  5.0      |
|  108      |  7.155    |  1.0      |  1.0      |  0.05     |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  0.01     |  9.308    |
|  109      |  7.771    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  15.34    |  0.0      |  3.898    |  20.0     |
|  110      |  7.761    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  14.25    |  0.0      |  6.871    |  20.0     |
|  111      |  6.885    |  1.0      |  1.0      |  1.0      |  2.867    |  3.0      |  0.005    |  8.0      |  14.4     |  0.0      |  5.394    |  20.0     |
|  112      |  6.402    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  16.99    |  1.0      |  6.552    |  20.0     |
|  113      |  7.744    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  12.91    |  0.0      |  4.779    |  19.72    |
|  114      |  7.683    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  11.92    |  0.0      |  7.652    |  20.0     |
|  115      |  7.698    |  0.5      |  3.461    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.06    |  0.0      |  6.217    |  20.0     |
|  116      |  6.947    |  0.5      |  1.56     |  1.0      |  0.0      |  0.5025   |  0.3      |  8.0      |  13.04    |  0.0      |  6.418    |  20.0     |
|  117      |  7.743    |  0.5      |  1.839    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  12.97    |  0.0      |  6.914    |  17.71    |
|  118      |  7.721    |  0.5      |  1.572    |  1.0      |  0.0      |  3.0      |  0.005    |  5.583    |  13.05    |  0.0      |  6.423    |  19.37    |
|  119      |  7.707    |  0.5      |  3.264    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  14.21    |  0.0      |  2.684    |  20.0     |
|  120      |  7.446    |  1.0      |  3.063    |  1.0      |  0.0      |  3.0      |  0.005    |  6.612    |  13.0     |  0.0      |  9.332    |  19.46    |
|  121      |  7.679    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  12.69    |  0.0      |  4.214    |  17.12    |
|  122      |  7.637    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  4.938    |  13.18    |  0.0      |  4.092    |  20.0     |
|  123      |  6.861    |  0.5      |  4.0      |  0.05     |  0.0      |  3.0      |  0.005    |  3.0      |  11.14    |  0.0      |  6.848    |  20.0     |
|  124      |  7.72     |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  5.764    |  14.13    |  0.0      |  3.12     |  20.0     |
|  125      |  7.478    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  10.03    |  0.0      |  6.52     |  15.74    |
|  126      |  6.526    |  0.5      |  4.0      |  0.05     |  0.0      |  3.0      |  0.3      |  8.0      |  10.66    |  0.0      |  7.016    |  17.2     |
|  127      |  7.543    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  9.337    |  0.0      |  0.01     |  11.31    |
|  128      |  7.5      |  1.0      |  2.932    |  1.0      |  0.0      |  3.0      |  0.005    |  6.703    |  14.46    |  0.0      |  4.53     |  18.53    |
|  129      |  7.743    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.53    |  0.0      |  9.766    |  17.79    |
|  130      |  7.124    |  0.5      |  1.0      |  0.05     |  0.0      |  0.0      |  0.005    |  8.0      |  20.0     |  0.0      |  1.39     |  15.51    |
|  131      |  7.599    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  20.0     |  0.0      |  2.035    |  5.0      |
|  132      |  7.701    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  17.46    |  0.0      |  0.01     |  20.0     |
|  133      |  7.751    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.23    |  0.0      |  12.34    |  20.0     |
|  134      |  6.433    |  0.8563   |  1.084    |  0.8246   |  1.864    |  2.859    |  0.2361   |  7.893    |  11.11    |  0.5258   |  13.22    |  19.96    |
|  135      |  7.771    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  15.79    |  0.0      |  11.99    |  20.0     |
|  136      |  7.737    |  0.5      |  1.0      |  1.0      |  0.0      |  0.1906   |  0.005    |  8.0      |  14.74    |  0.0      |  12.11    |  20.0     |
|  137      |  7.155    |  1.0      |  1.0      |  1.0      |  0.0      |  1.618    |  0.3      |  8.0      |  15.76    |  0.0      |  14.82    |  20.0     |
|  138      |  7.69     |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  14.51    |  0.0      |  3.827    |  14.25    |
|  139      |  7.645    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  20.0     |  0.0      |  20.0     |  20.0     |
|  140      |  7.72     |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  5.964    |  14.37    |  0.0      |  10.68    |  20.0     |
|  141      |  6.144    |  0.703    |  3.009    |  0.9848   |  2.826    |  1.277    |  0.07818  |  4.065    |  19.83    |  0.578    |  19.77    |  19.91    |
|  142      |  7.613    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  11.17    |  0.0      |  1.945    |  20.0     |
|  143      |  7.005    |  0.5      |  1.0      |  0.05     |  0.0      |  3.0      |  0.005    |  8.0      |  14.04    |  0.0      |  10.12    |  20.0     |
|  144      |  7.735    |  0.5      |  1.0      |  1.0      |  0.0      |  2.229    |  0.005    |  6.85     |  14.55    |  0.0      |  12.14    |  18.01    |
|  145      |  7.53     |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.48    |  0.0      |  8.942    |  15.1     |
|  146      |  7.728    |  0.5      |  1.0      |  1.0      |  0.0      |  0.3714   |  0.005    |  4.877    |  14.62    |  0.0      |  11.85    |  20.0     |
|  147      |  7.726    |  0.5      |  1.0      |  1.0      |  0.0      |  0.7395   |  0.005    |  6.036    |  17.27    |  0.0      |  11.66    |  20.0     |
|  148      |  7.703    |  0.5      |  3.218    |  1.0      |  0.0      |  1.823    |  0.005    |  6.511    |  15.24    |  0.0      |  12.17    |  20.0     |
|  149      |  7.691    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  5.666    |  20.0     |  0.0      |  0.01     |  20.0     |
|  150      |  7.074    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.3      |  4.744    |  15.87    |  0.0      |  12.67    |  20.0     |
|  151      |  7.727    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  5.209    |  13.45    |  0.0      |  9.067    |  17.37    |
|  152      |  7.664    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.536    |  15.33    |  0.0      |  9.038    |  20.0     |
|  153      |  7.688    |  0.5      |  3.665    |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  17.47    |  0.0      |  11.57    |  20.0     |
|  154      |  6.824    |  0.8679   |  1.189    |  0.6865   |  1.241    |  0.2776   |  0.2391   |  7.836    |  17.99    |  0.3011   |  11.28    |  18.28    |
|  155      |  7.765    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  5.745    |  13.59    |  0.0      |  10.19    |  18.0     |
|  156      |  7.637    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  12.1     |  0.0      |  10.06    |  20.0     |
|  157      |  5.829    |  0.5      |  1.0      |  1.0      |  3.0      |  0.0      |  0.005    |  3.0      |  13.65    |  1.0      |  9.414    |  20.0     |
|  158      |  7.722    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  14.14    |  0.0      |  11.83    |  18.6     |
|  159      |  7.696    |  0.5      |  3.769    |  1.0      |  0.0      |  0.0      |  0.005    |  5.468    |  15.68    |  0.0      |  10.23    |  20.0     |
|  160      |  7.735    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  12.44    |  0.0      |  11.57    |  16.4     |
|  161      |  7.678    |  0.5      |  3.038    |  1.0      |  0.0      |  0.0      |  0.005    |  5.759    |  12.45    |  0.0      |  11.87    |  20.0     |
|  162      |  7.651    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  19.93    |  0.0      |  10.31    |  20.0     |
|  163      |  7.512    |  1.0      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  13.06    |  0.0      |  1.242    |  16.52    |
|  164      |  6.866    |  0.5      |  1.0      |  0.05     |  0.0      |  0.0      |  0.005    |  3.0      |  11.23    |  0.0      |  20.0     |  20.0     |
|  165      |  7.59     |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  4.937    |  20.0     |  0.0      |  11.45    |  20.0     |
|  166      |  7.716    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  9.52     |  0.0      |  10.1     |  13.67    |
|  167      |  7.762    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  7.836    |  0.0      |  9.655    |  11.97    |
|  168      |  7.466    |  1.0      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  5.822    |  0.0      |  10.12    |  11.86    |
|  169      |  6.291    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  10.17    |  1.0      |  11.29    |  13.24    |
|  170      |  7.75     |  0.5      |  1.0      |  1.0      |  0.0      |  1.089    |  0.005    |  8.0      |  7.938    |  0.0      |  7.208    |  12.35    |
|  171      |  7.233    |  0.5      |  1.0      |  1.0      |  1.813    |  3.0      |  0.005    |  8.0      |  5.894    |  0.0      |  8.166    |  11.77    |
|  172      |  7.7      |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  5.798    |  8.182    |  0.0      |  8.413    |  14.1     |
|  173      |  7.713    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  10.43    |  0.0      |  7.339    |  13.99    |
|  174      |  7.71     |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  6.127    |  10.56    |  0.0      |  9.516    |  16.69    |
|  175      |  7.686    |  0.5      |  3.691    |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  8.28     |  0.0      |  8.72     |  12.18    |
|  176      |  7.045    |  0.5      |  1.0      |  1.0      |  2.637    |  0.0      |  0.005    |  8.0      |  8.554    |  0.0      |  8.412    |  13.73    |
|  177      |  7.633    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  8.996    |  0.0      |  4.923    |  11.66    |
|  178      |  7.69     |  0.5      |  3.174    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  8.15     |  0.0      |  7.805    |  10.37    |
|  179      |  7.589    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  18.62    |  0.0      |  8.369    |  20.0     |
|  180      |  7.134    |  0.5      |  2.777    |  1.0      |  0.0      |  0.0      |  0.3      |  3.0      |  17.39    |  0.0      |  11.15    |  20.0     |
|  181      |  7.701    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  8.916    |  0.0      |  8.803    |  10.23    |
|  182      |  7.734    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  6.595    |  0.0      |  10.12    |  8.767    |
|  183      |  5.831    |  0.5883   |  3.886    |  0.9875   |  2.478    |  2.413    |  0.1524   |  7.782    |  5.016    |  0.955    |  11.68    |  6.71     |
|  184      |  6.93     |  0.5      |  4.0      |  0.05     |  0.0      |  3.0      |  0.005    |  3.0      |  20.0     |  0.0      |  8.267    |  20.0     |
|  185      |  7.645    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  20.0     |  0.0      |  6.37     |  20.0     |
|  186      |  7.707    |  0.5      |  1.0      |  1.0      |  0.0      |  2.342    |  0.005    |  5.92     |  7.563    |  0.0      |  8.868    |  10.45    |
|  187      |  7.467    |  1.0      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  9.747    |  0.0      |  6.339    |  11.01    |
|  188      |  7.577    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  11.48    |  0.0      |  0.01     |  5.0      |
|  189      |  7.543    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  14.29    |  0.0      |  0.01     |  5.0      |
|  190      |  7.665    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  16.57    |  0.0      |  4.447    |  20.0     |
|  191      |  7.431    |  1.0      |  2.88     |  1.0      |  0.0      |  0.0      |  0.005    |  5.406    |  9.268    |  0.0      |  7.44     |  11.86    |
|  192      |  7.558    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  9.919    |  0.0      |  0.01     |  8.668    |
|  193      |  7.376    |  1.0      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  10.94    |  0.0      |  0.01     |  5.0      |
|  194      |  7.677    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  10.99    |  0.0      |  2.338    |  9.088    |
|  195      |  7.14     |  0.5      |  1.0      |  0.05     |  0.0      |  2.317    |  0.005    |  8.0      |  7.301    |  0.0      |  8.58     |  10.18    |
|  196      |  7.723    |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  5.91     |  5.767    |  0.0      |  11.35    |  10.89    |
|  197      |  6.866    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.3      |  4.858    |  7.589    |  0.0      |  10.84    |  11.05    |
|  198      |  7.574    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  3.0      |  12.83    |  0.0      |  0.01     |  8.644    |
|  199      |  7.548    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  16.21    |  0.0      |  5.393    |  20.0     |
|  200      |  7.725    |  0.5      |  1.398    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  9.514    |  0.0      |  0.01     |  8.145    |
|  201      |  7.081    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.3      |  8.0      |  20.0     |  0.0      |  0.01     |  5.348    |
|  202      |  7.579    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  8.526    |  0.0      |  0.01     |  7.379    |
|  203      |  7.704    |  0.5      |  4.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  10.12    |  0.0      |  7.341    |  9.8      |
|  204      |  7.414    |  1.0      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  5.0      |  0.0      |  6.01     |  14.66    |
|  205      |  7.69     |  0.5      |  1.0      |  1.0      |  0.0      |  3.0      |  0.005    |  5.959    |  6.401    |  0.0      |  8.928    |  13.36    |
|  206      |  7.672    |  0.5      |  3.405    |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  5.613    |  0.0      |  10.28    |  12.21    |
|  207      |  7.677    |  0.5      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  8.0      |  7.738    |  0.0      |  0.01     |  10.18    |
|  208      |  5.809    |  0.5      |  1.0      |  1.0      |  3.0      |  3.0      |  0.005    |  3.0      |  20.0     |  1.0      |  20.0     |  8.587    |
|  209      |  7.649    |  0.5      |  4.0      |  1.0      |  0.0      |  3.0      |  0.005    |  8.0      |  12.52    |  0.0      |  0.01     |  8.392    |
|  210      |  7.427    |  1.0      |  1.0      |  1.0      |  0.0      |  0.0      |  0.005    |  3.0      |  20.0     |  0.0      |  2.523    |  20.0     |
=============================================================================================================================================================
# get the performance of the best hyperparameters
# (the optimizer maximized 1/RMSLE, so invert the target to recover the score)
tuned_lgbm_score = 1/LG_BO.max['target']
print(f"RMSLE of tuned lightgbm: {tuned_lgbm_score:.5f}")
RMSLE of tuned lightgbm: 0.12868
# best parameters
params = LG_BO.max["params"]

int_params = ["bagging_freq", "max_depth", "min_data_in_leaf", "num_leaves"]

for parameter in int_params:
    params[parameter] = int(params[parameter])
    
other_lgbm_params = {'seed': seed,
'feature_fraction_seed': seed,
'bagging_seed': seed,
'drop_seed': seed,
'boosting_type': 'gbdt',
'metric': 'rmse',
'verbosity': -1,
'num_threads': core_count}

params.update(other_lgbm_params)

params
{'bagging_fraction': 0.5,
 'bagging_freq': 1,
 'feature_fraction': 1.0,
 'lambda_l1': 0.0,
 'lambda_l2': 3.0,
 'learning_rate': 0.005,
 'max_depth': 8,
 'min_data_in_leaf': 15,
 'min_gain_to_split': 0.0,
 'min_sum_hessian_in_leaf': 3.8982726762658557,
 'num_leaves': 20,
 'seed': 42,
 'feature_fraction_seed': 42,
 'bagging_seed': 42,
 'drop_seed': 42,
 'boosting_type': 'gbdt',
 'metric': 'rmse',
 'verbosity': -1,
 'num_threads': 4}

Train and Make Submission

# get the num_boost_round via cross-validation
trn = lgb.Dataset(X_trn, y_trn)
lgb_cv = lgb.cv(params, trn, num_boost_round = 3000, folds = k_folds, early_stopping_rounds = 10)

# subtract the early-stopping window (10 rounds) to approximate the best iteration
num_boost_round = len(lgb_cv["rmse-mean"]) - 10
num_boost_round
1479
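Depending on the LightGBM version, the CV history returned by lgb.cv may already be truncated at the best iteration, so the subtraction above is only an approximation. As a hedged alternative sketch, we could read the best round directly from the returned dict (same lgb_cv as above):

# pick the round with the lowest mean validation RMSE from the CV history
best_round = int(np.argmin(lgb_cv["rmse-mean"])) + 1
print(best_round, min(lgb_cv["rmse-mean"]))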
# train model
bst = lgb.train(params, trn, num_boost_round = num_boost_round)

# make predictions
lgb_preds = np.expm1(bst.predict(X_tst))
# create submission file
submission_df["SalePrice"] = lgb_preds
submission_df.to_csv("lgb_tuned.csv", index = None)

This submission gives us a score of 0.12715, so hyperparameter tuning improved the LightGBM model's performance slightly. Now, we'll see how XGBoost performs.

XGBoost - Train and Evaluate

# load the datasets into xgboost DMatrices
train_d = xgb.DMatrix(X_trn, y_trn)
test_d = xgb.DMatrix(X_tst)
# xgboost parameters
xgb_params = {"eta": 0.1, 
            "subsample": 0.7,
            "tree_method": "hist",
            "random_state": seed}
# train and evaluate xgboost
xgb_cv = xgb.cv(xgb_params, train_d, num_boost_round = 1500, nfold = 5, early_stopping_rounds = 10)

xgb_cv.tail()
train-rmse-mean train-rmse-std test-rmse-mean test-rmse-std
126 0.033865 0.001209 0.127286 0.018559
127 0.033507 0.001256 0.127247 0.018577
128 0.033214 0.001191 0.127263 0.018581
129 0.032872 0.001175 0.127245 0.018490
130 0.032607 0.001167 0.127240 0.018507

Even without hyperparameter tuning, XGBoost gives us an rmsle validation score of 0.127240, and we should be able to improve it further with tuning. First, we'll make predictions and a submission.

Make Submission

# train model
xgb_bst = xgb.train(xgb_params, train_d, num_boost_round = len(xgb_cv))

# make predictions
xgb_preds = np.expm1(xgb_bst.predict(test_d))
# create submission file
submission_df["SalePrice"] = xgb_preds
submission_df.to_csv("xgb_non_tuned.csv", index = None)

This submission gives a score of 0.13190, noticeably worse than the 0.127240 validation score, which suggests the model overfit a little. Tuning the hyperparameters should improve the performance.
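One quick way to check for overfitting is to compare the train and validation RMSE in the CV results above; a minimal sketch using the xgb_cv DataFrame from the previous step (the exact gap will vary with the folds):

# compare train and validation RMSE at the final boosting round;
# a large gap suggests the trees are fitting noise in the training folds
last = xgb_cv.iloc[-1]
gap = last["test-rmse-mean"] - last["train-rmse-mean"]
print(f"train rmse: {last['train-rmse-mean']:.5f}, cv rmse: {last['test-rmse-mean']:.5f}, gap: {gap:.5f}")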

XGBoost - Hyperparameter Tuning

# black box function for Bayesian Optimization
def xgb_bayesian(eta,
                gamma,
                subsample,
                colsample_bytree,
                colsample_bynode,
                colsample_bylevel,
                max_depth):
    
    # this parameter has to be an integer
    max_depth = int(max_depth)
    
    # xgboost parameters
    params = {"eta": eta,
             "gamma": gamma,
             "subsample": subsample,
             "colsample_bytree": colsample_bytree,
             "colsample_bynode": colsample_bynode,
             "colsample_bylevel": colsample_bylevel,
             "max_depth": max_depth,
             "tree_method": "hist"}
    
    # train and score
    xgb_cv = xgb.cv(params, train_d, num_boost_round = 1500, nfold =  5, early_stopping_rounds = 10, seed = seed)
    # take the validation RMSE from 10 rounds before the end of the CV history,
    # mirroring the early_stopping_rounds window
    score = xgb_cv.iloc[-10]["test-rmse-mean"]
    
    # BayesianOptimization maximizes the target, so return the reciprocal of the RMSE
    return 1/score
# parameter bounds
xgb_bounds = {"eta": (0.01, 0.05),
             "gamma": (0, 20),
             "subsample": (0.4, 1),
             "colsample_bytree": (0.5, 1),
             "colsample_bynode": (0.5, 1),
             "colsample_bylevel": (0.5, 1),
             "max_depth": (2, 7)}
# optimizer
xgb_bo = BayesianOptimization(xgb_bayesian, xgb_bounds, random_state = seed)
# find the best hyperparameters
xgb_bo.maximize(init_points = 3, n_iter = 60)
|   iter    |  target   | colsam... | colsam... | colsam... |    eta    |   gamma   | max_depth | subsample |
-------------------------------------------------------------------------------------------------------------
|  1        |  5.275    |  0.6873   |  0.9754   |  0.866    |  0.03395  |  3.12     |  2.78     |  0.4349   |
|  2        |  3.383    |  0.9331   |  0.8006   |  0.854    |  0.01082  |  19.4     |  6.162    |  0.5274   |
|  3        |  4.52     |  0.5909   |  0.5917   |  0.6521   |  0.03099  |  8.639    |  3.456    |  0.7671   |
|  4        |  7.739    |  0.5      |  1.0      |  0.5      |  0.05     |  0.0      |  7.0      |  1.0      |
|  5        |  5.792    |  1.0      |  0.5      |  1.0      |  0.01     |  2.192    |  7.0      |  1.0      |
|  6        |  7.648    |  0.5      |  1.0      |  0.5      |  0.05     |  0.0      |  5.578    |  0.4      |
|  7        |  7.691    |  0.5      |  1.0      |  0.5      |  0.05     |  0.0      |  2.0      |  1.0      |
|  8        |  7.76     |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  3.967    |  1.0      |
|  9        |  7.818    |  0.5      |  0.5      |  1.0      |  0.05     |  0.0      |  5.085    |  1.0      |
|  10       |  7.422    |  0.7925   |  0.8704   |  0.985    |  0.04765  |  0.008645 |  2.932    |  0.7795   |
|  11       |  7.732    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  3.967    |  1.0      |
|  12       |  7.775    |  1.0      |  1.0      |  0.5      |  0.01     |  0.0      |  5.477    |  1.0      |
|  13       |  7.671    |  1.0      |  0.5      |  0.5      |  0.05     |  0.0      |  7.0      |  0.4      |
|  14       |  7.869    |  0.5      |  0.5      |  0.5      |  0.05     |  0.0      |  6.08     |  1.0      |
|  15       |  7.797    |  0.5413   |  0.7305   |  0.9571   |  0.0127   |  0.01644  |  6.651    |  0.9833   |
|  16       |  7.615    |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  2.0      |  1.0      |
|  17       |  7.672    |  1.0      |  0.5      |  0.5      |  0.05     |  0.0      |  5.607    |  1.0      |
|  18       |  4.074    |  0.5      |  0.5      |  1.0      |  0.05     |  14.46    |  2.0      |  1.0      |
|  19       |  4.196    |  1.0      |  1.0      |  0.5      |  0.01     |  11.69    |  7.0      |  1.0      |
|  20       |  7.66     |  0.8068   |  0.9216   |  0.5596   |  0.01745  |  0.04023  |  4.768    |  0.995    |
|  21       |  3.822    |  0.5      |  0.5      |  0.5      |  0.05     |  20.0     |  2.0      |  1.0      |
|  22       |  7.696    |  0.5      |  1.0      |  1.0      |  0.01     |  0.0      |  5.865    |  1.0      |
|  23       |  7.522    |  0.7704   |  0.6646   |  0.5423   |  0.02316  |  0.07314  |  2.729    |  0.4044   |
|  24       |  7.616    |  0.5      |  0.5      |  1.0      |  0.01     |  0.0      |  2.0      |  1.0      |
|  25       |  7.566    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  2.0      |  0.4      |
|  26       |  7.887    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  7.0      |  0.4      |
|  27       |  6.545    |  0.5      |  0.5      |  0.5      |  0.01     |  0.8323   |  2.0      |  1.0      |
|  28       |  4.306    |  0.5      |  1.0      |  0.5      |  0.01     |  6.463    |  7.0      |  0.4      |
|  29       |  7.789    |  0.5      |  1.0      |  1.0      |  0.01     |  0.0      |  7.0      |  0.4      |
|  30       |  7.773    |  0.5007   |  0.5555   |  0.7216   |  0.03146  |  0.08095  |  6.635    |  0.5173   |
|  31       |  7.842    |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  4.195    |  0.4      |
|  32       |  7.491    |  0.9942   |  0.5431   |  0.9432   |  0.04836  |  0.03342  |  3.756    |  0.4115   |
|  33       |  7.886    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  5.175    |  1.0      |
|  34       |  7.679    |  1.0      |  1.0      |  0.5      |  0.01     |  0.0      |  3.824    |  0.4      |
|  35       |  7.808    |  0.5      |  1.0      |  0.5      |  0.01     |  0.0      |  7.0      |  0.4      |
|  36       |  7.789    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  4.712    |  0.4      |
|  37       |  7.835    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  7.0      |  1.0      |
|  38       |  7.833    |  0.5      |  0.5      |  1.0      |  0.01     |  0.0      |  6.063    |  1.0      |
|  39       |  7.886    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  5.654    |  1.0      |
|  40       |  7.762    |  0.5803   |  0.505    |  0.5571   |  0.02706  |  0.004659 |  6.536    |  0.8425   |
|  41       |  7.706    |  0.5      |  0.5      |  1.0      |  0.05     |  0.0      |  7.0      |  0.4      |
|  42       |  7.044    |  0.5109   |  0.5865   |  0.7321   |  0.03324  |  0.4898   |  6.953    |  0.8459   |
|  43       |  4.434    |  1.0      |  0.5      |  1.0      |  0.01     |  6.029    |  2.0      |  0.4      |
|  44       |  7.363    |  1.0      |  1.0      |  1.0      |  0.01     |  0.0      |  2.0      |  0.4      |
|  45       |  7.829    |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  5.106    |  0.4      |
|  46       |  7.809    |  0.5      |  0.5      |  1.0      |  0.01     |  0.0      |  5.964    |  0.4      |
|  47       |  7.689    |  1.0      |  0.5      |  1.0      |  0.01     |  0.0      |  4.909    |  0.4      |
|  48       |  7.76     |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  3.017    |  1.0      |
|  49       |  7.843    |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  4.755    |  1.0      |
|  50       |  7.734    |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  3.453    |  0.4      |
|  51       |  7.522    |  1.0      |  1.0      |  1.0      |  0.01     |  0.0      |  7.0      |  1.0      |
|  52       |  7.81     |  0.5      |  0.5      |  0.5      |  0.01     |  0.0      |  6.282    |  0.4      |
|  53       |  7.699    |  1.0      |  0.5      |  0.5      |  0.05     |  0.0      |  4.654    |  0.4      |
|  54       |  3.392    |  0.5034   |  0.7218   |  0.5388   |  0.04978  |  15.4     |  6.853    |  0.4056   |
|  55       |  6.381    |  1.0      |  0.5      |  0.5      |  0.01     |  1.022    |  4.746    |  1.0      |
|  56       |  7.643    |  1.0      |  1.0      |  1.0      |  0.01     |  0.0      |  6.238    |  0.4      |
|  57       |  4.014    |  0.6623   |  0.9548   |  0.6727   |  0.0302   |  11.18    |  2.002    |  0.5538   |
|  58       |  7.491    |  0.5      |  1.0      |  0.5      |  0.01     |  0.0      |  2.789    |  1.0      |
|  59       |  7.747    |  0.9694   |  0.8278   |  0.8129   |  0.02669  |  0.0109   |  5.277    |  0.6284   |
|  60       |  7.651    |  1.0      |  0.5      |  0.5      |  0.01     |  0.0      |  3.44     |  0.4      |
|  61       |  7.858    |  0.5      |  0.8667   |  0.8305   |  0.01     |  0.0      |  6.538    |  0.4      |
|  62       |  7.788    |  1.0      |  1.0      |  0.5      |  0.01     |  0.0      |  5.181    |  0.4      |
|  63       |  7.683    |  0.5313   |  0.5206   |  0.7006   |  0.03925  |  0.000322 |  5.92     |  0.7606   |
=============================================================================================================
# get the performance of best hyperparameters
tuned_xgb_score = 1/xgb_bo.max['target']
print(f"RMSLE of tuned xgboost: {tuned_xgb_score:.5f}")
RMSLE of tuned xgboost: 0.12679
xgb_bo.max
{'target': 7.787494840784668,
 'params': {'colsample_bylevel': 0.5,
  'colsample_bynode': 0.5,
  'colsample_bytree': 0.5,
  'eta': 0.01,
  'gamma': 0.0,
  'max_depth': 4.676653597241575,
  'subsample': 1.0}}
# parameters
xgb_tuned_params = {"eta": 0.01,
                    "gamma": 0,
                    "subsample": 1.0,
                    "colsample_bytree": 0.5,
                    "colsample_bynode": 0.5,
                    "colsample_bylevel": 0.5,
                    "max_depth": 4,
                    "tree_method": "hist"}

Train and Make Submission

# get the num_boost_round via cross-validation
xgb_cv = xgb.cv(xgb_tuned_params, train_d, num_boost_round = 1500, nfold = 5, early_stopping_rounds = 10)

# again subtract the early-stopping window to approximate the best iteration
num_boost_round = len(xgb_cv) - 10

xgb_cv.tail()
train-rmse-mean train-rmse-std test-rmse-mean test-rmse-std
1495 0.063411 0.002336 0.121000 0.020686
1496 0.063391 0.002330 0.120999 0.020687
1497 0.063374 0.002329 0.120994 0.020690
1498 0.063356 0.002330 0.120985 0.020681
1499 0.063333 0.002331 0.120981 0.020679
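With the tuned eta of 0.01 the cross-validation runs the full 1500 rounds without triggering early stopping, so the subtraction above is only a conservative trim. If we wanted to read the best round straight from the CV results instead, a small sketch using the same xgb_cv DataFrame:

# index of the lowest mean validation RMSE; +1 converts the 0-based index to a round count
best_round = int(xgb_cv["test-rmse-mean"].idxmin()) + 1
print(best_round, xgb_cv["test-rmse-mean"].min())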
# train model
bst = xgb.train(xgb_tuned_params, train_d, num_boost_round = num_boost_round)

# make predictions
xgb_preds = np.expm1(bst.predict(test_d))
# create submission file
submission_df["SalePrice"] = xgb_preds
submission_df.to_csv("xgb_tuned.csv", index = None)

This submission gives us our best score yet, an rmsle of 0.12488, and it is our final submission.
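Although we stop here, one natural extension, in the same spirit as the earlier linear-model blend, would be to average the tuned LightGBM and XGBoost predictions; a minimal, untested sketch (the equal weights are an arbitrary assumption, not something validated in this notebook):

# simple equal-weight blend of the two tuned boosters' predictions (weights not tuned)
blend_preds = 0.5 * lgb_preds + 0.5 * xgb_preds
submission_df["SalePrice"] = blend_preds
submission_df.to_csv("lgb_xgb_blend.csv", index = None)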

Summary and Conclusion

In this project, we worked on the Ames housing data provided as part of a competition on Kaggle, with the goal of predicting the SalePrice of houses in Ames. Before training any ML models, we explored the data and prepared it for the ML algorithms, and we trained a dummy model as a baseline to catch errors in the training setup. In the model training part, we first trained several linear models and then blended their predictions, which improved the overall performance. Next we trained two gradient boosting models. The LightGBM model improved on the blended linear models, and hyperparameter tuning improved its submission score further. Finally, we trained an XGBoost model and tuned its hyperparameters as well, which gave us the best rmsle score of 0.12488.