EDA and Prediction of Real Estate Sale Price using Select ML Algorithms on Kaggle
- Introduction
- Setup
- EDA and Data Preparation
- Train a Dummy model and Evaluate Performance
- Linear Models
- Gradient Boosting Models
- Summary and Conclusion
In this notebook, we'll work on the Ames Housing Dataset, available on Kaggle as an educational competition. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, the competition challenges us to predict the final price of each home. The dataset was compiled by Dean De Cock as a modernized and expanded alternative to the often-cited Boston Housing dataset.
Our task is to predict the `SalePrice` of the houses in the dataset. We will train ML algorithms on the training data provided by the competition and then submit predictions of `SalePrice` for the houses in the test data. Submissions are evaluated by RMSLE, and we'll try to improve on this metric with each submission.
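For reference, RMSLE is simply the root mean squared error computed on log-transformed values. A minimal sketch of the metric (illustrative only, not part of the competition code; it assumes scikit-learn's `mean_squared_error`):
import numpy as np
from sklearn.metrics import mean_squared_error

def rmsle(y_true, y_pred):
    # root mean squared logarithmic error; log1p keeps the metric defined at 0
    return np.sqrt(mean_squared_error(np.log1p(y_true), np.log1p(y_pred)))
Because we will train on `np.log1p(SalePrice)`, plain RMSE on the transformed target is equivalent to RMSLE on the original prices, which is why `neg_root_mean_squared_error` is used as the cross-validation scoring later on.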
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization
from sklearn.preprocessing import OneHotEncoder, Normalizer, RobustScaler # data preparation
from sklearn.impute import SimpleImputer # missing value handling
from sklearn.model_selection import KFold, cross_val_score # model selection
from sklearn.metrics import mean_squared_error # metrics
from scipy.stats import norm, skew # statistics
import psutil # get cpu core count
from bayes_opt import BayesianOptimization # hyperparameter tuning
# pipelines
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
# ML models
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.ensemble import RandomForestRegressor
import lightgbm as lgb
import xgboost as xgb
# ignore warnings
import warnings
warnings.filterwarnings("ignore")
# get file paths
import os
data_dir = os.getcwd()
for dirname, _, filenames in os.walk(data_dir):
    for filename in filenames:
        if filename[-4:] == '.csv':
            print(os.path.join(dirname, filename))
All data files are under 1 MB, so it is safe to load them into memory in full.
# import files
train = pd.read_csv(data_dir + "/train.csv", index_col = ["Id"])
test = pd.read_csv(data_dir + "/test.csv", index_col = ["Id"])
submission_df = pd.read_csv(data_dir + "/sample_submission.csv")
train.head()
Observations
- The data contains both numeric and categorical features.
- `SalePrice` is the target column.
# random state seed
seed = 42
# Separate target from features
X_train = train.copy()
y_train = X_train.pop("SalePrice")
pd.set_option("display.max_columns", None)
X_train.head(15)
X_train.info()
Observations
- There are 1460 entries with 79 features.
- Some features have null values, and thus will need further inspection.
- Some features can be combined to give new features, e.g. totalling different types of rooms to give a `total_rooms` feature (a quick sketch follows below).
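A quick sketch of what such a feature could look like (illustrative only; this exact feature isn't used later, and it assumes `TotRmsAbvGrd` excludes bathrooms, as stated in the data description):
# illustrative only: combine room and bathroom counts into a single total_rooms feature
rooms_example = X_train.copy()
rooms_example["total_rooms"] = (rooms_example["TotRmsAbvGrd"]
                                + rooms_example["FullBath"] + 0.5 * rooms_example["HalfBath"]
                                + rooms_example["BsmtFullBath"] + 0.5 * rooms_example["BsmtHalfBath"])
rooms_example[["TotRmsAbvGrd", "FullBath", "HalfBath", "total_rooms"]].head()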
# distribution of null values
plt.figure(figsize = (25, 10))
sns.heatmap(X_train.isnull(), yticklabels = "")
plt.title("Distribution of null values")
plt.show()
# count of null values
null_count = X_train.isnull().sum()
null_count = null_count.to_frame(name = "null_values")[null_count > 0]
null_count["null_percentage"] = null_count["null_values"]/len(X_train)*100
null_vals = null_count.sort_values(by = ["null_values"], ascending = False)
null_vals
Observations
- `PoolQC`, `MiscFeature`, `Alley` and `Fence` have a lot of null values.
- Fortunately, the data description tells us that in all but one of the categorical feature columns, a null value indicates the absence of that feature in the entry. Therefore, we'll impute null values in these feature columns with 'None'.
- `Electrical` is the only feature column where null values don't indicate the absence of a condition. Because there is only 1 null entry, we can simply drop that entry.
- As for the numerical columns, `MasVnrArea` and `LotFrontage` can be filled with 0 because here too, null values indicate the absence of the feature. `GarageYrBlt` can be filled with the median value, because although it is linked to the `GarageCond` column, a missing garage cannot have year 0 in `GarageYrBlt`.
- For columns that have null values only in the test dataset, we'll impute with the `most_frequent` value (categorical) or the `median` (numeric).
# column dtypes
X_train.dtypes.value_counts()
null_cols = null_vals.index
null_cols = null_cols.drop("Electrical")
# drop null from `Electrical`
def drop_electrical_null(X, y):
    drop_idx = X[X["Electrical"].isnull()].index
    X = X.drop(drop_idx)
    y = y.drop(drop_idx)
    return X, y
# categorical columns
X_train['MSSubClass'] = X_train['MSSubClass'].apply(str)
cat_cols = X_train.select_dtypes(include = ["object"]).columns
cat_null_cols = null_cols.intersection(cat_cols)
cat_null_cols_imp = SimpleImputer(strategy = "constant", fill_value = "None")
cat_not_null_cols = cat_cols.difference(cat_null_cols)
cat_not_null_coll_imp = SimpleImputer(strategy = "most_frequent")
cat_null_ct = make_column_transformer((cat_null_cols_imp, cat_null_cols), (cat_not_null_coll_imp, cat_not_null_cols))
# numeric columns
num_cols = X_train.select_dtypes(exclude = ["object"]).columns
num_null0_cols = pd.Index(["MasVnrArea", "LotFrontage"])
num_null0_cols_imp = SimpleImputer(strategy = "constant", fill_value = 0)
num_not_null_cols = num_cols.difference(num_null0_cols)
num_not_null_cols_imp = SimpleImputer(strategy = "median")
num_null_ct = make_column_transformer((num_null0_cols_imp, num_null0_cols), (num_not_null_cols_imp, num_not_null_cols))
# combine both into a common transformer
null_ct = make_column_transformer((cat_null_ct, cat_cols), (num_null_ct, num_cols))
null_ct_features_in = cat_null_cols.append(cat_not_null_cols).append(num_null0_cols).append(num_not_null_cols)
# SalePrice (target)
sns.distplot(y_train, kde = True)
plt.title("Distribution of Sale Price")
plt.show()
The target looks skewed. We'll normalize it with a log transformation.
# normalized SalePrice (target)
y_train = np.log1p(y_train)
sns.distplot(y_train, fit = norm, kde = True)
plt.title("Distribution of Sale Price")
plt.show()
The target is normalized. Now, we can look at the numerical features.
plt.figure(figsize = (12,10))
fig = sns.boxplot(data = X_train[num_cols], orient = 'h')
fig.set_xscale("log")
There are quite a few outliers in the numeric columns. We'll need to scale the numeric data using `RobustScaler` in the preprocessor-building section. Also, many of the numeric features are skewed and will need to be normalized so that we can train ML models that assume normality in the features.
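As a quick illustration of how skew can be reduced (illustrative only; the preprocessing pipeline built later uses `Normalizer` followed by `RobustScaler` instead), a log transform such as `np.log1p` pulls in the long right tail of a feature like `GrLivArea`:
# illustrative only: skew of GrLivArea before and after a log1p transform
print("skew before log1p:", skew(X_train["GrLivArea"]))
print("skew after  log1p:", skew(np.log1p(X_train["GrLivArea"])))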
# features which deviate a lot from the normal curve
imputed_features = pd.DataFrame(num_null_ct.fit_transform(X_train[num_cols]))
num_features_in = num_null0_cols.append(num_not_null_cols)
imputed_features.columns = num_features_in
skew_features = imputed_features.apply(lambda x: skew(x)).sort_values(ascending = False)
high_skew = skew_features.loc[skew_features > 0.5]
skewed_index = high_skew.index
high_skew
# normalize the skewed features
norma = Normalizer()
normalized_cols = pd.DataFrame(norma.fit_transform(imputed_features[skewed_index]))
normalized_cols.columns = skewed_index
normalized_cols.head()
plt.figure(figsize = (12, 10))
fig = sns.boxplot(data = normalized_cols, orient = "h")
fig.set_xscale("log")
Some features like `MasVnrArea`, `OpenPorchSF`, `BsmtFinSF1`, `WoodDeckSF` and `2ndFlrSF` couldn't be normalized. These columns will therefore be dropped after extracting the important information from them, which we'll do in the feature engineering section.
Feature Engineering
Area-related features are important in predicting the sale price of houses, so we can engineer some features related to area. We can also add binary features indicating the presence of a swimming pool, garage, basement or fireplace, which are also important determinants of real estate value.
def add_cols(df):
    # Add area related features
    df['TotalSF'] = df['TotalBsmtSF'] + df['1stFlrSF'] + df['2ndFlrSF']
    df['Total_Bathrooms'] = (df['FullBath'] + (0.5 * df['HalfBath']) + df['BsmtFullBath'] + (0.5 * df['BsmtHalfBath']))
    df['Total_porch_sf'] = (df['OpenPorchSF'] + df['3SsnPorch'] +
                            df['EnclosedPorch'] + df['ScreenPorch'] +
                            df['WoodDeckSF'])
    # Add simplified categorical features
    df['haspool'] = df['PoolArea'].apply(lambda x: 1 if x > 0 else 0)
    df['hasgarage'] = df['GarageArea'].apply(lambda x: 1 if x > 0 else 0)
    df['hasbsmt'] = df['TotalBsmtSF'].apply(lambda x: 1 if x > 0 else 0)
    df['hasfireplace'] = df['Fireplaces'].apply(lambda x: 1 if x > 0 else 0)
    df[["haspool", "hasgarage", "hasbsmt", "hasfireplace"]] = df[["haspool", "hasgarage", "hasbsmt", "hasfireplace"]].astype("object")
    return df
Prepare the data
Now, we will combine everything we have learnt and done in the previous sections to process the data so that an ML algorithm can be trained on it. We'll:
- normalize target
- impute null values
- add features
- drop features
- onehotencode categorical features
- normalize skewed numeric features
- scale numeric features
# function to preprocess the data
def get_prepared_data(transform_numeric = True):
    X_trn = train.copy()
    y_trn = X_trn.pop("SalePrice")
    X_tst = test.copy()
    X_trn['MSSubClass'] = X_trn['MSSubClass'].astype("object")
    X_tst['MSSubClass'] = X_tst['MSSubClass'].astype("object")
    # normalize target
    y_trn = np.log1p(y_trn)
    # handle null values
    X_trn, y_trn = drop_electrical_null(X_trn, y_trn)
    X_trn = pd.DataFrame(null_ct.fit_transform(X_trn))
    X_tst = pd.DataFrame(null_ct.transform(X_tst))
    X_trn = X_trn.infer_objects()
    X_tst = X_tst.infer_objects()
    # re-add column names
    X_trn.columns = null_ct_features_in
    X_tst.columns = null_ct_features_in
    X_trn['MSSubClass'] = X_trn['MSSubClass'].astype("object")
    X_tst['MSSubClass'] = X_tst['MSSubClass'].astype("object")
    # add features
    X_trn = add_cols(X_trn)
    X_tst = add_cols(X_tst)
    # drop features
    X_trn.drop(columns = ["MasVnrArea", "OpenPorchSF", "BsmtFinSF1", "WoodDeckSF", "2ndFlrSF"], inplace = True)
    X_tst.drop(columns = ["MasVnrArea", "OpenPorchSF", "BsmtFinSF1", "WoodDeckSF", "2ndFlrSF"], inplace = True)
    # categorical features
    cat_cols = X_trn.select_dtypes(include = ["object"]).columns
    cat_ohe = OneHotEncoder(drop = 'first', handle_unknown = 'ignore', sparse = False, dtype = 'uint8')
    # normalize numeric features
    num_cols = X_trn.select_dtypes(exclude = ["object"]).columns
    num_pipe = make_pipeline(Normalizer(), RobustScaler())
    # column transformer
    if transform_numeric:
        ct = make_column_transformer((cat_ohe, cat_cols), (num_pipe, num_cols))
    else:
        ct = make_column_transformer((cat_ohe, cat_cols), remainder = "passthrough")
    X_trn = pd.DataFrame(ct.fit_transform(X_trn))
    X_tst = pd.DataFrame(ct.transform(X_tst))
    return X_trn, y_trn, X_tst
# get the processed data
X_trn, y_trn, X_tst = get_prepared_data(True)
Now, we can move on to ML model training.
# define model
dummy_model = DummyRegressor()
# model evaluation
def evaluate_model(model, X_trn = X_trn):
    cvs = cross_val_score(model, X_trn, y_trn, scoring = "neg_root_mean_squared_error")
    rmsle = -cvs.mean()
    print(f"Model RMSLE: {rmsle:.5f}")
evaluate_model(dummy_model)
# train model
dummy_model.fit(X_trn, y_trn)
# make predictions
dummy_preds = dummy_model.predict(X_tst)
-cross_val_score(dummy_model, X_trn, y_trn, scoring = "neg_root_mean_squared_error").mean()
# create submission file
submission_df["SalePrice"] = dummy_preds
submission_df.to_csv("dummy_model.csv", index = None)
The submission gives us a score of 0.42578. Future ML models should at least beat the CV RMSLE score of 0.39936 and the submission RMSLE score of 0.42578.
# cv splitter
k_folds = KFold(5, shuffle = True, random_state = seed)
# parameters for cv
e_alphas = np.arange(0.0001, 0.0007, 0.0001)
e_l1ratio = np.arange(0.8, 1, 0.05)
alphas_ridge = np.arange(10, 16, 0.5)
alphas_lasso = np.arange(0.0001, 0.0008, 0.0001)
# linear models
ridge = RidgeCV(alphas = alphas_ridge, scoring = "neg_mean_squared_error", cv = k_folds)
lasso = LassoCV(alphas = alphas_lasso, max_iter = 1e6, cv = k_folds, n_jobs = -1, random_state = seed)
elastic_net = ElasticNetCV(l1_ratio = e_l1ratio, alphas = e_alphas, max_iter = 1e6, cv = k_folds, n_jobs = -1, random_state = seed)
models = {"Ridge": ridge,
"Lasso": lasso,
"ElasticNet": elastic_net}
# compare linear models
scores = {}
for model_name, model in models.items():
    print(f"{model_name}:")
    score = np.sqrt(-cross_val_score(model, X_trn, y_trn, scoring = "neg_mean_squared_error", cv = k_folds))
    print(score)
    print(f"RMSLE mean: {score.mean():.5f} \nRMSLE std: {score.std():.5f}")
    print("-" * 50)
    scores[model_name] = (score.mean(), score.std())
All of these scores improve on the baseline, with RMSLE in the range of 0.12-0.15. We can get a better score by blending these linear models: blending lets the models complement each other and reduces their individual overfitting.
%%time
print("training started...")
# train all models
ridge.fit(X_trn, y_trn)
lasso.fit(X_trn, y_trn)
elastic_net.fit(X_trn, y_trn)
print("training complete")
# make predictions
blended_preds = (ridge.predict(X_tst) + lasso.predict(X_tst) + elastic_net.predict(X_tst))/3
blended_preds = np.expm1(blended_preds)
# create submission file
submission_df["SalePrice"] = blended_preds
submission_df.to_csv("blended_linear.csv", index = None)
This submission from the blended linear models gives us an RMSLE score of 0.13819, which is similar to the performance of a single lasso model. We can improve on this score further by using gradient boosting trees, which we'll do in the next section.
Gradient Boosting Models
In this section, we'll first train a LightGBM model and then an XGBoost model. The normalizing and scaling we applied to the numeric features earlier degrades the performance of gradient boosting trees, which can actually make use of the information lost through those transformations. Therefore, to train these models, we'll reload the processed data, this time without transforming the numeric features.
# load processed data
X_trn, y_trn, X_tst = get_prepared_data(False)
# get cpu core count
core_count = psutil.cpu_count(logical = False)
core_count
# lightgbm parameters
param = {"bagging_fraction": 0.8,
"bagging_freq": 2,
"learning_rate": 0.01,
"num_leaves": 10,
"max_depth": 5,
"min_data_in_leaf": 10,
"metric": "rmse",
"num_threads": core_count,
"verbosity": -1}
# train and evaluate lightgbm
val_scores = []
i = 1
for trn_idx, val_idx in k_folds.split(X_trn, y_trn):
    print(f"Split {i}:")
    trn = lgb.Dataset(X_trn.iloc[trn_idx], y_trn.iloc[trn_idx])
    val = lgb.Dataset(X_trn.iloc[val_idx], y_trn.iloc[val_idx])
    bst = lgb.train(param, trn, num_boost_round = 3000, valid_sets = [trn, val], early_stopping_rounds = 10, verbose_eval = 50)
    score = bst.best_score["valid_1"]["rmse"]
    val_scores.append(score)
    print(f"RMSLE: {score:.5f}")
    print("-" * 65)
    i += 1
# Avg RMSLE
np.mean(val_scores)
Even without hyperparameter tuning, the validation scores are better than those of the linear models. Now, we'll train on the whole dataset and make a submission.
trn = lgb.Dataset(X_trn, y_trn)
lgb_cv = lgb.cv(param, trn, num_boost_round = 3000, folds = k_folds, early_stopping_rounds = 10)
lgb_cv["rmse-mean"][-1]
# train on full data
bst = lgb.train(param, trn, num_boost_round = len(lgb_cv["rmse-mean"]))
# make predictions
lgb_preds = np.expm1(bst.predict(X_tst))
submission_df["SalePrice"] = lgb_preds
submission_df.to_csv("lgb.csv", index = None)
This submission gives us a score of 0.12901, which is an improvement over the last submission. Now, we can further optimize it with hyperparameter tuning.
# black box function for Bayesian Optimization
def LGB_bayesian(bagging_fraction,
                 bagging_freq,
                 lambda_l1,
                 lambda_l2,
                 learning_rate,
                 max_depth,
                 min_data_in_leaf,
                 min_gain_to_split,
                 min_sum_hessian_in_leaf,
                 num_leaves,
                 feature_fraction):
    # LightGBM expects these parameters to be integers, so we cast them
    bagging_freq = int(bagging_freq)
    num_leaves = int(num_leaves)
    min_data_in_leaf = int(min_data_in_leaf)
    max_depth = int(max_depth)
    # parameters
    param = {'bagging_fraction': bagging_fraction,
             'bagging_freq': bagging_freq,
             'lambda_l1': lambda_l1,
             'lambda_l2': lambda_l2,
             'learning_rate': learning_rate,
             'max_depth': max_depth,
             'min_data_in_leaf': min_data_in_leaf,
             'min_gain_to_split': min_gain_to_split,
             'min_sum_hessian_in_leaf': min_sum_hessian_in_leaf,
             'num_leaves': num_leaves,
             'feature_fraction': feature_fraction,
             'seed': seed,
             'feature_fraction_seed': seed,
             'bagging_seed': seed,
             'drop_seed': seed,
             'boosting_type': 'gbdt',
             'metric': 'rmse',
             'verbosity': -1,
             'num_threads': core_count}
    trn = lgb.Dataset(X_trn, y_trn)
    lgb_cv = lgb.cv(param, trn, num_boost_round = 1500, folds = k_folds, stratified = False, early_stopping_rounds = 10, seed = seed)
    score = lgb_cv["rmse-mean"][-1]
    # BayesianOptimization maximizes the objective, so return the inverse of the CV RMSLE
    return 1/score
# parameter bounds
bounds_LGB = {
'bagging_fraction': (0.5, 1),
'bagging_freq': (1, 4),
'lambda_l1': (0, 3.0),
'lambda_l2': (0, 3.0),
'learning_rate': (0.005, 0.3),
'max_depth':(3,8),
'min_data_in_leaf': (5, 20),
'min_gain_to_split': (0, 1),
'min_sum_hessian_in_leaf': (0.01, 20),
'num_leaves': (5, 20),
'feature_fraction': (0.05, 1)
}
# optimizer
LG_BO = BayesianOptimization(LGB_bayesian, bounds_LGB, random_state = seed)
# find the best hyperparameters
LG_BO.maximize(init_points = 10, n_iter = 200)
# get the performance of best hyperparameters
tuned_lgbm_score = 1/LG_BO.max['target']
print(f"RMSLE of tuned lightgbm: {tuned_lgbm_score:.5f}")
# best parameters
params = LG_BO.max["params"]
int_params = ["bagging_freq", "max_depth", "min_data_in_leaf", "num_leaves"]
for parameter in int_params:
    params[parameter] = int(params[parameter])
other_lgbm_params = {'seed': seed,
'feature_fraction_seed': seed,
'bagging_seed': seed,
'drop_seed': seed,
'boosting_type': 'gbdt',
'metric': 'rmse',
'verbosity': -1,
'num_threads': core_count}
params.update(other_lgbm_params)
params
# get the num_boost_rounds
trn = lgb.Dataset(X_trn, y_trn)
lgb_cv = lgb.cv(params, trn, num_boost_round = 3000, folds = k_folds, early_stopping_rounds = 10)
num_boost_round = len(lgb_cv["rmse-mean"]) - 10
num_boost_round
# train model
bst = lgb.train(params, trn, num_boost_round = num_boost_round)
# make predictions
lgb_preds = np.expm1(bst.predict(X_tst))
# create submission file
submission_df["SalePrice"] = lgb_preds
submission_df.to_csv("lgb_tuned.csv", index = None)
This submission gives us a score of 0.12715. Hyperparameter tuning helped to improve the performance of the LightGBM model a little. Now, we'll see how XGBoost performs.
# load the datasets into xgboost DMatrices
train_d = xgb.DMatrix(X_trn, y_trn)
test_d = xgb.DMatrix(X_tst)
# xgboost parameters
xgb_params = {"eta": 0.1,
"subsample": 0.7,
"tree_method": "hist",
"random_state": seed}
# train and evaluate xgboost
xgb_cv = xgb.cv(xgb_params, train_d, num_boost_round = 1500, nfold = 5, early_stopping_rounds = 10)
xgb_cv.tail()
Even without hyperparameter tuning, XGBoost gives us an RMSLE validation score of 0.127240. We can improve this score with hyperparameter tuning, but first we'll make predictions and a submission.
# train model
xgb_bst = xgb.train(xgb_params, train_d, num_boost_round = len(xgb_cv))
# make predictions
xgb_preds = np.expm1(xgb_bst.predict(test_d))
# create submission file
submission_df["SalePrice"] = xgb_preds
submission_df.to_csv("xgb_non_tuned.csv", index = None)
This submission gives a score of 0.13190, which suggests the model overfit a little. We can tune the hyperparameters to improve the performance.
# black box function for Bayesian Optimization
def xgb_bayesian(eta,
                 gamma,
                 subsample,
                 colsample_bytree,
                 colsample_bynode,
                 colsample_bylevel,
                 max_depth):
    # this parameter has to be an integer
    max_depth = int(max_depth)
    # xgboost parameters
    params = {"eta": eta,
              "gamma": gamma,
              "subsample": subsample,
              "colsample_bytree": colsample_bytree,
              "colsample_bynode": colsample_bynode,
              "colsample_bylevel": colsample_bylevel,
              "max_depth": max_depth,
              "tree_method": "hist"}
    # train and score
    xgb_cv = xgb.cv(params, train_d, num_boost_round = 1500, nfold = 5, early_stopping_rounds = 10, seed = seed)
    # use the mean validation RMSE from 10 rounds before the end as the score
    score = xgb_cv.iloc[-10]["test-rmse-mean"]
    # BayesianOptimization maximizes the objective, so return the inverse of the CV RMSLE
    return 1/score
# parameter bounds
xgb_bounds = {"eta": (0.01, 0.05),
"gamma": (0, 20),
"subsample": (0.4, 1),
"colsample_bytree": (0.5, 1),
"colsample_bynode": (0.5, 1),
"colsample_bylevel": (0.5, 1),
"max_depth": (2, 7)}
# optimizer
xgb_bo = BayesianOptimization(xgb_bayesian, xgb_bounds, random_state = seed)
# find the best hyperparameters
xgb_bo.maximize(init_points = 3, n_iter = 60)
# get the performance of best hyperparameters
tuned_xgb_score = 1/xgb_bo.max['target']
print(f"RMSLE of tuned xgboost: {tuned_xgb_score:.5f}")
xgb_bo.max
# parameters
xgb_tuned_params = {"eta": 0.01,
"gamma": 0,
"subsample": 1.0,
"colsample_bytree": 0.5,
"colsample_bynode": 0.5,
"colsample_bylevel": 0.5,
"max_depth": 4,
"tree_method": "hist"}
# get the num_boost_round
xgb_cv = xgb.cv(xgb_tuned_params, train_d, num_boost_round = 1500, nfold = 5, early_stopping_rounds = 10)
num_boost_round = len(xgb_cv) - 10
xgb_cv.tail()
# train model
bst = xgb.train(xgb_tuned_params, train_d, num_boost_round = num_boost_round)
# make predictions
xgb_preds = np.expm1(bst.predict(test_d))
# create submission file
submission_df["SalePrice"] = xgb_preds
submission_df.to_csv("xgb_tuned.csv", index = None)
This submission gives us our best score yet, an RMSLE of 0.12488. This is our final submission.
In this project, we worked on the Ames housing data provided as part of a competition on Kaggle and tried to predict the sale price of houses in Ames. Before training ML models, we explored the data and prepared it for ML algorithms. We also trained a dummy model to establish a baseline and to spot errors in training. In the model training part, we first trained some linear models and then combined them into blended predictions, which improved the overall performance. We then trained two gradient boosting models. The LightGBM model improved on the performance of the blended linear models, and hyperparameter tuning helped to further improve the submission score. Finally, we trained an XGBoost model, also with tuned hyperparameters, which gave us the best RMSLE score of 0.12488.