Decision trees and random forests#
In this notebook, we change up the machine learning models.
Decision trees and random forests are popular machine learning techniques for classification and regression tasks.
A decision tree is a tree-like model in which each internal node represents a decision based on a feature and each branch represents an outcome of that decision. A random forest is an ensemble of decision trees, each trained on a random subset of the data and a random subset of the features, whose individual predictions are averaged (for regression) or voted on (for classification). These algorithms are powerful and widely used because they can handle large datasets, deal with missing values, and provide interpretable results.
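To make the ensemble idea concrete, here is a minimal sketch of bagging by hand, assuming a tiny made-up dataset: each tree is fit on a bootstrap sample of the rows, and the "forest" prediction is the average of the trees' predictions. (scikit-learn's RandomForestRegressor additionally considers a random subset of features at each split.)

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, purely illustrative
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

rng = np.random.default_rng(0)
trees = []
for _ in range(10):
    # Bootstrap: sample rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Average the individual trees' predictions for a new point
prediction = np.mean([t.predict([[2.5]]) for t in trees])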
This notebook will explore decision trees and random forests in more detail and discuss their strengths and weaknesses.
How To#
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the California housing dataset
df = pd.read_csv("data/housing.csv")
df.head()
|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
# Hold out half the data, then split the holdout evenly into validation and test sets
x_train, x_, y_train, y_ = train_test_split(df[["housing_median_age", "total_rooms", "median_income"]],
                                            df.median_house_value, test_size=.5)
x_val, x_test, y_val, y_test = train_test_split(x_, y_, test_size=.5)
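Applying test_size=.5 twice yields roughly a 50/25/25 train/validation/test split. A quick sanity check (a sketch, not part of the original notebook):

# Each set's share of the full dataset (expect about 0.5, 0.25, 0.25)
for name, part in [("train", x_train), ("val", x_val), ("test", x_test)]:
    print(name, len(part) / len(df))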
Decision Trees#
from sklearn import preprocessing
from sklearn import tree

# Standardize the features, then fit a single decision tree regressor
scaler = preprocessing.StandardScaler()
model = tree.DecisionTreeRegressor()
scaler.fit(x_train)
StandardScaler()
model.fit(scaler.transform(x_train), y_train)
DecisionTreeRegressor()
model.score(scaler.transform(x_val), y_val)
0.11076885309119011
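An unconstrained decision tree keeps splitting until it nearly memorizes the training data, which is one plausible reason the validation R² is this low. A sketch of constraining the tree with the max_depth parameter (the depth values are arbitrary choices, not tuned recommendations):

# Limit tree depth to reduce overfitting; depth values are illustrative
for depth in [2, 4, 8, 16]:
    m = tree.DecisionTreeRegressor(max_depth=depth)
    m.fit(scaler.transform(x_train), y_train)
    print(depth, m.score(scaler.transform(x_val), y_val))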
Build a forest of decision trees#
from sklearn.ensemble import RandomForestRegressor

# Fit a random forest (an ensemble of decision trees) on the raw features
rf = RandomForestRegressor()
rf.fit(x_train, y_train)
RandomForestRegressor()
# R^2 on the training set
rf.score(x_train, y_train)
0.9325366452603119
# R^2 on the held-out validation set
rf.score(x_val, y_val)
0.5100602814817495
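The gap between the training score (about 0.93) and the validation score (about 0.51) suggests the forest is overfitting. One common remedy is a hyperparameter search; the sketch below uses scikit-learn's GridSearchCV with made-up candidate values:

from sklearn.model_selection import GridSearchCV

# Candidate values are illustrative, not tuned recommendations
grid = GridSearchCV(
    RandomForestRegressor(),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [4, 8, None]},
    cv=3,
)
grid.fit(x_train, y_train)
grid.best_params_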
rf.feature_importances_
array([0.14080726, 0.19557078, 0.66362196])
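The importances line up with the columns passed to train_test_split, so pairing them with the feature names makes the output easier to read; here median_income carries most of the weight. A small sketch:

# Pair each importance with its feature name
dict(zip(x_train.columns, rf.feature_importances_))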
Exercise#
Experiment with different machine learning models.
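As a starting point, any scikit-learn regressor can be dropped into the same fit/score pattern. For example, with gradient boosting (a sketch; the resulting score is not shown here):

from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor()
gb.fit(x_train, y_train)
gb.score(x_val, y_val)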