Linear regression

A simple machine learning model that can uncover relationships in data.

Linear regression is a widely used machine learning algorithm for modelling and analyzing data.

It is a simple technique for modelling relationships between variables and predicting outcomes. The basic premise of linear regression is to find the linear relationship between the independent variables and the dependent variable that best fits the data, typically by minimizing the sum of squared differences between the predictions and the observed values. Doing so can help identify patterns, trends, and correlations in the data, which supports informed decisions and predictions.
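To make the idea of a "best linear relationship" concrete, here is a minimal sketch of a least-squares fit using NumPy. The data is made up for illustration (generated from y = 2x + 1 without noise), and the variable names are not part of the tutorial's dataset.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (made-up values, no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column for the feature, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Solve for the slope and intercept that minimize the sum of squared errors.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(slope, intercept)  # close to 2 and 1, up to floating-point error
```

With real, noisy data the recovered slope and intercept would only approximate the underlying relationship.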

Linear regression is a versatile tool with applications in various fields, from finance and economics to healthcare and engineering.

How To#

import pandas as pd
df = pd.read_csv("data/housing.csv")
df.head()

	longitude	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value	ocean_proximity
0	-122.23	37.88	41.0	880.0	129.0	322.0	126.0	8.3252	452600.0	NEAR BAY
1	-122.22	37.86	21.0	7099.0	1106.0	2401.0	1138.0	8.3014	358500.0	NEAR BAY
2	-122.24	37.85	52.0	1467.0	190.0	496.0	177.0	7.2574	352100.0	NEAR BAY
3	-122.25	37.85	52.0	1274.0	235.0	558.0	219.0	5.6431	341300.0	NEAR BAY
4	-122.25	37.85	52.0	1627.0	280.0	565.0	259.0	3.8462	342200.0	NEAR BAY

Preparing training data#

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

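# Use three numeric columns as features and the median house value as the target.
# Hold out half of the rows for testing, stratified by ocean_proximity so that
# both halves contain a similar mix of locations.
x_train, x_test, y_train, y_test = train_test_split(df[["housing_median_age", "total_rooms", "median_income"]],
                                                    df.median_house_value, test_size=.5,
                                                    stratify=df.ocean_proximity)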

df.shape

(20640, 10)

x_train.shape

(10320, 3)

x_test.shape

(10320, 3)

Building the model#

model = LinearRegression()

model.fit(x_train, y_train)

LinearRegression()


model.score(x_test, y_test)

0.5150544602369341
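For a scikit-learn regressor, `score` returns the coefficient of determination R², so a value of about 0.515 means the three features explain roughly half of the variance in house values on the test set. The sketch below recomputes R² by hand to show what the number measures. It uses synthetic data (an assumption for illustration, not the housing dataset) so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a linear signal plus noise (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=1.0, size=200)

X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, model.score(X_test, y_test))  # the two values agree
```

An R² of 1 would mean perfect predictions, while 0 corresponds to always predicting the mean of the targets.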

Improving the model#

from sklearn import preprocessing

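# Split the held-out half again: with the default test_size of 0.25,
# 75% becomes the validation set and 25% remains as the final test set.
x_val, x_test, y_val, y_test = train_test_split(x_test, y_test)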

x_test.shape

(2580, 3)

scaler = preprocessing.StandardScaler()
model = LinearRegression()

scaler.fit(x_train)

StandardScaler()


x_scaled = scaler.transform(x_train)
x_scaled

array([[-0.61168405,  0.10966594, -0.34801344],
       [ 0.73900819, -0.42930933,  0.93166467],
       [ 0.1828408 , -1.08498446, -0.27156486],
       ...,
       [ 0.42119825, -0.1452928 ,  1.36494401],
       [-0.45277908,  1.66613277,  3.34677127],
       [-1.00894647,  0.32572472,  1.99345637]])

model.fit(x_scaled, y_train)

LinearRegression()


model.score(scaler.transform(x_val), y_val)

0.5113048620857856

scaler = preprocessing.MinMaxScaler().fit(x_train)
model = LinearRegression().fit(scaler.transform(x_train), y_train)
model.score(scaler.transform(x_val), y_val)

0.5113048620857856
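The two scalers giving exactly the same validation score is expected rather than a mistake: ordinary least squares is unaffected by shifting and rescaling individual features, because the fitted coefficients and intercept simply adjust to compensate. Scaling tends to matter more for models with regularization or distance-based methods. The sketch below illustrates this on synthetic data (an assumption for illustration). It also uses `make_pipeline`, which bundles the scaler and the model so the same transformation is applied at fit and score time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic features on very different scales (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) * np.array([1.0, 1000.0, 0.01])
y = X @ np.array([2.0, 0.003, 50.0]) + rng.normal(size=300)

X_train, X_val = X[:200], X[200:]
y_train, y_val = y[:200], y[200:]

estimators = {
    "unscaled": LinearRegression(),
    "standard": make_pipeline(StandardScaler(), LinearRegression()),
    "minmax": make_pipeline(MinMaxScaler(), LinearRegression()),
}

# Fit each variant on the training split and score it on the validation split.
scores = {}
for name, estimator in estimators.items():
    estimator.fit(X_train, y_train)
    scores[name] = estimator.score(X_val, y_val)

print(scores)  # all three R^2 values match up to floating-point error
```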

Predicting with the model#

model.predict(scaler.transform(x_test))

array([144942.9245965 , 203274.22981059, 358335.3990082 , ...,
       289425.51119122, 176933.16912778, 182636.22567211])

y_test

     77400.0
    185800.0
    374200.0
     91500.0
    162500.0
      ...
    173900.0
    361100.0
    356100.0
     94200.0
    133000.0
Name: median_house_value, Length: 2580, dtype: float64
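Comparing predictions and true values by eye only goes so far. One common way to summarize the gap is with error metrics such as the mean absolute error (MAE) and the root mean squared error (RMSE), both expressed in the units of the target. The sketch below uses a handful of made-up values, not the tutorial's actual predictions, so that it runs on its own.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up true values and predictions (illustrative only).
y_true = np.array([100_000.0, 200_000.0, 300_000.0, 400_000.0])
y_pred = np.array([110_000.0, 190_000.0, 330_000.0, 380_000.0])

# MAE: the average size of the errors, in the units of the target.
mae = mean_absolute_error(y_true, y_pred)

# RMSE: like MAE, but penalizes large errors more heavily.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(mae, rmse)
```

In the tutorial's setting, the same two calls could be applied to `y_test` and `model.predict(scaler.transform(x_test))`.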

Inspecting the model#

model.coef_

array([105149.78222749, 153490.93791669, 618396.24168631])

model.intercept_

-3995.519358216203
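A prediction is the dot product of the (scaled) features with `coef_`, plus `intercept_`. Because the model here was fitted on min-max scaled features, each coefficient describes the change in the predicted value as that feature moves from its training minimum to its training maximum, not the change per original unit. The sketch below checks both points on synthetic data (an assumption for illustration). It relies on `MinMaxScaler.scale_`, the per-feature factor the scaler multiplies by.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
y = X @ np.array([3.0, -1.0, 0.5]) + 10.0 + rng.normal(size=50)

scaler = MinMaxScaler().fit(X)
X_scaled = scaler.transform(X)
model = LinearRegression().fit(X_scaled, y)

# Reconstruct the predictions by hand from the fitted parameters.
manual = X_scaled @ model.coef_ + model.intercept_

# Convert coefficients back to "per original unit" by undoing the scaling factor.
coef_original_units = model.coef_ * scaler.scale_

print(np.allclose(manual, model.predict(X_scaled)))
print(coef_original_units)  # close to the generating weights 3.0, -1.0, 0.5
```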

Exercise#

Experiment with how different preprocessing steps affect your data and the model's score.
