Machine learning interpretability#
In modern machine learning it is important to be able to explain how our models “think”: a simple accuracy score isn’t enough. This notebook explores the lesson on interpretability.
Machine learning interpretability is an increasingly important topic in artificial intelligence. As models become more complex, understanding how they make predictions becomes more difficult. This lack of transparency can erode trust in a model and make it harder to identify and correct errors. Interpretability is the ability to explain how a machine learning model arrived at a particular decision, and it is essential for building trust and understanding in these powerful tools.
This notebook explores why interpretability matters and provides practical examples of how to achieve it.
How To#
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/housing.csv")  # the classic California housing data
df.head()
|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
df = df.dropna()  # drop rows with missing values
# 50% train; the rest is split evenly into validation and test sets below
x_train, x_, y_train, y_ = train_test_split(
    df.drop(["longitude", "latitude", "ocean_proximity", "median_house_value"], axis=1),
    df.median_house_value, test_size=.5, stratify=df.ocean_proximity)
x_val, x_test, y_val, y_test = train_test_split(x_, y_, test_size=.5)
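A quick sanity check, not part of the original run, confirms the resulting 50/25/25 split:

print(x_train.shape, x_val.shape, x_test.shape)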
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(x_train, y_train)
RandomForestRegressor()
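The forest trains with scikit-learn’s defaults (100 trees). If you want to tune it, the constructor exposes the usual knobs; the values below are purely illustrative:

model = RandomForestRegressor(n_estimators=300, min_samples_leaf=2, n_jobs=-1)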
model.score(x_val, y_val)
0.6653737863987246
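For a regressor, score returns the coefficient of determination (R²), so the model explains roughly two thirds of the variance on the validation set. A complementary, easier-to-communicate check is the mean absolute error; a minimal sketch using sklearn.metrics:

from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_val, model.predict(x_val))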
Influence of Variables#
import eli5
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[8], line 1
----> 1 import eli5

...

ImportError: cannot import name 'if_delegate_has_method' from 'sklearn.utils.metaestimators' (/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/sklearn/utils/metaestimators.py)
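The root cause is a version mismatch rather than a bug in the notebook: eli5 still imports if_delegate_has_method, a helper that newer scikit-learn releases have removed. If you want to run the eli5 cells, one possible workaround is pinning an older scikit-learn in your environment, for example:

pip install "scikit-learn<1.3" eli5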
eli5.explain_weights(model)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 eli5.explain_weights(model)

NameError: name 'eli5' is not defined
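eli5.explain_weights on a random forest essentially reports the model’s built-in impurity-based importances, so a plain scikit-learn sketch, using only objects already defined above, gives comparable output:

pd.Series(model.feature_importances_, index=x_train.columns).sort_values(ascending=False)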
for x in range(5):
display(eli5.explain_prediction(model, x_train.iloc[x, :]))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 2
      1 for x in range(5):
----> 2     display(eli5.explain_prediction(model, x_train.iloc[x, :]))

NameError: name 'eli5' is not defined

Both failures trace back to the broken eli5 import, so we fall back on scikit-learn’s own inspection tools.
from sklearn.inspection import permutation_importance

# Shuffle each feature in turn and measure how much the model's score drops (5 repeats by default)
permutation_importance(model, x_train, y_train)
{'importances_mean': array([0.31167605, 0.23759556, 0.41586708, 0.3610504 , 0.30871098,
1.54918227]),
'importances_std': array([0.00514937, 0.00433542, 0.00666387, 0.00283454, 0.0056444 ,
0.02092214]),
'importances': array([[0.31652346, 0.30323932, 0.310437 , 0.31760539, 0.31057506],
[0.23951261, 0.24095205, 0.233456 , 0.24256299, 0.23149416],
[0.4127844 , 0.4197916 , 0.41431098, 0.40636382, 0.4260846 ],
[0.35828569, 0.36594585, 0.36184542, 0.36093884, 0.35823622],
[0.31188079, 0.31009499, 0.30322236, 0.30153734, 0.31681944],
[1.55416512, 1.53079852, 1.54025123, 1.53299162, 1.58770485]])}
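The arrays follow the column order of x_train, with one column per repeat. Pairing the means with feature names makes the result much easier to read; in this run median_income clearly dominates. A small sketch, assuming the objects defined above:

result = permutation_importance(model, x_train, y_train)
pd.Series(result.importances_mean, index=x_train.columns).sort_values(ascending=False)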
from sklearn.inspection import plot_partial_dependence
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[13], line 1
----> 1 from sklearn.inspection import plot_partial_dependence

ImportError: cannot import name 'plot_partial_dependence' from 'sklearn.inspection' (/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/sklearn/inspection/__init__.py)
plot_partial_dependence(model, x_train, x_train.columns)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 plot_partial_dependence(model, x_train, x_train.columns)

NameError: name 'plot_partial_dependence' is not defined
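plot_partial_dependence was removed from recent scikit-learn releases; its replacement lives on the PartialDependenceDisplay class. A sketch of the equivalent call with the modern API, assuming the model and x_train from above:

from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(model, x_train, x_train.columns)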
SHAP#
import shap
expl = shap.TreeExplainer(model)

# A background dataset can also be supplied; it changes the baseline (expected value) the explainer uses
shap.TreeExplainer(model, data=x_train)
<shap.explainers._tree.TreeExplainer at 0x7fcf57225af0>
shap_val = expl.shap_values(x_val)
shap.initjs()
shap.force_plot(expl.expected_value, shap_val[0, :], x_val.iloc[0, :])
(Interactive force plot; it renders as JavaScript in a live, trusted notebook.)
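Because the force plot requires JavaScript, it will not appear on static pages. shap also ships static matplotlib plots that work anywhere; for example, a summary plot of the values computed above:

shap.summary_plot(shap_val, x_val)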
Exercise#
Check out shap further and see which plots you can generate.