Machine learning interpretability#

In modern machine learning it is important to be able to explain how our models “think”: a bare accuracy score is not enough. As models grow more complex, understanding how they reach their predictions becomes harder, and that opacity erodes trust and makes errors difficult to identify and correct. Interpretability, the ability to explain how a model arrived at a particular decision, is essential for building trust in these powerful tools.

This notebook explores why interpretability matters and walks through practical ways to achieve it.

How To#

from sklearn.model_selection import train_test_split
import pandas as pd

# Load the California housing data and inspect the first rows
df = pd.read_csv("data/housing.csv")
df.head()
longitude latitude housing_median_age total_rooms total_bedrooms population households median_income median_house_value ocean_proximity
0 -122.23 37.88 41.0 880.0 129.0 322.0 126.0 8.3252 452600.0 NEAR BAY
1 -122.22 37.86 21.0 7099.0 1106.0 2401.0 1138.0 8.3014 358500.0 NEAR BAY
2 -122.24 37.85 52.0 1467.0 190.0 496.0 177.0 7.2574 352100.0 NEAR BAY
3 -122.25 37.85 52.0 1274.0 235.0 558.0 219.0 5.6431 341300.0 NEAR BAY
4 -122.25 37.85 52.0 1627.0 280.0 565.0 259.0 3.8462 342200.0 NEAR BAY
# Drop rows with missing values, then hold out half the data,
# stratifying on ocean_proximity so both halves see a similar mix of locations
df = df.dropna()
x_train, x_, y_train, y_ = train_test_split(
    df.drop(["longitude", "latitude", "ocean_proximity", "median_house_value"], axis=1),
    df.median_house_value, test_size=.5, stratify=df.ocean_proximity)

# Split the held-out half evenly into validation and test sets
x_val, x_test, y_val, y_test = train_test_split(x_, y_, test_size=.5)
from sklearn.ensemble import RandomForestRegressor

# Fit a random forest regressor with default hyperparameters
model = RandomForestRegressor()
model.fit(x_train, y_train)
RandomForestRegressor()
# For regressors, .score reports R² on the given split
model.score(x_val, y_val)
0.6653737863987246
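
The test split created earlier is never used in this notebook. As a hedged sketch of the usual final step, once all tuning against the validation set is finished:

# Hypothetical final evaluation on the held-out test split (not run in the original notebook)
model.score(x_test, y_test)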

Influence of Variables#

import eli5
ImportError: cannot import name 'if_delegate_has_method' from 'sklearn.utils.metaestimators'

The import fails because eli5 still relies on if_delegate_has_method, a helper that scikit-learn deprecated in 1.0 (in favour of available_if) and removed in 1.3. The eli5 cells below therefore cannot run in this environment; they would need an older scikit-learn, and scikit-learn's own inspection tools (used in the following cells) cover similar ground.
eli5.explain_weights(model)
NameError: name 'eli5' is not defined

Because the import failed, the name eli5 was never bound. With a compatible eli5 installation, explain_weights renders the forest's feature importances as a weight table.
for x in range(5):
    display(eli5.explain_prediction(model, x_train.iloc[x, :]))
NameError: name 'eli5' is not defined

The same failure applies here. With eli5 working, explain_prediction shows, for each of the first five training rows, how the individual feature values pushed that prediction above or below the baseline.
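
As a stand-in that needs nothing beyond scikit-learn, the fitted forest exposes importances directly. A minimal sketch, reusing model and x_train from above:

# Impurity-based importances computed during fitting
# (these tend to favour high-cardinality numeric features)
pd.Series(model.feature_importances_, index=x_train.columns).sort_values(ascending=False)
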
from sklearn.inspection import permutation_importance

# Shuffle each feature in turn and measure how much the model's score drops
permutation_importance(model, x_train, y_train)
{'importances_mean': array([0.31167605, 0.23759556, 0.41586708, 0.3610504 , 0.30871098,
        1.54918227]),
 'importances_std': array([0.00514937, 0.00433542, 0.00666387, 0.00283454, 0.0056444 ,
        0.02092214]),
 'importances': array([[0.31652346, 0.30323932, 0.310437  , 0.31760539, 0.31057506],
        [0.23951261, 0.24095205, 0.233456  , 0.24256299, 0.23149416],
        [0.4127844 , 0.4197916 , 0.41431098, 0.40636382, 0.4260846 ],
        [0.35828569, 0.36594585, 0.36184542, 0.36093884, 0.35823622],
        [0.31188079, 0.31009499, 0.30322236, 0.30153734, 0.31681944],
        [1.55416512, 1.53079852, 1.54025123, 1.53299162, 1.58770485]])}
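
The dictionary alone does not say which row belongs to which column. A small sketch, assuming the rows follow x_train's column order (which is what permutation_importance was given); it recomputes the importances since the original cell never assigned them:

# Pair each mean importance with its feature name, largest score drop first
result = permutation_importance(model, x_train, y_train)
for i in result.importances_mean.argsort()[::-1]:
    print(f"{x_train.columns[i]}: {result.importances_mean[i]:.3f} ± {result.importances_std[i]:.3f}")

On the numbers above, median_income dominates: shuffling it costs about 1.55 in score, roughly four times more than any other feature.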
plot_partial_dependence was deprecated in scikit-learn 1.0 and removed in 1.2; its replacement, PartialDependenceDisplay.from_estimator, takes the same estimator, data, and feature list:

from sklearn.inspection import PartialDependenceDisplay

# One curve per feature: how the average prediction changes as that
# feature is varied while the others keep their observed values
PartialDependenceDisplay.from_estimator(model, x_train, x_train.columns)
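
Computing partial dependence for every feature of a forest can be slow; a hedged variant, restricted to the two features that the permutation ranking above scored highest:

# Hypothetical narrower plot: only the top two features from the permutation ranking
PartialDependenceDisplay.from_estimator(model, x_train, ["median_income", "total_bedrooms"])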

SHAP#

import shap

# Build an explainer for the fitted tree ensemble; passing data=x_train would
# instead compute expectations against the training data as a background sample
expl = shap.TreeExplainer(model)

# SHAP values for every validation row: one additive contribution per feature
shap_val = expl.shap_values(x_val)

# Load the Javascript the interactive plots need, then explain a single prediction
shap.initjs()
shap.force_plot(expl.expected_value, shap_val[0, :], x_val.iloc[0, :])
[Interactive force plot omitted: it renders only in a live, trusted notebook with the shap Javascript loaded.]
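
The force plot explains one prediction at a time. A hedged follow-up that summarizes the whole validation set with shap's summary plot:

# Beeswarm summary: each dot is one row's SHAP value for one feature,
# showing how strongly and in which direction each feature drives predictions
shap.summary_plot(shap_val, x_val)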

Exercise#

Explore shap further and see which other plots you can generate.

Additional Resources#