3.3 Scope of Interpretability

An algorithm trains a model, and the model produces the predictions. Each of these steps can be evaluated in terms of transparency or interpretability.

3.3.1 Algorithm Transparency

How does the algorithm create the model?

Algorithm transparency is about how the algorithm learns a model from the data and what kind of relationships it is capable of picking up. If you use convolutional neural networks to classify images, you can explain that the algorithm learns edge detectors and filters on the lowest layers. This is an understanding of how the algorithm works, but not of the specific model that is learned in the end, and not of how individual predictions are made. This level of transparency requires only knowledge of the algorithm, not of the data or of the concrete learned model. This book focuses on model interpretability and not on algorithm transparency. Algorithms like the least squares method for linear models are well studied and understood; they score high in transparency. Deep learning approaches (pushing a gradient through a network with millions of weights) are less well understood, and their inner workings are the focus of ongoing research. It is not clear exactly how they work, so they are less transparent.

3.3.2 Global, Holistic Model Interpretability

How does the trained model make predictions?

You could call a model interpretable if you can comprehend the whole model at once (Lipton 2016). To explain the global model output, you need the trained model, knowledge of the algorithm, and the data. This level of interpretability is about understanding how the model makes decisions, based on a holistic view of its features and of each of the learned components such as weights, parameters, and structures. Which features are important and what kind of interactions take place between them? Global model interpretability helps to understand the distribution of your target variable based on the features. Arguably, global model interpretability is very hard to achieve in practice. Any model that exceeds a handful of parameters or weights probably will not fit into an average human's short-term memory. I would argue that you cannot really imagine a linear model with 5 features and mentally draw the hyperplane that was estimated in the 5-dimensional feature space. A feature space with more than 3 dimensions is simply not imaginable for humans. Usually, when people try to comprehend a model, they consider only parts of it, such as the weights in a linear model.

3.3.3 Global Model Interpretability on a Modular Level

How do parts of the model influence predictions?

You might not be able to comprehend a Naive Bayes model with many hundreds of features, because there is no way you could hold all the feature weights in your working memory. But you can understand a single weight easily. While global model interpretability is usually out of reach, there is a better chance of understanding at least some models on a modular level, even though not many models are interpretable on a strict parameter level. For linear models, the interpretable parts are the weights and the distribution of the features; for trees, they are the splits (the selected feature plus the cut-off point) and the leaf node predictions. Linear models, for example, look as if they were perfectly interpretable on a modular level, but the interpretation of a single weight is interlocked with all the other weights. The interpretation of a single weight always comes with the footnote that the other input features remain at the same value, which is not the case in many real-world applications. A linear model that predicts the value of a house from both the size of the house and the number of rooms might have a negative weight for the number of rooms, which is counterintuitive. But it can happen, because the model already contains the highly correlated house size feature: in a market where people prefer larger rooms, a house with fewer rooms might be worth more than a house with more rooms if both have the same size. The weights only make sense in the context of the other features in the model. But arguably, the weights of a linear model still have better interpretability than the weights of a deep neural network.
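The negative weight for the number of rooms can be reproduced in a few lines. The following is a minimal sketch in Python with scikit-learn, using synthetic housing data constructed so that, at equal size, more rooms slightly lower the price; the variable names and the price formula are illustrative assumptions, not data from this book.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 500
size = rng.uniform(40, 200, n)                      # house size in square meters
rooms = np.round(size / 25 + rng.normal(0, 1, n))   # strongly correlated with size
# Hypothetical market that values size, with a small penalty for cramming in more rooms:
price = 3000 * size - 5000 * rooms + rng.normal(0, 10000, n)

X = np.column_stack([size, rooms])
model = LinearRegression().fit(X, price)
print(dict(zip(["size", "rooms"], model.coef_.round(1))))
# The 'rooms' weight comes out negative; it is only interpretable
# under the assumption that 'size' is held fixed.
```

The point is not the particular numbers, but that the sign of a single weight can only be read conditional on the other, correlated features in the model.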

3.3.4 Local Interpretability for a Single Prediction

Why did the model make a specific decision for an instance?

You can zoom in on a single instance and examine what prediction the model makes for this input and why it makes this decision. When you look at an individual instance, the model's behavior in that neighborhood might be simpler. Locally, the prediction might depend only linearly or monotonically on some features, rather than having a complex dependence on them. For example, the value of an apartment might not depend linearly on its size. But if you look only at a specific apartment of 100 square meters and check how the predicted price changes when the size goes up or down by 10 square meters, there is a chance that the model behaves linearly in this subregion of your data space. Because of this, local explanations can be more accurate than global explanations. This book presents methods that can make single predictions more interpretable in the section about model-agnostic methods.
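To make this concrete, here is a minimal sketch in Python with scikit-learn: a gradient boosting model is trained on synthetic, globally nonlinear apartment data and then probed around a single 100 square meter apartment. The data-generating formula and the choice of model are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
size = rng.uniform(30, 200, 1000)
# Globally nonlinear relationship between size and price, plus noise:
price = 200 * size - 0.4 * size ** 2 + rng.normal(0, 500, 1000)
model = GradientBoostingRegressor().fit(size.reshape(-1, 1), price)

# Probe the model around one 100 m^2 apartment, going 10 m^2 up and down:
for s in (90, 100, 110):
    print(s, round(model.predict(np.array([[s]]))[0]))
# The three predictions typically change almost linearly with size,
# even though the global size-price relationship is not linear.
```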

3.3.5 Local Interpretability for a Group of Predictions

Why did the model make specific decisions for a group of instances?

Model predictions for multiple instances can be explained either with methods for global model interpretability (on a modular level) or with explanations for individual instances. The global methods can be applied by taking the group of instances, treating it as if it were the complete dataset, and using the global methods on this subset. The individual explanation methods can be applied to each instance and then listed or aggregated for the whole group.
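As a minimal sketch of the first option, the following Python/scikit-learn snippet applies a global, modular-level method (here, partial dependence) to a subset of instances as if that subset were the complete dataset. The synthetic data and the cut-off that defines the group are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(1)
size = rng.uniform(30, 200, (1000, 1))
price = 200 * size[:, 0] - 0.4 * size[:, 0] ** 2 + rng.normal(0, 500, 1000)
model = GradientBoostingRegressor().fit(size, price)

# Take only the small apartments and treat them as if they were the full dataset:
small = size[size[:, 0] < 80]
result = partial_dependence(model, small, features=[0])
print(result["average"].round(0))  # partial dependence of price on size for this group
```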


  1. Lipton, Zachary C. 2016. “The Mythos of Model Interpretability.” ICML Workshop on Human Interpretability in Machine Learning (WHI).