Chapter 3 Interpretability
Throughout the book, I will use this simple yet elegant definition of interpretability from Miller (2017): Interpretability is the degree to which a human can understand the cause of a decision. Another take is: Interpretability is the degree to which a human can consistently predict the model's result. The higher the interpretability of a model, the easier it is for someone to comprehend why certain decisions (read: predictions) were made. One model is more interpretable than another if its decisions are easier for a human to comprehend than decisions from the other model.

I will use the terms interpretable and explainable interchangeably. Like Miller (2017), I think it makes sense to distinguish between the terms interpretability/explainability and explanation. Making a machine learning model interpretable can, but does not necessarily have to, imply providing a (human-style) explanation of a prediction. See the section about explanations to learn what we humans consider a good explanation.
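To make the second definition concrete: with an intrinsically interpretable model such as linear regression, a human can reproduce any prediction by hand from the learned parameters, so the model's result is consistently predictable. Below is a minimal sketch, assuming scikit-learn; the toy rent data and feature choice are invented for illustration and are not from the book.

```python
# Sketch: why linear regression scores high on the "consistently predict
# the model's result" notion of interpretability. Data is hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: apartment size (m^2) and floor -> monthly rent (EUR).
X = np.array([[30, 1], [50, 4], [70, 2], [90, 5]])
y = np.array([620, 1010, 1230, 1560])

model = LinearRegression().fit(X, y)

# The model exposes the cause of its decisions: every prediction is just
# intercept + sum(coefficient * feature value), which a human can compute.
intercept = model.intercept_
coef_size, coef_floor = model.coef_

x_new = np.array([[60, 3]])
by_hand = intercept + coef_size * 60 + coef_floor * 3

print(f"model:   {model.predict(x_new)[0]:.2f}")
print(f"by hand: {by_hand:.2f}")  # identical: the decision process is transparent
```

A deep neural network fit to the same data would offer no comparable path from parameters to prediction; a human could only observe its outputs, not trace the cause of a decision.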
Miller, Tim. 2017. "Explanation in Artificial Intelligence: Insights from the Social Sciences." arXiv preprint arXiv:1706.07269.