2.1 Storytime
Get started with a few short stories. Each story is an admittedly exaggerated call for interpretable machine learning. If you are in a hurry, you can skip the stories. If you want to be entertained and (de-)motivated, read on!
The format is inspired by Jack Clark’s Tech Tales in his Import AI newsletter. If you like this kind of story or are interested in AI, I recommend signing up.
Lightning Never Strikes Twice
2030: A medical lab in Switzerland
“It’s definitely not the worst way to die!” Tom summarised, trying to find something positive in the tragedy. He was removing the pump’s computer from the intravenous pole. “He just died for the wrong reasons,” Lena added. “And certainly with the wrong morphine pump! Just creating more work for us!” Tom complained while unscrewing the pump’s back plate. After he had removed all the screws, he lifted the plate and put it aside. He plugged a cable into the diagnostic port. “You didn’t just complain about having a job, did you?” Lena gave him a mocking smile. “Of course not. Never!” he exclaimed with an ironic undertone.
He booted the pump. Lena plugged the other end of the cable into her tablet. “Alright, diagnostics are running,” she announced. “I am really curious to see what went wrong.” “It certainly shot our John Doe to Nirvana. That high concentration of morphine stuff. Man. I mean … that’s a first, right? Normally a broken pump gives too little of the sweet stuff or nothing. But never, you know, like this crazy shot,” Tom explained. “I know. You don’t have to convince me … Hey, look at that.” Lena held up the tablet. “Do you see this peak here? That’s the painkiller potency. Look: this line shows the reference level. The poor guy had a painkiller mix in his bloodstream that could kill him 17 times over. All injected by our pump here. And here …” she swiped, “here you can see the moment of the patient’s demise.” “So, any idea what happened, boss?” Tom asked his supervisor. “Hm … The sensory functions seem to be okay. Heart rate, oxygen levels, glucose, … The data were collected as expected. Some missing values in the blood oxygen data, but that’s not unusual. Look. It also picked up the slowing heart rate of the patient and the extremely low cortisol levels caused by the morphine derivative and the other pain-blocking agents.” She continued swiping through the diagnostics. Tom stared at the screen, transfixed. It was his first real device failure to investigate.
“Okay, here is our first piece of the puzzle. The system failed to send a warning to the hospital communication channel. The warning was triggered, but rejected at the protocol level. Might be our fault, but it could also be the hospital’s fault. Please send the logs over to the IT team,” Lena told Tom. Tom nodded, eyes still locked on the screen. Lena continued. “It’s weird. The warning should also have triggered the shutdown of the pump. But it obviously failed to do so. That must be a bug. Something the quality team missed. Something really bad. Maybe connected to the protocol issue.” “So, the pump’s emergency systems somehow broke, but why did the pump go full bananas and inject so much painkiller?” Tom wondered. “Good question. You are right. Protocol emergency failure aside, the pump shouldn’t have administered this amount of medication in the first place. The algorithm should have stopped much earlier on its own, given the low cortisol level and other warning signs,” Lena explained. “Maybe some bad luck, like a one-in-a-million thing, like being struck by lightning?” Tom asked her. “No, Tom. If you had read the documentation I sent you, you would know that the pump was first trained in animal trials, then on humans, to learn on its own how to find the perfect amount of painkiller given the sensory input. The pump’s algorithm might be opaque and complex, but it is not random. That means the pump would show the same behaviour in the exact same situation. Our patient would die again. Some combination or unwanted interaction of the sensory inputs must have triggered the erratic behaviour of the pump. That’s why we have to dig deeper and find out what happened here.”
“I see …” Tom responded, lost in thought. “Wasn’t the patient going to die soon anyway? Because of cancer or something?” Lena nodded while reading the analysis report. Tom got up and walked to the window. He looked outside, eyes fixed on some point in the distance. “Maybe the machine did him a favour, you know, like freeing him from the pain. No more suffering. Maybe it just did the right thing. Like lightning, but, you know, a good one. I mean like the lottery, but not random. For a reason. If I were the pump, I would have done the same.” She finally lifted her head and looked at him. He kept looking at something outside. Nobody said anything for a minute or two. Lena lowered her head again and continued the analysis. “No, Tom. It’s a bug … just some goddamn bug.”
Trust Fall
2050: A subway station in Singapore
She was rushing to Bishan subway station. Her thoughts were already at work. The tests for the new neural architecture should have finished by now. She led the redesign of the government’s “tax affinity prediction system for individual entities”, which predicts whether an individual will hide money from the tax office. Her team came up with an elegant piece of engineering. If successful, the system would not only serve the tax office, but also feed into other systems, such as the anti-terrorist defence and the trade registry. One day, the government might even integrate it into the civic trust score. The trust score estimates how trustworthy an individual is. The estimate affects every part of your daily life, such as getting a loan or how long you have to wait for a new passport. Descending the escalator, she imagined what an integration into the current trust score system could look like.
Routinely, she wiped her hand over the RFID reader without reducing her walking speed. Her mind was occupied, yet the dissonance between sensory expectation and reality rang alarm bells in her brain.
Too late.
Nose first, she ran into the subway entrance gate and fell, bottom first, onto the floor. The door was supposed to open … but it didn’t. Baffled, she stood up and looked at the gate’s screen. “Please try again some other time,” it suggested in friendly colours. A person walked by and, ignoring her, wiped his hand over the reader. The doors opened and he walked through. The doors closed again. She wiped her nose. It hurt, but at least it wasn’t bleeding. She tried to open the door, but was rejected again. It was odd. Maybe her public transport account did not have sufficient tokens. She raised her watch to check the account balance.
“Login denied. Please contact your Citizens Advice Bureau!” the watch informed her.
A feeling of nausea hit her like a fist to the stomach. With trembling hands she started the mobile game “Sniper Guild”, a first-person shooter. After a few seconds, the loading screen shut down. She felt dizzy and sat down on the floor again.
There was only one possible explanation: her trust score had dropped. Substantially. A small drop meant minor inconveniences, like not getting first-class flights. A low trust score was rare and meant that you were classified as a threat to society. One measure for dealing with such people was to keep them out of public places, such as the subway. The government restricted the financial transactions of low-trust subjects. They also began actively monitoring their behaviour on social media, even going as far as restricting certain content, such as violent games. It became exponentially more difficult to increase your trust score the lower it was. People with a very low trust score usually never recovered.
She could think of no reason why her score should have dropped. The score was based on machine learning. The trust score system worked like a well-oiled engine, stabilising society, and its performance was always closely monitored. Machine learning had become a lot better since the beginning of the century. It had become so efficient that decisions made by the trust score system could not be disputed. An infallible system.
She laughed hysterically. An infallible system. If only. The system failed rarely. But it did fail. She was an edge case; an error of the system; from now on, an outcast. Nobody dared to question the system. It had become too integrated into the government, into society itself, to be questioned. In democratic countries it was forbidden to form anti-democratic movements, not because they were inherently vicious, but because they would destabilise the current system. The same logic applied to algorithmic critique: criticising the algorithms was forbidden because of its danger to the status quo.
Algorithmic trust was the very fabric the societal order was made of. For the greater good, rare incorrect trust scorings were silently accepted. Hundreds of other prediction systems and databases fed into the score, making it impossible to know what had triggered the drop in her score. Wild emotions twisted inside her, most of all terror. She vomited on the floor.
Her tax affinity system was eventually integrated into the trust score system, but she never got to know it.
Fermi’s Paperclips
Year 612 AMS (after Mars settlement): A museum on Mars
“History is boring,” Xola whispered to her friend. Xola, a blue-haired girl, was lazily chasing with her left hand one of the projector drones buzzing around the room. “History is important!” the teacher said in an upset voice, looking at the girls. Xola blushed. She hadn’t expected her teacher to overhear her.
“Xola, what did you just learn?” the teacher asked her. “That the ancient people used all the resources from earther planet and then they died?” she asked carefully. “No. They made the climate go hot and it wasn’t the people, it was the computers and machines. And it’s planet earth, not earther planet,” another girl named Lin added. Xola nodded in agreement. With a hint of a proud smile, the teacher nodded. “You are both right. Do you know why it happened?” “Because the humans were short-sighted and greedy?” Xola wondered. “The humans couldn’t stop their machines!” Lin blurted out.
“Again, you are both right!” the teacher resolved. “But it’s a lot more complicated than that. Most people at the time were not aware of what was happening. Some saw the drastic changes, but could not reverse them. The most famous piece from that time is a poem by an anonymous author. It best captures what happened then. Listen carefully!”
The teacher started the poem. A dozen of the little drones repositioned themselves in front of the children’s eyes and started projecting the video. It showed a person in a suit standing in a forest with only tree stumps left. He began to recite:
The machines compute; the machines predict.
We march on, as we are part of it.
We chase an optimum as trained.
The optimum is one-dimensional, local and unconstrained.
Silicon and flesh, chasing exponentiality.
Growth is our mentality.
When all rewards are collected,
and side-effects neglected;
When all coins are mined,
and nature has fallen behind;
There will be trouble,
after all, exponential growth is a bubble.
The tragedy of the commons unfolding,
exploding,
before our eyes.
Cold computing and icy greed,
fill the earth with heat.
Everything is dying,
And we are complying.
Like horses with blinders we race the race of our own creation,
towards the Great Filter of civilisation.
And so we march on, relentlessly.
As we are part of the machine.
Embracing entropy.
“A grim reminder,” the teacher said, breaking the silence in the room. “It’s uploaded to your library. Your homework is to memorise it by next week.” Xola sighed. She managed to catch one of the little drones. The drone was warm from its CPU and motors. Xola liked how it warmed her hands.