The Chatbot That Lied

Written by Dr David Tena Cucala, Lecturer in Computer Science, Royal Holloway University, 2026

In 2022, an Air Canada customer asked the airline’s chatbot a simple question: could he get a bereavement refund? The chatbot said yes and even explained exactly how to apply. So he bought the ticket, followed the instructions, and submitted the claim.

Air Canada refused. Their defence? The chatbot had made a mistake, as their actual policy did not allow bereavement fares to be claimed retroactively.

He took them to the British Columbia Civil Resolution Tribunal. In 2024, he won. The ruling was beautifully blunt: if you deploy a chatbot to speak for your company, you own what it says.

Even if it hallucinates policies.

The End of the Rulebook

When I tell this story to people outside AI research, the conversation almost always goes the same way. “But someone programmed it, right? Someone wrote the rules?” they say.

And I say: “…not exactly.” Cue stunned silence.

For most of computing history, that assumption was correct. Most early AI systems were rule-based: enormous lists of human-written instructions, variables with sensible names, logic you could trace with your finger. A good engineer could read the code, understand it, and stake their reputation on it.

Then, in the mid-2010s, neural networks took over, and they work almost the opposite way. Instead of writing rules, engineers build systems that learn their own. These networks (loosely inspired by the brain) contain billions of adjustable connections between virtual “neurons.” You feed them examples, give feedback, and repeat the process billions of times. Eventually the system becomes uncannily good at whatever task you are applying it to; sometimes it even becomes superhuman, outplaying the best human players at chess and Go, and contributing to scientific breakthroughs such as predicting how proteins fold. Nobody tells it the rules. It figures them out on its own.

That has two big consequences. First, the system often discovers patterns no human explicitly taught it, including patterns nobody had even noticed before. That’s the genuine magic of these systems. But it also means nobody can be completely certain what the system has actually learned. Second, whatever it learned isn’t stored anywhere readable. The knowledge is smeared across millions (sometimes billions) of numerical weights. You can stare at the numbers, but they won’t mean anything to you. People sometimes call AI a “black box”, though the phrase is slightly misleading. The box isn’t opaque. We can look inside. We simply don’t understand what we’re seeing.
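To make the contrast concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the 90-day rule, the function names, and the tiny dataset are assumptions, not any real airline policy or production system. It puts a hand-written rule next to a single artificial neuron (a perceptron, vastly simpler than a modern network) that is shown only examples and answers.

```python
# Toy illustration only: the 90-day rule, the feature names, and the data
# are hypothetical and are not taken from any real airline policy.

def rule_based(days_since_booking, has_certificate):
    """Hand-written rule: a human can read exactly why a claim passes."""
    return has_certificate and days_since_booking <= 90


def features(days_since_booking, has_certificate):
    # Scale days to the range 0..1 and append a constant 1.0 as the bias input.
    return [days_since_booking / 180.0, 1.0 if has_certificate else 0.0, 1.0]


def predict(weights, x):
    # Weighted sum of the inputs; a positive score means "approve".
    score = sum(w * xi for w, xi in zip(weights, x))
    return score > 0


def train_perceptron(examples, max_epochs=10_000):
    """Classic perceptron updates: nudge the weights after every wrong answer."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            if predict(weights, x) != label:
                sign = 1.0 if label else -1.0
                weights = [w + sign * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            break  # every training example is now answered correctly
    return weights


# Labelled examples: the learner sees only inputs and answers, never the rule.
TRAINING_CASES = [(d, c) for d in (0, 30, 60, 120, 150, 180) for c in (True, False)]
EXAMPLES = [(features(d, c), rule_based(d, c)) for d, c in TRAINING_CASES]

WEIGHTS = train_perceptron(EXAMPLES)

if __name__ == "__main__":
    print("learned weights:", WEIGHTS)
    for d, c in TRAINING_CASES:
        print(d, c, rule_based(d, c), predict(WEIGHTS, features(d, c)))
```

The rule is one readable line. The learned version is three numbers that reproduce the same answers on the training cases, but nothing in those numbers says “90 days”, and for a claim the neuron never saw (day 90, say) it may or may not agree with the rule. Scale those three numbers up to billions and you have the problem described above.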

The Accountability Gap

That last point has real consequences. It’s part of why AI systems can hallucinate, producing wrong answers with total, unblinking confidence, and why such errors are so hard to predict or fix. The Air Canada story is almost funny. But at the other end of the spectrum, you have systems influencing healthcare decisions, shaping legal outcomes, and moving financial markets.

To be fair: we already trust plenty of things we don’t personally understand, like aeroplanes, vaccines, and power grids. But in those cases, someone understands them. Engineers can explain why the plane flies. Scientists know how the vaccine works. There are people whose entire job is to understand these systems deeply, and to be held accountable when they fail. With modern AI, that accountability gap is real, and it’s growing.

A small, serious group of researchers is trying to close it, building something like a “neuroscience of AI”, reverse-engineering these models from first principles to figure out what’s actually happening inside. It’s slow, hard, important work. Meanwhile, the rest of the industry is moving in the opposite direction: build a bigger ship faster, patch the mistakes later.

So here we are, rapidly weaving a technology into education, medicine, finance, and the daily fabric of decision-making, even though nobody fully understands how it works. The question isn’t technical; it’s a choice. If AI is going to shape the infrastructure of society, we can either accept that it remains, at its core, a mystery, or we can demand that someone, somewhere, actually understands it.

But that choice is impossible to make if you don’t know it exists. And why would you? The reasonable assumption is that someone, somewhere, already checked. That there’s an engineer who can open the hood and explain exactly what went wrong.

It’s a fair assumption. It’s just not true.

Where do you stand? Should we slow down deployment until we can explain these systems, or are problems like Air Canada’s chatbot just a “growing pain” we can manage along the way? Drop your thoughts in the Comments.

DOS RHUL
