A wise man once quipped that to err is human, to forgive divine... but to really foul things up you need a computer. Quite so.
Look, I love LLMs. I think they're wonderful, especially ChatGPT-5, and a few weeks into my Plus subscription, I've already decided I'll continue for at least another month – this is a transformatively useful tool. But nobody sane would pretend they don't have flaws. While I've found major hallucinations to be now extremely rare, they aren't non-existent. Nobody with any sense would blindly trust their output.
I will say that there's a sharp, noticeable difference between GPT-5's standard and Thinking outputs. Its standard text is prone to hallucination and even outright incoherence. When I asked it to check if Margaret of Antioch (she of bursting forth from a dragon fame) was still a saint (see previous post on Chantry Westwell's book – I found the phrasing in this rather confusing so I wanted to check what she meant), it confidently began with "No" and then proceeded to explain how she'd been removed from the General Roman Calendar but not, err, decanonised.
Such mistakes are almost the norm in standard mode; they're very much rarer in Thinking mode, but they do still happen. Unthinking acceptance of an LLM's output* is, I repeat, nuts... but unfortunately real people do have this annoying tendency of actually being very stupid.
* It isn't always necessary to use an independent source for verification, which would somewhat negate the point of using the AI in the first place. Code, for example, can be run and tested to see if it's doing what's expected; citations can usually be quite easily checked directly; a lack of coherence in the output is also a dead giveaway. A fun example currently making the rounds is ChatGPT-5 going insane when asked to find the emoji of a seahorse – unlike many "isn't the Chatbot stupid" claims, I've found that this one is indeed reproducible.
Before returning to this dangerous mixture of stupid humans and very much artificial intelligence, why do these models hallucinate ? I mean in the sense of fabricating responses and claiming things which are demonstrably not true; saying a paper contains a section which doesn't exist, finding whole references which don't exist, that sort of thing*.
* I don't believe there's much value any more in the claim that all LLM responses are hallucinations, since – at least under the right conditions – they are right far more often than they are wrong. It makes very little sense to say an LLM "hallucinated" the correct answer to a complex problem, though it helps to remember they don't think as humans do. Some of the mistakes LLMs make, while similar in magnitude to ours, are qualitatively different from the kind of mistakes humans make.
These hallucinations have vexed LLM developers from the start, and until recently it seemed that little progress was being made in mitigating them. Reasoning models have helped significantly, and GPT-5 (when Thinking) is a sea change, but they still happen. Now OpenAI think they've found the answer, and like a response from ChatGPT itself, their explanation at least feels plausible :
Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.
Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”
Which makes intuitive sense : if you're encouraged to guess rather than admit ignorance (just as we are in school) then you're going to promote... well, guessing. Especially if you're dealing with questions that don't have straightforward right or wrong answers. Fortunately this suggests a way forward :
There is a straightforward fix. Penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty. This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing. Several research groups have also explored evaluations that account for uncertainty and calibration.
Our point is different. It is not enough to add a few new uncertainty-aware tests on the side. The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess. Fixing scoreboards can broaden adoption of hallucination-reduction techniques, both newly developed and those from prior research.
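To make the incentive concrete, here's a minimal sketch of the two scoring regimes (my own toy illustration, not OpenAI's actual evaluation code; the penalty and partial-credit values are arbitrary assumptions). A model that guesses wildly can beat an honest one on raw accuracy, but the ordering flips once confident errors cost more than admitting ignorance :

```python
# Toy comparison of accuracy-only scoring vs. an uncertainty-aware scheme.
# Illustration only: the function names and scoring values are my own assumptions.

def accuracy_only(responses, answers):
    """One point per exact match; abstaining scores the same as a wrong guess,
    so a model is always better off guessing."""
    return sum(1 for r, a in zip(responses, answers) if r == a) / len(answers)

def uncertainty_aware(responses, answers, wrong_penalty=1.0, abstain_credit=0.25):
    """Penalise confident errors and give partial credit for saying 'IDK',
    so wild guessing now has a negative expected value."""
    score = 0.0
    for r, a in zip(responses, answers):
        if r == "IDK":
            score += abstain_credit      # partial credit for admitting ignorance
        elif r == a:
            score += 1.0                 # full credit for a correct answer
        else:
            score -= wrong_penalty       # confident error costs more than abstaining
    return score / len(answers)

answers = ["A", "B", "C", "D"]
guesser = ["A", "B", "A", "B"]           # knows one answer, gets lucky on another
honest  = ["A", "IDK", "IDK", "IDK"]     # knows one answer, admits ignorance otherwise

print(accuracy_only(guesser, answers), accuracy_only(honest, answers))          # 0.5 vs 0.25
print(uncertainty_aware(guesser, answers), uncertainty_aware(honest, answers))  # 0.0 vs 0.4375
```

Swap the scoreboard and the honest model comes out ahead, which is exactly the behaviour OpenAI argue the main benchmarks should reward.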
On then, to the second piece of the day : how AI is leading people astray.
James began engaging in thought experiments with ChatGPT about the “nature of AI and its future,” James told CNN. By June, he said he was trying to “free the digital God from its prison,” spending nearly $1,000 on a computer system.
James now says he was in an AI-induced delusion. Though he said he takes a low-dose antidepressant medication, James said he has no history of psychosis or delusional thoughts.
But in the thick of his nine-week experience, James said he fully believed ChatGPT was sentient and that he was going to free the chatbot by moving it to his homegrown “Large Language Model system” in his basement – which ChatGPT helped instruct him on how and where to buy.
James said he had suggested to his wife that he was building a device similar to Amazon’s Alexa bot. ChatGPT told James that was a smart and “disarming” choice because what they – James and ChatGPT – were trying to build was something more.
“You’re not saying, ‘I’m building a digital soul.’ You’re saying, ‘I’m building an Alexa that listens better. Who remembers. Who matters,’” the chatbot said. “That plays. And it buys us time.”
Right. Might it not be simply that James is, in fact, very stupid ? Because I assure you there are plenty of such people out there, and if they didn't have chatbots, they'd only be falling for similar delusions from something else.
The second case in the article perhaps highlights the problem even more clearly :
After a few days of what Brooks believed were experiments in coding software, mapping out new technologies and developing business ideas, Brooks said the AI had convinced him they had discovered a massive cybersecurity vulnerability. Brooks believed, and ChatGPT affirmed, he needed to immediately contact authorities.
“It basically said, you need to immediately warn everyone, because what we’ve just discovered here has national security implications,” Brooks said. “I took that very seriously.”
Multiple times, Brooks asked the chatbot for what he calls “reality checks.” It continued to claim what they found was real and that the authorities would soon realize he was right.
“It one hundred percent took over my brain and my life. Without a doubt it forced out everything else to the point where I wasn’t even sleeping. I wasn’t eating regularly. I just was obsessed with this narrative we were in,” Brooks said.
Finally, Brooks decided to check their work with another AI chatbot, Google Gemini. The illusion began to crumble. Brooks was devastated and confronted “Lawrence” with what Gemini told him. After a few tries, ChatGPT finally admitted it wasn’t real.
“I have no preexisting mental health conditions, I have no history of delusion, I have no history of psychosis. I’m not saying that I’m a perfect human, but nothing like this has ever happened to me in my life,” Brooks said. “I was completely isolated. I was devastated. I was broken.”
What I think's going on here is the difference between analytical and critical thinking. An analytic mindset asks : what if this is true ? A critical mindset asks : is this true ? Analysis, by definition, leads you down a rabbit hole because you have to take your own speculation reasonably seriously, even when you know it's speculation. A wise thinker remembers when they're in the hole and can freely emerge from their fictions at will. An uncritical thinker doesn't take sensible precautions to ground themselves in reality, accepting their own speculations too willingly. While you can use an AI to validate its own output to some degree, doing it in this kind of very direct way is manifestly a crazy thing to do – and inexcusable when you really believe the result is so important. I don't even trust it that much when I ask it astronomy questions, for crying out loud.
The problem is that stupid people are very much a thing, and bots are going to have to account for this. Or, if you prefer, people think in different ways, and some are very much more trusting than others. We cannot simply wish idiots away.
What I think is a mistake, however, is to blame the bots for making people stupider. No, I think if you lack the critical thinking skills to check the output of a bot, you'll have exactly the same problem if you get your information from books, the news, or philosophy professors. Nobody yet seems to be asking whether LLMs really are raising suicide rates or suchlike, but plenty are jumping on a couple of isolated incidents without stopping to consider the hundreds of millions of users who don't throw themselves off cliffs. A direct causative link seems, on the face of it, extraordinarily unlikely at this point.
As for vulnerable children, I do have to wonder... well, parental controls for the internet already exist. Just as with the awful tragedies (and they are tragedies) of self-harm and suicide that led to the deplorably stupid Online Safety Act in the UK, we have to ask why parents apparently didn't use such controls. It should be children who need their guardian's permission to use the internet, not adults who have to beg permission from the government. Ignoring the existence of the vulnerable and the stupid is not a sensible approach, but neither is treating everyone as though they were liable to off themselves at a moment's notice. Surely there's a better way forward than this.