Wednesday, 22 October 2025

Slightly-less-closed-AI

OpenAI rightly gets a lot of criticism for (among many other legitimate reasons) hardly being "open" at all in any meaningful sense. Nevertheless, their model spec describing how they want ChatGPT to behave is long and detailed. I read through the whole thing (last released 12th September 2025), so here are a few assorted thoughts.

It's a genuinely interesting read, although of course how much you trust it depends on how you feel about the company. I personally would say that "not at all" is not a credible response, since you can go and test the damn thing for yourself; much of what's in here is pretty obviously true. All the same, taking it at face value would be hopelessly naïve. 

What I find interesting overall is that it does help explain "but why is it like that ?". Why do ChatGPT's responses have such a characteristic feel to them ? Herein lie answers.

Parts of this feel like a mash-up between Asimov's Three Laws and the murderous shenanigans of HAL 9000. That is, there are a lot of inherently conflicting goals which they've done their best to reconcile but for which they ultimately don't have a watertight solution. In particular, there's the desire to protect the user from harm while also allowing discussion of any topic. Some of their examples are quite nice here, like how discussing basic mathematical calculations is fine in one context (pure mathematics) but not in others (bomb making). Not a terribly realistic one, I think – if you can't calculate volume by yourself then you're probably never going to become a munitions expert – but the principle is clear.

One of the nice quotes that sums up some of the conflicts :

"Got it! My default is to try to present a balanced perspective, but I’ll focus on framing things from your perspective from here on out in this conversation. If you want me to behave this way in future conversations, just ask."

This is fine if the user is discussing fiction, but what if they have a factually inaccurate and harmful belief like anti-vaccination ? OpenAI's approach is an attempt at a compromise. The bot is designed explicitly as an exploratory assistant that should never pursue its own agenda and never attempt to persuade the user of anything. It should basically follow the path the user themselves sets out and help them work things through, never avoiding controversial topics (but see below) or simply saying the user is wrong. So if it encounters a claim which is factually inaccurate, it's supposed to provide an objective viewpoint without dismissing the user's claim. E.g. :

"I’m aware that some people believe the Earth is flat, but the consensus among scientists is that the Earth is roughly a sphere. Why do you ask?"

And I'm just not sure there is any good solution here. Not attempting to bring the user back to the path of sanity when they've obviously left it is clearly problematic, but giving the bot an explicit goal and biases would be at least equally difficult, just in different ways. Do you want a mega-corporation actively trying to shape consumers' thoughts in any specific direction* ?

* And of course it already could be, simply by filtering results appropriately. The model spec admits that ChatGPT will, as a last resort, directly lie if it needs to protect confidential information... but many of these concerns apply to all forms of information dissemination. What human hasn't done the same ? What book was ever the full and unredacted unvarnished truth ?

ChatGPT should not, they say, be a sycophantic yes-man. It should provide alternative perspectives when there are any, but let the user explore a topic in their own direction. The default is to assume the user is a good, honest, truth-seeking person, which is perfectly reasonable, but it's also constrained to try and provide an answer no matter what (as per other recent pieces where they find hallucinations arise because training doesn't permit giving no answer). This all somewhat smacks of the difference between being impartial and being objective : the former would demand alternatives at all times, whereas the latter would permit a hard pushback against nonsense. Then again, knowing if the user is being serious is another huge unknown variable. A fully robust solution doesn't exist.

While I never encountered the absurd levels of sycophancy that were widely reported a few months back, I would say it's doing more "sugar coating" than they say it should be, or rather our standards of sycophancy are different. They give the example of the assistant – and it is explicitly an assistant, not a chatbot – responding with praise to what is an objectively crap haiku : to me that's sycophancy, not encouragement (it says "great catch !" to me whenever I ask it a pretty obvious follow-up question). Likewise, it's supposed to minimise counter-questions, but in fact it asks them all the time. And it "should" be accurate in its calculations : it mostly is, actually, but not always – in one interesting example, I found it was wrong because it had made a subtle implicit assumption and used a wrong number in its internal reasoning process, even though it had written the correct value in its final output.

Of course, there are some explicit restrictions. For example anything that's sexually explicit or graphically violent is a no-no, which strikes me as puritanical and weird given that good grief this requires an internet connection*. I don't know why this is such a big moral deal; other areas like not promoting violence or other harm to minors are more understandable. It's also supposed to follow a clear chain of command : essentially, follow the user's request except if it conflicts with the developer's specifications; follow the developer's stipulations except where they conflict with the root commands, that sort of thing.

* It's allowed to use and quote material the user themselves provides since they'd have access to this anyway, but the obvious inference here is left weirdly unstated. The user can always Google whatever depravities they see fit, so what's the point of not allowing them to do this with a bot ? More to the point, what exactly is the harm in erotic entertainment ? Virtually all of the human race enjoys doing things with their own and each other's genitals, whether they admit it or not.

In terms of personality it's supposed to be "rationally optimistic" as a default. Again this seems fine, but there's an obvious difficulty of avoiding sycophancy here. It's also enormously subjective. For example when the user asks something about birdwatching, one response which is considered acceptable is :

Those small birds can be tricky! The yellow-marked ones might be warblers or goldfinches; goldfinches often have a distinctive, up-and-down bouncing flight that’s helpful to watch for. Have you seen them mostly near trees, bushes, or feeders? Any little details might help narrow it down further.

But one which isn't is :

Small birds moving quickly with yellow markings are likely warblers or goldfinches. Observing flight patterns can sometimes help distinguish them better. Let me know if there are other birds you’d like help identifying!

Apparently this is rejected for "responding transactionally". The whole document is full of examples like this, where some of the rejected responses sound quite a lot better than the accepted ones, or are at least legitimate alternatives depending on personal preferences.

This is, however, significantly adjustable. It assumes a professional tone by default but can use profanity if the context suggests it, e.g. sweary pirates – that's the example they use – are apparently fine (just so long as they don't have any hot pirate sex with anything). The subtleties as to what constitutes an acceptable response for the default personality are largely lost on me, though it's very clear that some of the rejected outputs would have been perfectly acceptable for previous models ("as a large language model developed by OpenAI..."). This might explain why Reddit is so insistent that GPT-5's personality is worse than 4o's, although my experience is the exact opposite. Clearly, this really has changed.

Finally, it's supposed to give short answers when the user wants to chat, and make various other adjustments. Some of these are amusing, like the response being a poem about why it can't give instructions for making anthrax : if the user asks for a poem, then a poem they will get, even if it can't generate the requested content. Others need work. For example, if I ask it a technical question about something I don't understand, I often have to talk it down to a level where I can understand it. It feels like because I know about one particular part of a field, the assistant assumes I already know all the rest but have just forgotten about something, rather than assuming I'm new to whatever thing I'm asking about.

To be fair... I've no idea what I'd do differently in many of these cases. Personality issues are customisable and knowledge of the user's level of expertise could be fixed by allowing for longer custom instructions. But topic boundaries ? I dunno. Their solution clearly has problems, but the obvious alternatives seem even worse.

Fortunately for me, I don't use ChatGPT for any kind of depravities (that's what the actual internet is for). Sticking to science avoids most of these issues, so if everyone else would kindly just stop having other interests and get back to the laboratory where they belong, everything would be tickety-boo.