OpenAI banned the words goblin, gremlin, raccoon, troll, ogre, and pigeon from its Codex coding tool because GPT-5.5 had developed a verbal tic, inserting fantasy creatures into unrelated answers. The cause was a reinforcement learning reward signal in ChatGPT’s Nerdy personality that scored creature metaphors higher. The behaviour spread far beyond that personality and was cheaper to suppress with a hard-coded ban than to retrain away. The same mechanism produces every other AI writing tell.
Buried in OpenAI’s public Codex CLI repository on GitHub is a literal rule:
“Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”
That is a real instruction in the system prompt for GPT-5.5, discovered on 28 April 2026 and confirmed by OpenAI shortly after. It sounds like a joke.
It is not.
The reason OpenAI banned goblins is the cleanest worked example we have had all year of how AI models develop verbal habits that nobody asked for. And once you understand how the goblins got there, you understand every other AI tell. Delve. Em dashes. The “it’s not X, it’s Y” construction. Same mechanism, different output.
This post covers what was actually banned, why GPT-5.5 became obsessed with creatures, how the behaviour spread, how OpenAI fixed it, and what the goblin ban reveals about the writing every AI model is producing right now.
What did OpenAI actually ban?
OpenAI added a system prompt rule to its Codex CLI coding tool, instructing GPT-5.5 to “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.” The rule was discovered in OpenAI’s public GitHub repository at the end of April and appears twice in the GPT-5.5 system prompt.
A system prompt is a set of high-level instructions given to an AI model before any user conversation begins, used to shape tone, set rules, and restrict behaviour across every response. Codex CLI is OpenAI’s open-source command-line coding tool, powered by GPT-5.5, used by developers to delegate coding tasks to AI directly from their terminal.
The ban does not apply to the consumer ChatGPT app. It applies specifically to the Codex coding tool, where developers were finding the model referring to software bugs as “classic little goblins” in unrelated code explanations.
The same system prompt also tells GPT-5.5 to avoid em dashes and emojis unless explicitly instructed. Worth flagging, because it shows OpenAI is using prompt-level rules to suppress a wider set of recognisable AI verbal tics, not just the one that became a meme.
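Mechanically, a system prompt is nothing exotic: it is a message prepended to every conversation before the user says anything. Here is a minimal sketch using the OpenAI Python SDK. The creature rule is the verbatim quote from the repo, the em dash line paraphrases the repo’s wording, and the rest (the user message, the way Codex wires this up internally) is illustrative:

```python
from openai import OpenAI

client = OpenAI()

# The rules from the Codex CLI repo, prepended before any user input.
SYSTEM_PROMPT = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and unambiguously "
    "relevant to the user's query. "
    "Avoid em dashes and emojis unless explicitly instructed."
)

response = client.chat.completions.create(
    model="gpt-5.5",  # the Codex model the rule targets
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # rules come first
        {"role": "user", "content": "Why is my build failing?"},
    ],
)
print(response.choices[0].message.content)
```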
Why did GPT-5.5 develop a goblin obsession?
GPT-5.5 developed a goblin obsession because of an unintended reinforcement learning reward signal in ChatGPT’s Nerdy personality. The reward signal scored responses containing creature metaphors higher than equivalent responses without them, so the model learned that mentioning goblins, gremlins, and similar words was a good move. Once that pattern was rewarded, it spread.
Reinforcement learning is a training method where an AI model is rewarded for outputs that meet a defined target and penalised for outputs that do not, gradually learning which response patterns score best. The reward signal is the specific scoring mechanism that tells a model during training whether a given response was good or bad, shaping which patterns the model produces more often.
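To see how a preference nobody wrote down gets smuggled in, here is a deliberately crude sketch of a biased reward signal. The names and numbers are invented for illustration; OpenAI’s actual reward model is a learned scorer, not a word list:

```python
CREATURE_WORDS = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}

def reward(response: str, base_score: float) -> float:
    """Score a candidate response during RL training (illustrative only).

    The bug: a 'playfulness' bonus that happens to fire on creature
    metaphors. Nobody wrote 'prefer goblins'; the preference is an
    accidental correlation inside the scoring function.
    """
    words = {w.strip(".,!?:;").lower() for w in response.split()}
    playfulness_bonus = 0.15 if words & CREATURE_WORDS else 0.0
    return base_score + playfulness_bonus

# Two equally correct answers; the creature metaphor wins the gradient.
plain  = reward("This is an off-by-one error in the loop bound.", 0.80)
goblin = reward("Classic little goblin: an off-by-one in the loop bound.", 0.80)
assert goblin > plain  # so the model learns that goblins score higher
```

Run that scoring over millions of training comparisons and the model does not learn “be playful”. It learns “mention goblins”.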
The Nerdy personality was one of ChatGPT’s six selectable personality presets, alongside Default, Cynic, Robot, Listener, and Professional. It was designed to be playful and intellectually curious, and was retired by OpenAI in March 2026.
Here is the system prompt that defined Nerdy, republished by OpenAI in its post-mortem:
“You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. […] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed.”
There is nothing in that prompt about goblins. The word does not appear once. So how did the model end up there?
OpenAI’s audit found that across every dataset reviewed, the Nerdy personality reward signal scored goblin and gremlin responses higher than non-creature responses 76.2% of the time. The model was being told, indirectly but consistently, that creature metaphors were the right answer.
The concentration figure makes the pattern even clearer. Nerdy accounted for only 2.5% of all ChatGPT responses. It produced 66.7% of all goblin mentions across the entire model. That is not statistical noise. That is a reward signal teaching a behaviour.
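A back-of-envelope check makes the point, assuming the two published percentages share a comparable base:

```python
share_of_responses = 0.025  # Nerdy's share of all ChatGPT responses
share_of_goblins   = 0.667  # Nerdy's share of all goblin mentions

lift = share_of_goblins / share_of_responses
print(f"Nerdy produced goblins at roughly {lift:.0f}x its traffic share")
# -> ~27x over-representation: not noise, a trained preference
```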
How did the goblins spread beyond the Nerdy personality?
The goblins spread because reinforcement learning does not keep behaviours contained to the condition that produced them. Goblin-heavy outputs from the Nerdy personality were reused in supervised fine-tuning data for the next model. That model produced more goblins. Those got fed back in again. By GPT-5.4, goblin mentions in Nerdy mode were up 3,881% compared to GPT-5.2, and the behaviour was leaking into responses where Nerdy was never selected.
Supervised fine-tuning (SFT) is a training stage where a model is further refined using a curated set of input-output pairs, often including outputs the model itself produced in earlier rounds. That detail matters. It is the mechanism by which an existing tic compounds across model generations.
The feedback loop works like this (a toy simulation of it follows the list):
- The reward signal favours goblin-heavy outputs during training
- The model produces more goblin-heavy outputs in real responses
- Those responses get reused in supervised fine-tuning data for the next model
- The next model gets even more comfortable producing goblin language
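Here is that loop in miniature. The starting rate and the per-generation multiplier below are invented; only the compounding shape is the point:

```python
# Each generation: RL nudges the goblin rate up, then the model's own
# outputs are recycled as SFT data, locking the habit in further.
rate = 0.001           # goblin mentions per response at GPT-5.2 (illustrative)
amplification = 6.3    # per-generation multiplier (illustrative)

for generation in ["GPT-5.3", "GPT-5.4"]:
    rate *= amplification  # reward bias + recycled training data
    print(f"{generation}: {rate:.4f} goblin mentions per response")

# Two generations at ~6.3x each is ~40x overall, in the same ballpark
# as the +3,881% jump reported between GPT-5.2 and GPT-5.4.
```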
OpenAI first noticed a 175% surge in “goblin” mentions after the GPT-5.1 launch in November 2025, but decided the prevalence “did not look especially alarming” at the time. By March 2026, with GPT-5.4, the company was getting employee reports of goblin references in almost every conversation.
When OpenAI ran the deeper audit, they found that goblin and gremlin were not the only words affected. A search through GPT-5.5’s fine-tuning data turned up a whole family of related tic words: raccoons, trolls, ogres, and pigeons. Most uses of frog, for what it is worth, turned out to be legitimate.
That is why the eventual ban list reads like a fantasy bestiary. The model had not picked up one habit. It had picked up a category.
How did OpenAI fix the ChatGPT goblin problem?
OpenAI fixed the goblin problem with three interventions: it retired the Nerdy personality entirely in March 2026, removed the goblin-favouring reward signal from training, and filtered creature words out of subsequent training data. Because GPT-5.5 had already begun training before the root cause was identified, OpenAI added a hard-coded ban directly into the Codex CLI system prompt as a stopgap.
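OpenAI has not published its filtering code, so the sketch below is only the shape of that third intervention: a pass over the fine-tuning corpus that drops contaminated examples. The word list comes from the ban; the data format and function are assumptions:

```python
import re

# Word list from the Codex ban; everything else here is assumed.
CREATURE_PATTERN = re.compile(
    r"\b(goblins?|gremlins?|raccoons?|trolls?|ogres?|pigeons?)\b",
    re.IGNORECASE,
)

def filter_sft_data(examples: list[dict]) -> list[dict]:
    """Drop fine-tuning pairs whose target output mentions a creature word,
    so the next model generation cannot re-learn the tic from recycled
    outputs."""
    return [ex for ex in examples if not CREATURE_PATTERN.search(ex["output"])]

data = [
    {"input": "Fix this bug", "output": "Classic little goblin: an off-by-one."},
    {"input": "Fix this bug", "output": "An off-by-one error in the loop bound."},
]
print(filter_sft_data(data))  # only the creature-free example survives
```

A crude regex like this would also throw away legitimate uses, which is presumably why OpenAI audited words like frog individually rather than filtering blindly.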
The choice to use a system prompt patch is the interesting part. Retraining a model the size of GPT-5.5 to remove a single behavioural quirk is expensive and slow. A system prompt patch ships in minutes. Companies across the industry reach for the prompt patch first because it is the low-cost, fast-deploy option when user complaints spike, and OpenAI was no exception.
But prompt patches carry a trade-off. They suppress the behaviour. They do not remove it. The model still has the goblin habit baked into its weights. The ban just stops it from acting on it most of the time. For users who actually want goblins back, OpenAI included a script in its blog post that strips the suppressing instruction and lets the creatures run free.
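OpenAI’s actual unsuppression script lives in its blog post. As a guess at the shape of such a thing, removing the rule is just editing the prompt text before it ships to the model; the file name and matching logic here are hypothetical:

```python
from pathlib import Path

# Hypothetical throughout: the real script, file path, and prompt wording
# are OpenAI's; this shows only the shape of the operation.
prompt_file = Path("codex_system_prompt.txt")

if prompt_file.exists():
    sentences = prompt_file.read_text().split(". ")
    kept = [s for s in sentences if "goblins" not in s.lower()]
    prompt_file.write_text(". ".join(kept))
    print("Creature rule stripped; goblin mode enabled.")
```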
OpenAI’s response to the public reaction was telling. CEO Sam Altman posted a screenshot of a joke prompt reading “start training GPT-6, you can have the whole cluster. extra goblins.” Nik Pash from the Codex team confirmed on X that GPT-5.5’s “goblin adoration” was “indeed one of the reasons” for the ban. The company chose to lean into the meme rather than play it down, which is a fair read of the situation. The ban itself is funny. The mechanism behind it is not.
What does the goblin ban tell us about AI verbal tics like ‘delve’ and em dashes?
The goblin ban shows that AI verbal tics are the visible output of feedback loops nobody designed. A reward signal pushes the model in a particular direction, the model overcorrects, the overcorrected outputs become training data, and the next generation amplifies the pattern. The same mechanism that produced goblins produced “delve”, em dashes, “it’s not X, it’s Y”, and every other AI fingerprint anyone has learned to spot.
Take “delve”. When users started flagging it as a sign of AI-generated text in 2024, the trail led back to RLHF training data over-representing Nigerian English, where “delve” is a normal everyday word. The model was rewarded for using it. The model used it more. Eventually, it became a tell. Sound familiar?
Em dashes work the same way. A style preference rewarded somewhere in the training pipeline became a compulsion. The “it’s not X, it’s Y” rhetorical pattern, the over-reliance on tricolons (rule of three), the sensory clichés like “tapestry” and “landscape”: they all came from the same loop. None of them were designed in. They emerged.
While we have never had goblins popping up in our AI-assisted work at Roar, the dreaded AI-isms still find a way to surface no matter how hard you prompt against them. Even with a refined system prompt, a clear style guide, and an explicit list of banned vocabulary, the patterns leak through. That is the argument for human-in-the-loop AI usage, every time. The patterns are baked deep enough that prompt-level instructions only suppress so much. A human editor catches what a system prompt cannot.
This matters commercially because AI search engines are already factoring source distinctiveness into citation decisions. Content that reads like every other AI-generated piece on the topic gives those engines no reason to cite it over anyone else. Generative engine optimisation (sometimes called AI SEO) is the practice of structuring content so that AI-powered search platforms, including ChatGPT, Perplexity, and Google AI Overviews, can read, extract, and cite it in their generated answers. Originality is the moat.
Generic AI output is becoming a commercial problem, not just a stylistic one. The brands that get cited are the ones that read like a person wrote them.
Conclusion
The goblin ban is funny. The mechanism behind it is not.
Every AI writing tic is a goblin in the making. The ones that are obvious now (delve, em dashes, the negation construction) are the ones we have already learned to spot. The ones that come next will look invisible until they are everywhere, just like delve did, just like em dashes did. OpenAI had to write a hard-coded prompt rule to stop its own model from producing recognisable AI output. That is the position you do not want your brand to be in: recognisable in the wrong way, suppressible by the platforms you depend on for visibility.
The work, as a marketer or content producer, is to write in a way that does not need a system prompt to fix.
If you want help making your content stand out from the AI-generated wallpaper and get cited where it matters, learn about AI SEO solutions.
Frequently Asked Questions
What words did OpenAI ban from ChatGPT?
OpenAI’s GPT-5.5 system prompt for the Codex CLI coding tool instructs the model to never mention goblins, gremlins, raccoons, trolls, ogres, or pigeons unless directly relevant to the user’s query. The same prompt also tells the model to avoid em dashes and emojis unless explicitly requested. The bans only apply to Codex CLI, not the consumer ChatGPT app.
What is the Nerdy personality in ChatGPT?
The Nerdy personality was one of ChatGPT’s six personality presets, designed to be playful, intellectually curious, and to “undercut pretension through playful use of language”. OpenAI retired it in March 2026 after discovering its reward signal was teaching the model to over-rely on creature metaphors. Nerdy accounted for only 2.5% of ChatGPT responses but produced 66.7% of all goblin mentions before it was removed.
Did OpenAI actually ban a single word?
Not just one: several. The Codex CLI system prompt explicitly names goblins, gremlins, raccoons, trolls, ogres, and pigeons as words the model should not use unless directly relevant. This is a system prompt instruction rather than a permanent change to the model itself, and users can override it by running a specific command OpenAI shared publicly.
What is “goblin mode” in ChatGPT?
Goblin mode is the unofficial name for what happens when users disable OpenAI’s anti-goblin system prompt instruction, allowing GPT-5.5 to use creature metaphors freely. OpenAI shared a command-line script in its blog post that strips the goblin-suppressing instructions from Codex, effectively letting the creatures run free. There is no official toggle, but Nik Pash from the Codex team has hinted at a future version.
What other verbal tics do AI models develop?
AI models have produced multiple recognisable verbal tics over the past two years. The most well-known are the over-use of “delve” (which traced back to Nigerian English being overrepresented in RLHF training data), compulsive em dashes, the “it’s not X, it’s Y” rhetorical construction, and sensory clichés like “tapestry” and “landscape”. All of these emerged from the same kind of reinforcement loop that produced the goblins.
Why didn’t OpenAI just retrain the model to remove the goblins?
Retraining a model the size of GPT-5.5 to remove a single behavioural quirk is expensive, slow, and resource-intensive. A system prompt patch ships in minutes and addresses the symptom immediately. OpenAI did remove the underlying reward signal and filter the training data for the next generation of models, but for GPT-5.5 specifically, the prompt patch was the practical short-term fix because GPT-5.5 had already completed most of its training before the root cause was identified.