We’ve all had the nagging feeling that something we’re reading was written by a large language model, but it’s surprisingly hard to confirm. For a while last year, certain words, like “delve” or “underscore,” were widely treated as dead giveaways of AI-generated text. But the evidence behind that was always thin, and as models have grown more sophisticated, those telltale words have become less and less reliable as tells.

Interestingly, Wikipedia’s editors have developed a real knack for spotting AI-authored prose. Their public guide, “Signs of AI Writing,” is the most comprehensive resource I’ve seen for checking whether your suspicions are justified. (Credit to the poet Jameson Fitzpatrick, who surfaced the document on X.)

Since 2023, Wikipedia editors have been working to manage the flood of AI-generated submissions, an effort organized as WikiProject AI Cleanup. Wikipedia sees an enormous volume of edits every day, so there’s no shortage of material to study. In true Wikipedia-editor fashion, the group has produced a field guide that is both highly detailed and well evidenced.

To begin with, the guide confirms what many of us suspected: automated detection tools are largely ineffective. Instead, it focuses on specific writing habits and turns of phrase that are rare on Wikipedia but common across the wider internet (and therefore common in these models’ training data). According to the guide, AI submissions tend to spend a lot of words asserting a subject’s importance, leaning on generic phrases like “a pivotal moment” or “a broader movement.” Models also tend to pad out minor media mentions to make a subject look more notable, the sort of thing you’d expect from a personal bio rather than an independent source.
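
To make the phrase-level tell concrete, here’s a toy sketch in Python. This is my illustration, not anything from the Wikipedia guide; the phrase list mixes the two examples quoted above with a couple of invented entries, and a serious check would need context and base rates, not bare substring matches.

```python
# Toy heuristic: flag stock "importance" phrases of the kind the guide
# describes. Illustrative only; the last two phrases are my own guesses.
STOCK_PHRASES = [
    "a pivotal moment",           # quoted in the guide's examples
    "a broader movement",         # quoted in the guide's examples
    "played a significant role",  # hypothetical addition
    "cemented its reputation",    # hypothetical addition
]

def flag_stock_phrases(text: str) -> list[str]:
    """Return any stock phrases that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in STOCK_PHRASES if phrase in lowered]

sample = ("The bakery's opening marked a pivotal moment for the town, "
          "part of a broader movement toward artisanal food.")
print(flag_stock_phrases(sample))
# ['a pivotal moment', 'a broader movement']
```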

The guide also points out a particularly interesting quirk: trailing clauses that make vague claims of importance. Models will often close a sentence by asserting that an event or detail is “emphasizing the significance” of something or “reflecting the continued relevance” of some general idea. (Grammar enthusiasts will recognize the present participle at work.) It’s hard to spot at first, but once you see it, you’ll notice it everywhere.
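
That pattern is regular enough that you can sketch a crude detector for it. Again, this is my own illustration rather than anything the editors ship; the participle and noun lists are assumptions, seeded with the two phrases quoted above.

```python
import re

# Crude pattern for the trailing-participle tell: a comma, a vague
# importance-claiming participle, then an abstraction. The word lists
# are illustrative guesses, seeded with the guide's quoted examples.
TRAILING_CLAIM = re.compile(
    r",\s+(?:emphasizing|highlighting|underscoring|reflecting|demonstrating)"
    r"\s+(?:the\s+)?(?:significance|importance|continued relevance|broader)",
    re.IGNORECASE,
)

def find_trailing_claims(text: str) -> list[str]:
    """Return every match of the trailing importance-claim pattern."""
    return [m.group(0) for m in TRAILING_CLAIM.finditer(text)]

print(find_trailing_claims(
    "The festival drew record crowds, reflecting the continued relevance "
    "of folk traditions in the region."
))
# [', reflecting the continued relevance']
```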

Another common trait is the use of vague marketing language, which is rampant on the internet. Landscapes are always described as scenic, views as breathtaking, and everything is clean and modern. As the Wikipedia editors put it, “it sounds more like the transcript of a TV commercial.”

The guide is worth reading in full, and I came away impressed. Before finding it, I would have argued that the prose generated by large language models was changing too fast to be reliably identified. But the habits the guide highlights are deeply rooted in how these models are trained and deployed; they can be masked to some extent, but they will be hard to eliminate entirely. And if the general public gets better at recognizing AI-generated text, a lot of interesting things start to happen.