A while ago I took human-written and AI-generated text from various sources, including several LLMs, and submitted it to the OpenAI AI Text Classifier. The objective was to gauge the classifier’s ability to detect the origin of text content.

As seen in the extract below, a paragraph was added to the document that announced the classifier, barely five months after the launch. This article considers why this type of classification is hard, and how inaccurate the classifier was in the first place.


LLMs are flexible and highly responsive to requests. An LLM can be asked to respond in such a way that the response seems human and not machine written.

An LLM might also be asked to write in such a way as to fool an AI detector into believing the text is human written, or to sound like a particular personality or type.

Hence any detection system has to rely on word sequences and word choices, which are exactly what an LLM can be instructed to vary.

A stringent approach, such as watermarking LLM output by hashing and storing every generated section alongside its generation date and location, and then letting institutions query that store with a final document, a geo-code and a time window, is completely unfeasible. This is even more true given the advent of open-source LLMs and the extent to which models can be fine-tuned.
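To make the objection concrete, here is a minimal Python sketch of such a hash-and-store scheme. It is not something any provider is known to run; the `OutputRegistry` class, the `fingerprint` helper, the sample sentence and the geo-code are all hypothetical and exist only for illustration.

```python
import hashlib
from datetime import date


def fingerprint(section: str) -> str:
    """Hash a section of text after normalising case and whitespace."""
    normalised = " ".join(section.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class OutputRegistry:
    """Hypothetical store holding a record for every generated section."""

    def __init__(self):
        # Maps a fingerprint to a list of (generation date, geo-code) pairs.
        self._records = {}

    def record(self, section, generated_on, geo_code):
        self._records.setdefault(fingerprint(section), []).append((generated_on, geo_code))

    def was_generated(self, section, start, end, geo_code):
        """Check whether this exact section was generated in the window and location."""
        hits = self._records.get(fingerprint(section), [])
        return any(start <= day <= end and place == geo_code for day, place in hits)


registry = OutputRegistry()
generated = "Being punctual shows respect for other people's time."
registry.record(generated, date(2023, 2, 1), "ZA")

window = (date(2023, 1, 1), date(2023, 3, 1))
exact_copy = registry.was_generated(generated, *window, "ZA")
# A single changed word produces a completely different hash.
edited_copy = registry.was_generated(generated.replace("shows", "signals"), *window, "ZA")
print(exact_copy, edited_copy)
```

The exact copy is found, but the copy with one word changed is not. On top of the storage cost of recording every output, light editing or paraphrasing defeats the lookup, and text from open-source or fine-tuned models would never be in the store to begin with.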

When OpenAI announced and launched a classifier trained to distinguish between AI-written and human-written text, each document submitted was classified into one of five classes:

1. Very unlikely AI-generated,

2. Unlikely AI-generated,

3. Unclear if it is AI-generated,

4. Possibly AI-generated, or

5. Likely AI-generated.

These categories are in themselves vague and ambiguous.

OpenAI trained a classifier, based on a fine-tuned GPT model, to differentiate between human-written and AI-written text. The model was supposed to predict how likely it was that a portion of text was AI generated, with the AI-written text coming from a variety of sources, including ChatGPT.
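One way to picture the five classes listed earlier is as buckets over a single "probability of being AI-generated" score. The Python sketch below shows that idea only; the cut-off values and the `label_for` function are illustrative assumptions of mine, not documented OpenAI behaviour.

```python
# Upper bounds for each bucket; these cut-offs are placeholders for illustration.
LABELS = [
    (0.10, "very unlikely AI-generated"),
    (0.45, "unlikely AI-generated"),
    (0.90, "unclear if it is AI-generated"),
    (0.98, "possibly AI-generated"),
]


def label_for(score: float) -> str:
    """Map a score between 0 and 1 to one of the five classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for upper_bound, label in LABELS:
        if score < upper_bound:
            return label
    # Anything at or above the last cut-off falls in the top bucket.
    return "likely AI-generated"


for score in (0.05, 0.50, 0.97, 0.99):
    print(score, "->", label_for(score))
```

Wherever the cut-offs are placed, two texts with almost identical scores can land in different classes, which is part of why the labels feel ambiguous.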

I made use of AI21Labs, Cohere, text-davinci-003, ChatGPT and other sources to generate text on an arbitrary and ambiguous topic like “punctuality” to test the classifier.

The table below gives an overview of the results, with the source of the text on the left and the classifier accuracy on the right. The details of the results are discussed below…

OpenAI clearly stated the following:

Our classifier is not fully reliable.

In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives).

Our classifier’s reliability typically improves as the length of the input text increases.

Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.
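The 26% and 9% figures quoted above say little on their own about how much a "likely AI-written" flag can be trusted, because that also depends on how much of the submitted text is AI written in the first place. The short Python sketch below works through the arithmetic with Bayes' rule; the AI shares of 50% and 10% are assumptions picked for illustration and do not come from OpenAI.

```python
def precision(true_positive_rate: float, false_positive_rate: float, ai_share: float) -> float:
    """Fraction of flagged texts that really are AI written.

    ai_share is the assumed proportion of AI-written text among all submissions.
    """
    if not 0.0 < ai_share < 1.0:
        raise ValueError("ai_share must be strictly between 0 and 1")
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


# 26% true positives and 9% false positives, as quoted from OpenAI.
for share in (0.5, 0.1):
    print(f"AI share {share:.0%}: precision {precision(0.26, 0.09, share):.2f}")
```

If half of all submitted text is AI written, roughly 74% of the flagged texts are genuinely AI written. If only one text in ten is AI written, that drops to roughly 24%, so most flags would land on human authors.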

Text Generated Via The Cohere LLM

I asked the Cohere LLM the following question:

Write 2,000 characters of text on the importance of being punctual.

Below, the generated text from Cohere is copied into the AI text classifier of OpenAI.

The result from the classifier: likely AI-generated.

Hence correct, with full confidence.

Text generated in the Cohere Playground is submitted here to the AI Text Classifier of OpenAI.

Text Generated Via AI21Labs

The same generation command was issued in the AI21Labs playground…asking the AI21Labs LLM to generate text on the importance of punctuality.

The result from the classifier: likely AI-generated.

Hence correct with full confidence.

Text generated in the AI21Labs Playground is submitted here to the AI Text Classifier of OpenAI.


Below you see content generated by ChatGPT, which the classifier rated as possibly AI-generated. Hence it is seen as one step closer to human-generated text, as opposed to the text from Cohere and AI21Labs.

I would have expected the classifier to state AI generated with full confidence.

OpenAI text-davinci-003 Model

I also submitted a 500-word text generated by text-davinci-003 on the topic of punctuality, and received the same answer as for the ChatGPT text: possibly AI-generated.

I assumed the classifier would be able to clearly detect text generated by text-davinci-003 or ChatGPT, since both are OpenAI's own models.

An Essay From The Web

I copied a piece from an online essay, and the result from the classifier is ambiguous to some degree, but fairly accurate.

My Own Writing

Below is an original piece I wrote on the same subject, which was marked by OpenAI as possibly AI-generated. I would have expected a result of “Unclear if it is AI-generated”.

But I hasten to add that the piece is short, and as I have said before, the piece is ambiguous with not much definitive text.


Considering that the AI Text Classifier was trained on Wikipedia, I copied a piece from Wikipedia on World War I and asked the classifier to vet the contents. Here I got the right answer, and also the strongest human rating: very unlikely AI-generated.

Can ChatGPT Detect Text Origins?

The short answer is…yes.

The results are definitive, and in my few attempts, very accurate:

And the response on my own writing is also correct.

Keep In Mind

Apart from the accuracy issues stated at the beginning of this article, there are other limitations…

The subject I used as the premise for the writing is very generic. Ambiguous content like this is most probably harder to classify.

The longer the text to be analysed the more reliable the results are.

Human-written text is sometimes incorrectly labeled as AI written. So there seems to be a type of bias towards a default classification of “AI written”.

The classifier is English only and not multilingual.

The classifier is unreliable at classifying code.

AI generated text which is edited by a human can fool the classifier.

The Data

OpenAI collected a dataset of AI-generated and human-written text.

The human-written text has three sources:

In Conclusion

It is evident that the accuracy of the classifier was never reliable, and OpenAI stated this fact openly: “Our classifier is not fully reliable”.

There is immense focus on Responsible AI and I believe that one aspect of responsible AI starts with observability, inspectability and tuning of LLM input / output.

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
