AI Struggles to Detect False Information Because Finding Truth is a Word Problem, Not a Math Problem


TL;DR (aka Executive Summary)

  • Text-Generative AI often produces content that is false, because fundamentally, it is a machine performing probability calculations to determine the order of words in a sentence. It is optimized to sound human, but not optimized to generate only true statements.
  • It is difficult to optimize AI to differentiate between true and false statements because there isn’t a “source of truth” in its training data. More precisely, there aren’t any large data sets of text labeled for degrees of truth. It is likely that if such labeled data sets existed, they could be used to train generative AI models and improve them.
  • There are many reasons why large data sets of text labeled for truth do not exist. Among them are:

      1) Truth and falsehood aren’t binary

      2) What’s “true” changes

    Labeling for truth is inherently difficult work that is highly philosophical in nature. It’s a word problem, not a math problem.

  • Generative AI will likely always need human assistance regarding questions of truth because:

      1) Misinformation is usually about disputed and/or new information

      2) All the information on the internet isn’t all the information everywhere

  • Even though labeling for truth is difficult, it is probably a necessary approach to improving the capabilities of current generative AI. We should try our best, because we have a collective responsibility to make sure the benefits of any new technology outweigh the drawbacks.


By now, you’ve seen how OpenAI’s ChatGPT can generate text answers to prompts that sound like humans wrote them. It’s one of the most impressive implementations of generative AI to date in a field full of impressive implementations. For those unfamiliar, “generative AI” broadly refers to AI tools that can generate a piece of work, such as a written text (like ChatGPT, Google’s just-announced Bard and Meta’s Galactica) or a drawing or piece of art (like DALL-E and Lensa).

One concern about generative AI for text is that it confidently writes false and nonsensical answers. AI practitioners call these “hallucinations,” examples of which have been proliferating over the last few weeks. Among the many examples are Galactica’s response detailing the history of space exploration by bears, and ChatGPT’s responses that elephants lay the largest eggs of any mammal and that four different fictional people hold the world record for walking across the English Channel.

The brilliant creators of these tools know about these limitations and acknowledge that it is a difficult problem to solve. However, the difficulty of detecting false or misleading information in text is nothing new. For years, social media companies have struggled to use predictive AI to detect false and misleading content that users post online. Some predictive AI does a good job of identifying other sorts of problematic content, like pornography, nudity, spam, and financial scams. However, false or misleading information is notoriously difficult for algorithms to identify. Any technologist who has approached the problem acknowledges how hard it is to solve.

The technological evolution from predictive AI to generative AI is, in many ways, a quantum leap forward, but the issue with truth remains. Although ChatGPT’s capabilities are mesmerizing, fundamentally it is still a computer doing probability calculations on combinations of words. It’s doing math on text.

The difference between a math problem and a word problem, as we all learned taking tests in school, is that math problems have discrete answers, while word problems require written responses that show reasoning. Math problems ask “what.” Word problems ask “why” and “how.”

Text-generative AI systems calculate an answer to “what combination of words would produce the most satisfactory answer to a human?” That’s already a difficult problem for a computer to solve, and ChatGPT has done it. With math.

Unsupervised learning within Large Language Models (LLMs) is the process of feeding extremely large sets of inputs (specifically, billions of pieces of text written by humans) into a model and then using statistical computations to calculate relationships between the inputs. This is doing math on words. What’s so impressive and mind-boggling about AI technology is that the mathematical formulas can take the data inputs and find correlations and relationships between sets of words much faster and at a larger scale than a human could. Then, it can put those correlations and relationships back into the original mathematical formulas and get better over time by itself! It does math on math on text.
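
To make “doing math on words” concrete, here is a minimal sketch in Python. It is not how ChatGPT or any real LLM works (those use large neural networks); it is a simple word-pair counter on a made-up toy corpus, and the function name next_word_probabilities is hypothetical. It only illustrates the idea of turning text into next-word probabilities.

```python
from collections import Counter, defaultdict


def next_word_probabilities(corpus):
    """Count which word follows which, then turn the counts into probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


# Toy corpus, invented purely for illustration.
corpus = [
    "the earth is round",
    "the earth is warming",
    "the earth is round",
    "the moon is round",
]

probs = next_word_probabilities(corpus)
print(probs["is"])
```

Notice what the sketch does not contain: any notion of whether a continuation is true. The most probable next word is simply the one that appeared most often in the text the counter saw.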

But it’s not that OpenAI uses only math. The best AI systems layer some amount of human logic, reasoning, and experience on top of the unsupervised learning. This method, using human data labeling to train a machine learning model, is known as supervised learning. It’s what we do when we click on boxes to identify traffic lights and crosswalks to prove we are not robots. OpenAI uses teams of humans (who have logic and reasoning) to rate thousands of ChatGPT’s responses and rank which ones sounded more human, which helps the machine learn to produce more human-like answers. Therefore, ChatGPT uses supervised learning (which isn’t solely math) to improve the unsupervised learning (the math).

Answering the question “what combination of words would produce the most satisfactory answer to a human” is difficult. But it is much, much harder to add to that question “what combination of words would produce the most satisfactory answer to a human that is true.” Without that truth requirement, the machine produces hallucinations.

It’s hard to solve because the answer to “what’s true” isn’t always as definite as the answer to a math problem. It requires a lot of “why” and “how” reasoning in the background. AI does well with things that can be definitely classified as binary, like “this is or is not a traffic light, this is or is not a limerick, or this is or is not a dog.” It also does well with things that can be quantified or are discrete, like the probability of the next word that should appear, the price of a house, or what color something is.

Answers to the question “what is true” for any given subject matter are very hard (though not impossible) to classify and quantify discretely. “What is true” quickly turns away from math into philosophy. It’s liberal arts, not STEM. Plato, rather than Gauss. A word problem, not a math problem.

We can see one hint that figuring out what is and isn’t true is really, really hard in the fact that many technologists—actually, many people of all types—find it uncomfortable to wade into this territory. Many technologists (e.g., Mark Zuckerberg and Jack Dorsey) explicitly state that they don’t want to be “the arbiters of truth.” Solving the truth problem isn’t the top priority for text-generative AI creators either. I don’t blame them. They did all this work solving these super hard math problems, and now we want them to solve for truth, too?

But because their mathematical creations—their AI-powered super applications—quickly gain so much scale and adoption, it is fair for us to place some responsibility at their feet for not spreading misinformation and disinformation. It may be hard to be an arbiter of truth, but some things are more true and false than others. It is important to try to differentiate between them.

I believe that it is possible for predictive and generative AI to learn to get better at distinguishing degrees of truth, but not without some hard work by humans taking on the truth question head-on over the next several years, and likely not without humans-in-the-loop indefinitely. Specifically, I posit that it would be possible to improve AI models significantly by implementing supervised learning on large datasets labeled for measures of truth.

At Ad Fontes, we regularly dive into the question of how to tell what’s true in text, and we are working on closing the gap between the respective abilities of humans and machines to evaluate a piece of text for truth. Based on our experience, here are some thoughts about why AI struggles with questions of truth and what it will take to improve its truth-discerning capabilities.

The core of the problem: While there are nearly infinite amounts of text on which to train an LLM, there are hardly any text data sets that categorize how true or false those texts are. At Ad Fontes, we currently have the largest such labeled data set that exists at about 45,000 pieces of news and informational content (articles, episodes) hand-labeled by humans (each labeled by at least three humans balanced across left, right, and center political leanings and diverse across age, race, and gender).

Having done this work, we know that labeling for truth is hard for at least the following reasons:

1. Truth and falsehood aren’t binary.

Typically, when people first approach the problem of distinguishing what’s true and false, their minds turn to discrete facts that can be confidently and definitely classified as “true” or “false,” like “George Washington was the first President of the United States” (true) or “New York City is the capital of New York” (false–it’s Albany).

If only we had a lookup database of true facts, then we could train a machine to detect when something isn’t true. OpenAI’s creators stated that one of the reasons ChatGPT doesn’t differentiate between what is true and false is because it didn’t use a “source of truth.” That’s because no such all-encompassing Compendium of True Things exists. Conversely, no Compendium of False Things exists either. (The latter would have to be infinitely larger.)

The closest things to such imaginary compendiums would be existing knowledge graphs, which are data structures that store, link, and capture relationships between entities. Wikipedia, Google, and social media networks use such knowledge graphs to surface results and group similar things together. But even these extensive knowledge graphs are mostly limited to discrete concepts or entities.

Further, discrete, easily provable facts make up a surprisingly small portion of all written text. Take, for example, the title of this article: “AI struggles to detect false information because finding truth is a word problem, not a math problem.” Is that sentence true or false? Well, it’s an argument. It is a hypothesis that I am currently using facts, analysis, and reasoning to support. Other people could find counterarguments. So let’s say that sentence is mostly true, or generally true, or more true than false. It’s not as true as “George Washington was the first US President.”

A good portion of text written in what we consider “news” is actually predictions about the future. Again, these are hard to classify as true or false; some predictions are better supported than others, and therefore some occur to us as being more true than others, even though technically no prediction is true at the time it is made.

In other words, not every piece of text is fact-checkable, or classifiable into true or false. It’s impossible to label all text that way. That’s why, at Ad Fontes, when we rate a piece of content—even a single sentence—for veracity, we rate it on a scale of 1-5 (with 1 being “true”, 5 being “false”, and 2-4 a continuum between) on either a “provability” or a “certainty” basis.

We use the concept of “provability” to rate certain statements because statements can be true, mostly true, neither true nor false, mostly false, and false based on how easily provable and widely accepted the proof is. This approach is helpful to classify content from provable statements to arguments and opinions. Opinions (3, on the 1-5 scale) are usually statements for which you can find proof to support or refute the statement, depending on what facts you rely on. They are ultimately neither provable nor disprovable.

We use the concept of “certainty” to rate some claims because sometimes things are definitely true or definitely false, but our current level of certainty can vary. This approach is helpful to classify claims where not all the necessary information is knowable at a given time.

You can label a piece of text within a taxonomy. That is, you can give it a score or a categorical designation like our 1-5. As a result, you can train a machine learning model (i.e., use math) to predict whether similar texts would be true, false, or something in between. But to know why the text scored that way would require background reasoning (i.e., the answer you would write to a word problem).
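
As a rough sketch of what one labeled record in such a taxonomy could look like, consider the Python below. The class name VeracityLabel, its field names, and the exact wording attached to scores 2 and 4 are assumptions made for illustration, not our actual schema. The point is that the score and basis are the “math” part a model can train on, while the rationale field holds the “word problem” answer.

```python
from dataclasses import dataclass

# Names for the 1-5 scale on a provability basis (wording for 2 and 4 is assumed).
SCALE = {
    1: "true",
    2: "mostly true",
    3: "neither true nor false",
    4: "mostly false",
    5: "false",
}
BASES = {"provability", "certainty"}


@dataclass(frozen=True)
class VeracityLabel:
    text: str       # the sentence or passage being rated
    score: int      # 1 (true) through 5 (false)
    basis: str      # "provability" or "certainty"
    rationale: str  # the written reasoning behind the score

    def __post_init__(self):
        # Reject labels that fall outside the taxonomy.
        if self.score not in SCALE:
            raise ValueError(f"score must be 1-5, got {self.score}")
        if self.basis not in BASES:
            raise ValueError(f"unknown basis: {self.basis}")


washington = VeracityLabel(
    text="George Washington was the first President of the United States",
    score=1,
    basis="provability",
    rationale="Discrete, easily provable, widely accepted fact.",
)
title = VeracityLabel(
    text="AI struggles to detect false information because finding truth "
         "is a word problem, not a math problem",
    score=2,
    basis="provability",
    rationale="An argument supported by facts and reasoning; counterarguments exist.",
)
print(washington.score, SCALE[title.score])
```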

2. Things change. (Yikes!)

We can never be 100% confident about what’s true, what’s true can change over time, and what’s true is subject to our collective acceptance thereof.

It is uncomfortable to admit that “what is true” is squishy. Philosophers have wrestled with this since the beginning of, well, philosophy.

The most efficient example of how hard it is to be 100% certain of anything is in The Matrix. How do we know for certain that we aren’t floating in a vat and this isn’t all a computer simulation? Well, we can’t know that 100%. Really, we can’t know anything 100%. But all our senses tell us it’s quite likely that we have real lives to live, and so we need to make decisions and act accordingly. We act as if we are quite certain, though not 100%, that our lives are real. It would be to our detriment to do nothing on the minuscule chance we’re just human batteries in a vat. There’s always a chance we’re wrong, even when we are very certain.

In arguments about truth, people often say “there used to be a time when everyone thought the world was flat” to illustrate the points that 1) what’s known as true can change in light of new information, and 2) truth is a function of our collective agreement. I agree with both of these statements.

When we say something is “true,” what we really mean is that something is “true, based on the best information we have available at the moment.” When people thought the world was flat, that was a reasonable interpretation of the information they had at the time. As better information became available, a new truth became available. What’s true, then, is inherently temporal (i.e., time-based).

This makes it difficult to label text for truth because even if you label something as true at a given point in time, that label may need to be updated as new information becomes available. When humans update their mental models and change their beliefs, they replace the old information with new information. An LLM, if trained on things that used to be considered true but are no longer, would have to be programmed to replace the old information with new information as well.
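
One way to picture a time-based label is as a history of revisions rather than a single value. The Python below is a minimal sketch under that assumption; the class LabelHistory, its methods, the placeholder statement, and the dates are all hypothetical and exist only to show the idea of asking “what was this labeled as of a given day?”

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LabelHistory:
    statement: str
    # List of (effective_date, score) pairs, kept sorted by date.
    revisions: list = field(default_factory=list)

    def relabel(self, effective, score):
        """Record a new score that takes effect on the given date."""
        self.revisions.append((effective, score))
        self.revisions.sort()

    def score_as_of(self, when):
        """Return the score in force on `when`, or None if not yet labeled."""
        dates = [d for d, _ in self.revisions]
        i = bisect_right(dates, when)
        return self.revisions[i - 1][1] if i else None


# Placeholder statement and dates, invented for illustration.
history = LabelHistory("Placeholder claim about a developing event")
history.relabel(date(2023, 1, 1), 3)  # first labeled as uncertain
history.relabel(date(2023, 6, 1), 1)  # relabeled once better information arrived

print(history.score_as_of(date(2023, 3, 1)))
print(history.score_as_of(date(2023, 7, 1)))
```

Keeping the old revisions instead of overwriting them preserves the record of what was reasonable to believe at each point in time.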

During this world-is-flat/world-is-round truth shift, there were times where 1) pretty much everyone believed it was flat, 2) some people thought it was round while others still thought it was flat (not all the information got around to everyone at the same time), and 3) pretty much everyone believed it was round.

What we call true is a function of how widely accepted that truth is. Even if there were an ultimate, objective truth, given our limitations as humans we can’t claim to know what that is. The closest we can get to thinking something is objectively true is when we have extremely broad subjective consensus.

Surprisingly few things have as much worldwide human consensus as “the earth is round,” and yet flat-earthers still exist. They are few enough in number that “the earth is round” is a robust example of a “true thing,” something as true as George Washington being the first President or Albany being the capital of New York.

But think about things that are just a little less widely accepted. “The earth’s climate is getting warmer.” “It is getting warmer because of human activity.” “We need to take action to reduce carbon emissions to reduce the warming of the climate.”

In terms of what percentage of people accept each statement as true, the relationship is:

“The earth is round” > “The earth’s climate is getting warmer,” > “It is getting warmer because of human activity,” > “We need to take action to reduce carbon emissions to reduce the warming of the climate.”

For the sake of argument, let’s assign estimates (loosely based on similar questions from recent related polls) to what percentage of all people (experts and laypeople alike) in the United States believe each statement is true.

“The earth is round”: 98%
“The earth’s climate is getting warmer”: 72%
“It is getting warmer because of human activity”: 57%
“We need to take action to reduce carbon emissions to reduce the warming of the climate”: 52%

If we were to label each of those statements on a scale of 1-5 for veracity based on the idea that consensus is a component of truth, we might label each of those statements on our scale as follows:

“The earth is round”: 1 (True)
“The earth’s climate is getting warmer”: 2 (True and mostly certain)
“It is getting warmer because of human activity”: 3 (Uncertain whether true or false)
“We need to take action to reduce carbon emissions to reduce the warming of the climate”: 3 (Uncertain whether true or false)
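
The mapping from consensus estimates to labels can be sketched as a simple threshold function. The Python below is illustrative only: the cutoffs (90, 65, 35, 10) are assumptions chosen so that the four estimates above land on the four labels above, and the function name consensus_to_score is hypothetical. Real labeling involves far more judgment than a lookup.

```python
def consensus_to_score(percent_agree):
    """Map the share of people who accept a statement to a 1-5 veracity score.

    The cutoffs are illustrative assumptions, not an established standard.
    """
    thresholds = [(90, 1), (65, 2), (35, 3), (10, 4)]
    for floor, score in thresholds:
        if percent_agree >= floor:
            return score
    return 5


# The consensus estimates used in the example above.
estimates = {
    "The earth is round": 98,
    "The earth's climate is getting warmer": 72,
    "It is getting warmer because of human activity": 57,
    "We need to take action to reduce carbon emissions": 52,
}

for statement, percent in estimates.items():
    print(statement, "->", consensus_to_score(percent))
```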

But wait! Shouldn’t we put more weight on the consensus of people who are most knowledgeable about climate change? Like climate scientists? That’s a decision that would need to be made by a labeler of a data set for truth.

In our work, the decision we’ve made is “sometimes.”

On certain subjects, the consensus of experts in a given field has more weight than that of the general population. For any given subject, there are always people who are in a position to know something better than others (a concept we’ll explore in a minute). However, labelers need to decide for which subjects and questions deference should be given to experts. In this example, the first three statements are more strictly science questions than the fourth, which is more of a policy question with considerations beyond science (like geopolitics and economics).

Given that scientific consensus is much higher than layperson consensus on each of the first three statements, we would rate them as 1 (True and highly certain). Since it’s not as clear that only the consensus of the scientists should be considered on the fourth statement, it is probably best to leave that as a 3 (Uncertain whether true or false). Notice that you might disagree with those ratings. Such is the nature of labeling statements for truth. Reasonable people can disagree, and the labelers (such as Ad Fontes) can and will get things wrong.

Now think about repeating the above exercise on statements about COVID origins, vaccines, abortion, affirmative action, gun violence, etc.

Labeling for truth requires making many decisions on what background information you will use to inform such labels. Any labeler undertaking these kinds of decisions will probably want to try to mitigate their own biases by involving people with different political perspectives and personal characteristics.

The way we do it is that when we rate an article for reliability and bias, we do so with a panel of three analysts—one center, one left, and one right. We have a roster of 60 analysts who are diverse by political leaning as well as age, race, and gender. We do extensive methodology training as well as governance, transparency, and inter-rater reliability procedures, each of which we seek to improve over time.
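
To give a feel for how three analysts’ ratings might be combined, here is a hedged sketch in Python. Averaging the three scores and flagging wide disagreement for a second look are assumptions made for illustration, not a description of our actual procedures; the function name panel_rating and the max_spread parameter are hypothetical.

```python
from statistics import mean


def panel_rating(scores_by_leaning, max_spread=1):
    """Combine one score each from a left, center, and right analyst.

    Returns the average score, the gap between the highest and lowest
    score, and whether that gap is wide enough to warrant another review.
    """
    required = {"left", "center", "right"}
    if set(scores_by_leaning) != required:
        raise ValueError("need exactly one score each from left, center, right")
    scores = list(scores_by_leaning.values())
    spread = max(scores) - min(scores)
    return {
        "score": mean(scores),
        "spread": spread,
        "needs_review": spread > max_spread,
    }


# Invented ratings on the 1-5 veracity scale.
agreed = panel_rating({"left": 2, "center": 2, "right": 3})
disputed = panel_rating({"left": 1, "center": 3, "right": 4})
print(agreed)
print(disputed)
```

The spread is a crude stand-in for inter-rater reliability: when analysts with different leanings land far apart, that disagreement is itself information about how contested the content is.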

Because concepts of truth change, our analysts constantly take in new information. A big part of our training is about updating our beliefs based on that new information and being ok with ambiguity and change. This is no small feat. You may know people in your life who absolutely refuse to update their beliefs based on new information.

3. Misinformation is usually about disputed and/or new information.

When was the last time you heard a fake news story claim that Abraham Lincoln was the first President, or that New York City was the capital of New York? Never. That’s because most misinformation involves constructing a story around information that is new and/or disputed. Ordinarily, it is also about explicitly political topics and directed at sharply polarized audiences, where misinformation tends to find broader acceptance.

New information is often the subject of most uncertainty. Questions like “what happened with the voting machine error in Antrim County, Michigan?” “What happened to Damar Hamlin?” and “Are the air and water safe around the derailed train in Ohio?” are ripe for exploitation by those who want to spread misleading or inaccurate information. This is because for a period of time after the initial event occurs, not all the information is knowable: it only becomes available over time.

Disputed or unknowable information, like the origin of COVID-19, is also the subject of misleading and false stories. In the cases of both new and disputed information, there is a void of facts that can be classified as definitively true or false. Bad actors can easily step into this void and fill it with false things people might want to hear.

A model trained on what was written in the past cannot determine whether a new piece of information, never before written about, is true. It will also struggle to judge disputed information. This is a problem predictive AI has faced for years in content moderation.

Existing generative AI systems, including ChatGPT (based on GPT-3.5), are trained on text written before 2022, so they would also inherently struggle to determine whether newer information or anything happening in real time is true or false.

Some people imagine that if a generative AI for text could also get trained on and search the internet, this limitation would be diminished. Microsoft and Google may be well on that path, given that they are incorporating generative AI technologies into their search engines. However, because new information (true, false, and in-between) gets put on the internet continuously, any labeling of a dataset for truth would require similarly continuous updating.

Determining the likelihood of veracity of a new piece of information has its own challenges, but they are not insurmountable. Humans do this with new information all the time. Those that are adept at distinguishing what information is likely true know how to assign credibility to differing sources of information.

In each situation, there are certain people who are closest to—and in the best position to know—the truth about something. In Antrim County, Michigan, on election night in 2020, it was the people at the county clerk’s office in Antrim and the reporters who talked to them directly, and less so podcast commentators in the subsequent weeks. In Damar Hamlin’s medical emergency, it was his medical staff, NFL officials, and ESPN reporters rather than speculators on Twitter. In the train derailment in Ohio, it was hazmat first responders, local, state, and federal officials, local residents, and local reporters in the city. Less so talk radio and cable news hosts.

These examples illustrate that the best source of information on a given new topic varies greatly from situation to situation, and has to be identified in real-time by labelers each time.

4. All the information on the internet isn’t all the information everywhere.

Because the internet contains so much information, we often forget that it doesn’t contain all information. It just contains all the information that has been written, documented, and uploaded onto the internet. The information text-generating AI has been trained on is only the text version of such information. Information in video or audio adds two other dimensions; anyone comparing a text transcript to its corresponding audio or video understands that tone, sarcasm, humor, and emotions get lost in transcription. A machine trained on transcripts inherently misses tacit information that humans glean from video and audio content.

Information that isn’t on the internet yet still has to be observed by someone and recorded because machines cannot currently observe, investigate, and transcribe most newly occurring events. Fortunately there is an existing profession whose members already do this on a regular basis: journalism. Journalists who are closest to new events are often the best sources of functionally true information regarding such events. Non-journalist observers, like bloggers and other writers, also contribute new information to the internet. Human writers are like an Application Programming Interface (API) between the physical world and the internet. It is only from there that the generative AI machine can crawl and do its best to answer your questions.

There’s also a vast amount of human knowledge that exists offline, some of it virtually impossible to digitize, like how a scent makes you feel, or why you take a risk based on your intuition, or how you can tell that a person you’re talking to is having an off day.

Knowing what’s true also requires knowing what’s missing, which is another sort of thing we don’t realize we know, but which we naturally use to judge the veracity or bias of a given argument.

All these types of information that humans know beyond written text make up what we call context. To tell what’s true, we need context, not just content. Context is largely informed by observing the physical world, which machines cannot always do. Many machines (e.g., IoT devices, wearables) can and do observe the physical world and upload information to the internet, but they don’t observe all the same things humans do.

Can we solve this problem? We should try.

Can AI be developed to identify false and misleading information? I think it can be developed to a point where it is much better than it currently is. Getting there will require supervised learning regarding truth, which would require large datasets of text labeled for degrees of truth.

How big would a labeled data set of statements need to be to achieve meaningful levels of improvement? Likely it would require a few million labeled sentences and paragraphs in each language and region for which a text-generative AI is implemented. It’s probably like the self-driving car AI problem, where millions of pieces of labeled data might even get the technology to 97-99%+ accuracy compared to humans, but it will never get it to “perfect.” That’s OK, because human drivers aren’t perfect either: the goal is for self-driving cars to be, ahem, more perfect than human-driving cars. Humans aren’t perfect at detecting truth either, but we’d like our machines to be a lot closer to human-level accuracy than they are now.

Getting machines to maintain high accuracy would also require significant and ongoing human-in-the-loop supervised learning with the labeling of new information as it gets put online.
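
A common way to keep humans in the loop is to let the machine handle what it is confident about and send the rest to people. The Python below is a minimal sketch of that routing step, assuming a hypothetical model that returns a predicted 1-5 score plus a confidence value; the function name route, the 0.8 cutoff, and the sample predictions are all invented for illustration.

```python
def route(predictions, min_confidence=0.8):
    """Split model predictions into auto-accepted labels and a human review queue.

    Each prediction is a (text, predicted_score, confidence) tuple.
    """
    auto_labeled, human_queue = [], []
    for text, score, confidence in predictions:
        target = auto_labeled if confidence >= min_confidence else human_queue
        target.append((text, score))
    return auto_labeled, human_queue


# Invented predictions from a hypothetical veracity model.
predictions = [
    ("long-settled historical fact", 1, 0.97),
    ("claim about an event from this morning", 3, 0.41),
    ("disputed policy argument", 3, 0.62),
]

auto_labeled, human_queue = route(predictions)
print(len(auto_labeled), "auto-labeled;", len(human_queue), "sent to human analysts")
```

The labels humans assign to the queued items can then be fed back as new training data, which is the ongoing part of the work.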

I believe it is possible, and also necessary, for us to develop much better predictive AI to detect false and misleading information. The advances in generative AI make it much easier to create false and misleading content at scale, so the need to detect it at scale will continue to grow. This is why my company has undertaken such labeling of content for degrees of truth, despite the difficulty of the work.

Skeptics and critics often ask us “who made you the arbiter of truth?” Nobody. Just like nobody made Facebook and Twitter the biggest distributors and amplifiers of information, and nobody made OpenAI the biggest new generator of information. Each company just started building things. To the extent they work, they get adopted, and others are free to criticize or jump in with alternative solutions.

Labeling for truth is hard, and solutions for doing it are—and will continue to be—imperfect. But just because a new solution is hard to build and destined to be imperfect doesn’t mean we shouldn’t try to build it.

As ChatGPT has shown us, imperfect technology can still be useful. But with such transformational technologies, we have a collective responsibility to mitigate their imperfections over time so their benefits outweigh their drawbacks.



Vanessa Otero

Vanessa is a former patent attorney in the Denver, Colorado area with a B.A. in English from UCLA and a J.D. from the University of Denver. She is the original creator of the Media Bias Chart (October 2016), and founded Ad Fontes Media in February of 2018 to fulfill the need revealed by the popularity of the chart — the need for a map to help people navigate the complex media landscape, and for comprehensive content analysis of media sources themselves. Vanessa regularly speaks on the topic of media bias and polarization to a variety of audiences.

