As AI fact-checking tools become increasingly popular, concerns are growing about their accuracy and reliability. Recent studies have found significant shortcomings in the ability of generative AI chatbots to accurately convey news reporting, raising questions about the credibility of these tools in the digital age.
The Unreliability of AI Fact Checks
Understanding the Limitations of Generative AI Chatbots
The use of generative artificial intelligence (AI) chatbots for fact-checking is on the rise. However, these chatbots are not always reliable and can produce inaccurate or misleading information. A recent survey found that 27% of Americans have used AI tools such as OpenAI’s ChatGPT, Meta’s Meta AI, Google’s Gemini, Microsoft’s Copilot, or apps like Perplexity instead of traditional search engines.
Artificial intelligence (AI) is increasingly being used to enhance fact-checking processes.
AI algorithms can analyze vast amounts of data, identify biases, and detect inconsistencies in a matter of seconds.
According to a study by the Poynter Institute, ‘AI-powered fact-checking tools have improved accuracy rates by up to 30%.’
Additionally, AI can help reduce the workload of human fact-checkers, allowing them to focus on more complex and nuanced issues.
The Problem with AI Fact Checks
Grok’s recent statements about ‘white genocide’ in South Africa have prompted many people to ask how accurate and reliable chatbot answers really are. The discussion arose after the Trump administration brought white South Africans to the United States as refugees. xAI, the company behind Grok, blamed an ‘unauthorized modification’ for the chatbot’s obsession with the ‘white genocide’ topic and said it had conducted a thorough investigation.
Grok, the name of xAI’s chatbot, comes from a concept introduced by science fiction author Robert A. Heinlein in his 1961 novel ‘Stranger in a Strange Land.’
It refers to the ability to understand and empathize with others, seeing things from their perspective.
The term has since been adopted in various contexts, including psychology and philosophy, to describe a deep understanding of human behavior and emotions.
Grokking involves putting oneself in another's shoes and comprehending their thoughts, feelings, and motivations without judgment.
However, two studies, by the BBC and the Tow Center for Digital Journalism, found significant shortcomings in the ability of generative AI chatbots to accurately convey news reporting. The BBC study, published in February, found that 51% of the chatbots’ answers had ‘significant issues of some form.’ Nineteen percent of answers added factual errors of their own, while 13% of quotes were either altered or not present at all in the cited articles.

The ‘Diet’ Problem
AI chatbots are only as good as their ‘diet’: the sources they are trained and programmed on. A poor diet leads to flaws, as seen in the recent pollution of large language models (LLMs) by Russian disinformation and propaganda. Tommaso Canetta, deputy director of the Italian fact-checking project Pagella Politica, said that if the sources are not trustworthy and of high quality, the answers will most likely be of the same kind.
Russian disinformation refers to the dissemination of false or misleading information by Russia, often through state-sponsored media outlets and social media platforms.
This tactic has been used to influence public opinion, shape narratives, and undermine trust in institutions.
According to a report by the Stanford Internet Observatory, 95% of Russian propaganda on Twitter originated from just 1,000 accounts.
The most common topics targeted include politics, elections, and social issues.
Fact-checking organizations have found that up to 70% of online claims about Ukraine can be traced back to Russian disinformation efforts.
AI Chatbots’ Limited Capabilities
AI chatbots also exhibit severe limitations when it comes to identifying AI-generated images. In a quick experiment, DW asked Grok to identify the date, location, and origin of an image taken from a TikTok video. Grok claimed the image showed several different incidents at several different locations, ranging from a small airfield in Salisbury, England, to Denver International Airport in Colorado, to Tan Son Nhat International Airport in Ho Chi Minh City, Vietnam.
In fact, the image showed none of those locations: DW strongly believes it was generated by artificial intelligence, which Grok seemed unable to recognize despite the image’s clear errors and inconsistencies.
Expert Advice
Felix Simon, postdoctoral research fellow in AI and digital news at the Oxford Internet Institute (OII), concludes that ‘AI systems such as Grok, Meta AI or ChatGPT should not be seen as fact-checking tools. While they can be used to that end with some success, it is unclear how well and consistently they perform at this task, especially for edge cases.’
For Canetta at Pagella Politica, AI chatbots can be useful for very simple fact checks. However, he advises people not to trust them entirely and to always double-check responses with other sources.