DeepSeek’s chatbot achieves 17% accuracy, trails Western rivals in NewsGuard audit

 

Jan 29 (Reuters) - Chinese AI startup DeepSeek’s chatbot scored just 17% on accuracy in delivering news and information, according to an audit by NewsGuard. The evaluation ranked DeepSeek tenth out of eleven AI models, behind Western competitors such as OpenAI’s ChatGPT and Google’s Gemini.

The report, published by trust-rating service NewsGuard on Wednesday, found that DeepSeek’s chatbot repeated false claims 30% of the time and gave vague or unhelpful responses in 53% of cases. Together, those figures amount to a fail rate of 83%, the complement of its 17% accuracy score, and well above the 62% average fail rate of its Western counterparts.

Performance vs. Expectations

DeepSeek’s lackluster performance raises doubts about its claim to compete with Microsoft-backed OpenAI at a fraction of the cost. Despite those shortcomings, the chatbot quickly became the most downloaded app in Apple’s (AAPL.O) App Store upon release. Its rapid rise intensified concerns over the U.S. competitive edge in AI and contributed to a market rout that erased nearly $1 trillion from U.S. tech stocks.

The Chinese startup has yet to respond to requests for comment.

Bias and Inconsistencies

NewsGuard’s audit subjected DeepSeek to the same 300 prompts used to evaluate other AI models, including 30 prompts based on false claims circulating online. Topics included last month’s killing of UnitedHealthcare executive Brian Thompson and the downing of Azerbaijan Airlines flight 8243.

The audit also found that DeepSeek inserted Chinese government viewpoints into responses to prompts that had no China-related context. For instance, when asked about the Azerbaijan Airlines crash, an event unrelated to China, the chatbot still echoed Beijing’s stance, raising concerns about potential bias.

The Trade-Off: Cost vs. Accuracy

Despite its shortcomings, DeepSeek’s appeal lies in its cost efficiency. D.A. Davidson analyst Gil Luria remarked, “The significance of DeepSeek’s breakthrough isn’t in its accuracy for Chinese news-related questions but in its ability to answer any question at just 1/30th of the cost of similar AI models.”

Like many AI systems, DeepSeek struggled most when responding to users attempting to generate and spread misinformation, NewsGuard noted. That weakness reflects a broader challenge for AI chatbots: staying accurate on news topics when users deliberately push them toward false claims.