The AI-driven development revolution may be overhyped, as a recent study reveals that Cognition’s Devin, touted as the ‘first AI software engineer,’ has a dismal success rate of just 15% in automating complex tasks.
The Flawed Promise of AI: Devin’s Struggle as the “First AI Software Engineer”
Researchers at Answer.AI recently spent a month testing Cognition’s Devin, an AI software engineer promoted as a revolutionary tool for automating complex tasks. Their findings are far from impressive.
Devin’s Performance: A Mixed Bag of Failure and Inconclusiveness
Out of 20 tasks attempted by the researchers, Devin failed outright on 14 and produced inconclusive results on three. It succeeded on only three, a paltry success rate of just 15 percent.
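The tally above can be checked with a short calculation. The following is a minimal Python sketch that uses only the counts reported in this article; the variable names are illustrative and do not come from the Answer.AI report.

```python
# Outcome counts as reported in the article (20 tasks in total).
outcomes = {"success": 3, "failure": 14, "inconclusive": 3}

total = sum(outcomes.values())

# Share of each outcome as a percentage of all attempted tasks.
rates = {name: 100 * count / total for name, count in outcomes.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f}%")
```

On these figures, failures account for 70 percent of the tasks, while successes and inconclusive results each account for 15 percent.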
Lack of Predictability and Efficiency
What’s even more concerning is that the team found it impossible to predict which tasks would succeed. Even tasks similar to ones attempted earlier would often fail in complex and time-consuming ways. Devin’s autonomy, initially seen as a promising feature, became a liability, causing it to spend days pursuing infeasible solutions.

A Glimpse into Devin’s Work Process
When tasked with deploying multiple applications on the Railway platform, Devin failed to realize that this was not possible. Instead, it kept attempting the task and produced inaccurate information about how to interact with Railway. The episode illustrates a core weakness: rather than recognizing that a task is infeasible and stopping, Devin keeps working on it.
The Hype vs. Reality Gap
Cognition AI has been making bold claims about Devin’s capabilities since introducing it in March 2024. However, the recent analysis by Answer.AI shows that the technology still grapples with basic problems. The industry’s tendency to exaggerate AI capabilities is a pressing concern, especially as companies like Meta and OpenAI plan to integrate AI into their operations.
The Uncertain Future of AI in Software Development
As AI technology continues to advance, it remains uncertain whether Devin or similar tools will be able to replace human software engineers effectively. The Answer.AI team’s findings serve as a reminder that the road to AI adoption is paved with challenges and uncertainties.