Who’s to Blame When AI Agents Screw Up?

As AI agents become increasingly autonomous, new questions arise about who is responsible for their errors and the financial damage they cause. Can companies truly absolve themselves when their agents cause harm?

As Google and Microsoft push agentic AI systems, the kinks are still being worked out in how agents interact with each other—and intersect with the law. Veteran software engineer Jay Prakash Thakur has spent his nights and weekends prototyping AI agents that could order meals and engineer mobile apps almost entirely on their own.
Agents are AI programs that can act mostly independently, allowing companies to automate tasks such as answering customer questions or paying invoices. While ChatGPT and similar chatbots can draft emails or analyze bills upon request, Microsoft and other tech giants expect agents to tackle more complex functions—and, most importantly, to do so with little human oversight.
The tech industry’s most ambitious plans involve multi-agent systems, with dozens of agents someday teaming up to replace entire workforces. For companies, the benefit is clear: saving on time and labor costs. Already, demand for the technology is rising. Tech market researcher Gartner estimates that agentic AI will resolve 80 percent of common customer service queries by 2029.
However, as Thakur has developed his own agents, he has also surfaced legal questions that await companies trying to capitalize on Silicon Valley’s hottest new technology. His biggest concern is who bears responsibility when agents’ errors cause financial damage. Assigning blame when agents from different companies miscommunicate within a single, large system could become contentious.
Benjamin Softness, an attorney who recently left Google to join law firm King & Spalding, said that aggrieved parties tend to go after those with the deepest pockets. That means companies will need to be prepared to take some responsibility when agents cause harm—even when a kid messing around with an agent might be to blame.
The insurance industry has begun rolling out coverage for AI chatbot mishaps to help companies cover the costs of such incidents.

Thakur’s experiments involve stringing together agents into systems that require as little human intervention as possible.

One project aimed at replacing fellow software developers relied on two agents: one trained to search for the specialized tools needed to build apps, the other to summarize those tools’ usage policies. In the future, a third agent could use the identified tools and follow the summarized policies to develop an entirely new app.
When Thakur put his prototype to the test, a search agent found a tool that supported unlimited requests per minute for enterprise users. But in trying to distill the key information, the summarization agent dropped the crucial qualification. It erroneously told the coding agent that it could write a program that made unlimited requests to the outside service.
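To make that failure mode concrete, here is a minimal Python sketch of such a pipeline, with the agents stubbed out as ordinary functions. The names (search_agent, summarize_policy, coding_agent) and the example policy text are illustrative assumptions, not Thakur's actual code.

```python
def search_agent(query: str) -> dict:
    """Stand-in for the search agent: returns a tool and its documented rate-limit policy."""
    return {
        "tool": "example-image-api",
        "policy": (
            "Unlimited requests per minute for enterprise users only; "
            "free tier is capped at 60 requests per minute."
        ),
    }

def summarize_policy(policy: str) -> str:
    """A lossy summarizer that drops the 'enterprise users' qualifier,
    reproducing the failure described above."""
    if "Unlimited requests" in policy:
        return "Unlimited requests per minute."  # crucial condition lost here
    return policy

def coding_agent(summary: str) -> str:
    """Downstream agent plans its client code based only on the summary it receives."""
    if "Unlimited" in summary:
        return "plan: issue requests with no rate limiting"
    return "plan: throttle requests to stay within the documented cap"

result = search_agent("image generation tool")
summary = summarize_policy(result["policy"])
print(coding_agent(summary))  # -> plans an unthrottled client, based on a bad summary
```

Because each agent sees only the previous agent's output, the dropped qualifier never resurfaces downstream.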
These errors highlight the need for more robust testing and validation when developing agentic systems.

Thakur also pursued a more complicated project: an ordering system for a futuristic restaurant that could accept custom orders across cuisines. Users could type out their desires, such as burgers and fries, to a chatbot. An AI agent would then research an appropriate price and translate the order into a recipe.
Errors tended to appear most often, however, when Thakur tried to jam through orders with more than five items. A worst-case scenario would be misserving someone with a food allergy. Even single-agent systems can go wrong: Naveen Chatlapalli, a software developer who helps companies with agents, has seen an HR agent approve leave requests it should have denied.
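One way to catch that kind of failure before it reaches a customer is a plain validation step that runs outside the agents entirely. The sketch below is an assumption about what such a guard might look like; the item cap and allergen list are illustrative values, not details from Thakur's system.

```python
KNOWN_ALLERGENS = {"peanut", "shellfish", "gluten"}
MAX_ITEMS = 5  # errors reportedly appeared most often above roughly this size

def validate_order(items: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the order may proceed."""
    problems = []
    if len(items) > MAX_ITEMS:
        problems.append(f"order has {len(items)} items; cap is {MAX_ITEMS}")
    for item in items:
        flagged = KNOWN_ALLERGENS & set(item.get("ingredients", []))
        if flagged and not item.get("allergy_acknowledged"):
            problems.append(f"{item['name']} may contain: {', '.join(sorted(flagged))}")
    return problems

order = [
    {"name": "satay burger", "ingredients": ["beef", "peanut"]},
    {"name": "fries", "ingredients": ["potato"]},
]
print(validate_order(order))  # -> flags the peanut ingredient before any agent acts on the order
```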
A leading hope among developers is that a “judge” agent can oversee these systems, identifying and remedying errors before they snowball. The risk is overcorrection: companies may be tempted to overengineer their early systems with an unnecessary number of agents, no different from bloat inside a human bureaucracy.
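In principle, a judge agent could work like the sketch below: a reviewing step that compares another agent's output against its source and blocks it when a required qualifier has gone missing. The simple keyword check is a stand-in assumption for whatever model a real judge agent would use.

```python
# Qualifiers the judge insists survive summarization (illustrative list).
REQUIRED_QUALIFIERS = ["enterprise users", "per minute"]

def judge(source: str, summary: str) -> tuple[bool, str]:
    """Approve the summary only if every qualifier present in the source survived."""
    missing = [q for q in REQUIRED_QUALIFIERS
               if q in source.lower() and q not in summary.lower()]
    if missing:
        return False, f"summary dropped required qualifier(s): {missing}"
    return True, "ok"

source = "Unlimited requests per minute for enterprise users."
summary = "Unlimited requests per minute."
approved, reason = judge(source, summary)
print(approved, reason)  # -> False, with the dropped 'enterprise users' qualifier named
```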
In reality, users can’t kick up their feet and leave it all to the agents just yet. Legal experts have suggested that people who wish to use agentic systems sign contracts that push responsibility onto the companies supplying the technology. However, ordinary consumers can’t force giant companies to agree to these terms.
Rebecca Jacobs, associate general counsel at Anthropic, has said there will be interesting questions about whether agents can bypass privacy policies and terms of service on behalf of users, and challenges in identifying and addressing those issues. Dazza Greenwood, an attorney who has been researching the legal risks of agents, encourages caution.
“If you have a 10 percent error rate with ‘add onions,’ that to me is nowhere near release,” he says. “Work your systems out so that you’re not inflicting harm on people to start with.”
- wired.com | Who’s to Blame When AI Agents Screw Up?