Researchers at MIT have developed a computational system to predict failures like the one that hit Southwest Airlines in 2022, turning previous research into a diagnostic tool for real-world systems.
Automation is increasingly used in critical systems such as air traffic scheduling and autonomous vehicles, and when these systems fail, the consequences can be severe.
Air traffic scheduling involves planning and coordinating flight routes, departure times, and arrival times to minimize delays and maximize airport capacity.
This complex process relies on advanced computer systems, including air traffic management software and radar systems.
According to the International Air Transport Association (IATA), efficient air traffic scheduling can reduce fuel consumption by up to 10% and lower greenhouse gas emissions by 5%.
Airlines and airports worldwide invest heavily in optimizing their schedules to ensure smooth operations and meet growing passenger demands.
The work was motivated by frustration with complicated systems in which it’s hard to understand what’s going on behind the scenes. Its goal was to turn previous research into a diagnostic tool for real-world systems.
Previous research has focused on failure prediction problems, such as robots working together or complex systems like the power grid. However, these problems are different from predicting rare failures in critical systems. In these systems, automated decision-making interacts with the physical world, leading to complexity and uncertainty.
There is no public data on aircraft reserves across the Southwest network, so the researchers had to infer what lay behind the airline’s decisions from sparse public information: just flight arrival and departure times.
The researchers developed a model of how the scheduling system is supposed to work, then ran it backwards to see what initial conditions could have produced the observed outcomes. They used extensive data on typical operations to teach the computational model what is feasible, what is possible, and what the laws of physics allow.
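To make “running the model backwards” concrete, the following Python sketch infers how many reserve aircraft a single airport started the day with, given only its number of cancellations. It does not reproduce the researchers’ method. Instead it uses rejection sampling, a simple form of approximate Bayesian computation. The forward model `simulate_cancellations`, the assumed range of 8 to 14 weather-disrupted flights, and the reserve counts in `normal_days` are all hypothetical. The prior drawn from `normal_days` plays the role of extensive data on typical operations, and the observed cancellation count plays the role of the sparse public data.

```python
import random
from collections import Counter


def simulate_cancellations(reserves, rng):
    """Hypothetical forward model of one bad-weather day at a single airport.

    Between 8 and 14 flights (an assumed range) are disrupted by weather. Each is
    covered by a reserve aircraft until the reserves run out, and every disruption
    beyond that becomes a cancellation.
    """
    disrupted_flights = rng.randint(8, 14)
    return max(0, disrupted_flights - reserves)


def infer_reserves(observed, prior_reserves, n_samples=20_000, seed=0):
    """Run the forward model 'backwards' by rejection sampling.

    Candidate starting reserves are drawn from `prior_reserves`, which stands in
    for data on normal operations. A candidate is kept only when a simulated day
    reproduces the observed number of cancellations. Returns an estimated
    posterior distribution {reserves: probability}, or {} if nothing matched.
    """
    rng = random.Random(seed)
    matches = Counter()
    for _ in range(n_samples):
        candidate = rng.choice(prior_reserves)  # sample from the empirical prior
        if simulate_cancellations(candidate, rng) == observed:
            matches[candidate] += 1
    total = sum(matches.values())
    if total == 0:
        return {}
    return {r: count / total for r, count in sorted(matches.items())}


# Hypothetical reserve counts seen on ordinary days.
normal_days = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9]
print(infer_reserves(observed=7, prior_reserves=normal_days))
```

With seven observed cancellations, starting reserves of 8 or 9 are ruled out, because even 14 disruptions minus 8 reserves leaves only 6 cancellations. The estimate therefore concentrates on 4 through 7, weighted by how often each level appears in the normal-operations prior.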

The way aircraft reserves were deployed was a ‘leading indicator’ of the problems that cascaded into a nationwide crisis. The data showed that Denver’s reserves were dwindling rapidly because of weather delays, and the shortfall there also led to failures in other parts of the network.
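The cascade described above can be illustrated with a deliberately simplified sketch. It assumes that each airport’s aircraft fly on to a single next stop. The function `propagate_shortfall`, the airport names other than Denver, and all of the numbers are hypothetical. Real airline networks route aircraft and crews far more intricately.

```python
def propagate_shortfall(reserves, next_stop, origin, missing, max_hops=10):
    """Hypothetical cascade model for a chain of aircraft rotations.

    reserves:  spare aircraft available at each airport.
    next_stop: airport -> the airport its aircraft fly on to next.
    origin:    airport where `missing` aircraft were lost (e.g. to weather delays).
    Reserves at each airport absorb as much of the shortfall as they can. Any
    departure left uncovered is cancelled, so that aircraft never reaches the
    next stop, which then inherits the remaining shortfall.
    Returns (cancellations per airport, reserves left per airport).
    """
    remaining = dict(reserves)
    cancelled = {airport: 0 for airport in reserves}
    airport, shortfall = origin, missing
    for _ in range(max_hops):  # cap the hops so cyclic routes still terminate
        covered = min(shortfall, remaining[airport])  # swap in reserve aircraft
        remaining[airport] -= covered
        shortfall -= covered
        if shortfall == 0:
            break
        cancelled[airport] += shortfall
        if airport not in next_stop:
            break
        airport = next_stop[airport]
    return cancelled, remaining


# Hypothetical numbers: Denver loses 5 aircraft to weather delays.
reserves = {"Denver": 2, "Hub B": 1, "Hub C": 0, "Hub D": 3}
next_stop = {"Denver": "Hub B", "Hub B": "Hub C", "Hub C": "Hub D"}
cancelled, remaining = propagate_shortfall(reserves, next_stop, "Denver", 5)
print(cancelled)   # {'Denver': 3, 'Hub B': 2, 'Hub C': 2, 'Hub D': 0}
print(remaining)   # {'Denver': 0, 'Hub B': 0, 'Hub C': 0, 'Hub D': 1}
```

In this toy run, Denver’s reserves are exhausted first. The uncovered shortfall then drains Hub B and causes cancellations at Hub B and Hub C, even though the weather problem was in Denver. Watching reserve levels fall toward zero would flag trouble before those downstream cancellations appear, which is the sense in which reserves can act as a leading indicator.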
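In this study, aircraft reserves are spare aircraft that an airline keeps on standby at its airports to replace planes that are delayed or out of position. They are distinct from fuel reserves, which are the extra fuel and other essential resources carried on board an aircraft for emergency situations.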
This reserve is typically calculated based on factors such as flight duration, weather conditions, and passenger capacity.
The International Civil Aviation Organization (ICAO) sets guidelines for minimum reserve requirements to ensure safe operations.
For example, a commercial airliner may carry an additional 10-20% of fuel beyond the estimated requirement to account for unexpected delays or changes in weather.
This research could lead to a real-time monitoring system where data on normal operations is constantly compared to current data. This could allow for preemptive measures, such as redeploying reserve aircraft in advance of anticipated problems.
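One simple way to compare current data against normal operations, as such a monitoring system would, is a z-score test. The test measures how far today’s reserve level sits from its historical mean, in units of standard deviation. The sketch below is an illustration under stated assumptions, not the researchers’ design. `reserve_alerts`, the 2.0 threshold, and the sample numbers are all hypothetical.

```python
import statistics


def reserve_alerts(baseline, current, threshold=2.0):
    """Compare current reserve levels with a history of normal operations.

    baseline: airport -> list of reserve counts from ordinary days (>= 2 values).
    current:  airport -> reserve count right now.
    Flags airports whose current count sits more than `threshold` standard
    deviations below their normal mean (a one-sided z-score test) and returns
    {airport: z_score} for those airports.
    """
    alerts = {}
    for airport, history in baseline.items():
        mean = statistics.mean(history)
        spread = statistics.stdev(history)
        level = current[airport]
        if spread == 0:
            # No normal variation at all: any drop below the usual level is unusual.
            if level < mean:
                alerts[airport] = float("-inf")
            continue
        z = (level - mean) / spread
        if z < -threshold:
            alerts[airport] = z
    return alerts


# Hypothetical week of normal operations, then a worrying snapshot.
baseline = {"Denver": [6, 7, 6, 8, 7, 6, 7], "Hub B": [3, 4, 3, 3, 4, 3, 4]}
current = {"Denver": 2, "Hub B": 3}
print(reserve_alerts(baseline, current))  # only Denver is flagged
```

A fixed z-score threshold is easy to explain and cheap to compute, but it treats each airport in isolation. A system built on a network model, like the one the researchers describe, could also account for how a shortfall at one airport spreads to others.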
The researchers have released an open-source tool called CalNF for analyzing failures in such systems. It is available for anyone to use and could aid the development of more robust critical systems.
Predicting rare kinds of failures is crucial for ensuring the safety and reliability of critical systems. The research presented here demonstrates a method for doing so using a combination of sparse data on rare events and extensive data on normal operations.
Failure prediction is a method used to forecast the likelihood of equipment, system, or process failure.
It involves analyzing historical data and performance metrics to identify potential weaknesses and areas for improvement.
Techniques such as regression analysis, decision trees, and neural networks are commonly employed in failure prediction models.
These models can help organizations reduce downtime, minimize costs, and enhance overall reliability.
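Regression analysis, the first technique named above, can be illustrated with a minimal logistic-regression sketch that estimates the probability of a failure from a single feature. The feature (operating hours since maintenance), the data, and the function names are all hypothetical. A real predictive-maintenance model would use many more inputs and a tested library rather than hand-written gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(failure | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical history: hundreds of operating hours since maintenance,
# and whether the component failed (1) or not (0) before its next check.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
failed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(hours, failed)
for h in (2, 7):
    print(h, round(sigmoid(w * h + b), 2))
```

The fitted weight is positive, so the predicted failure probability rises with hours since maintenance. The overlapping cases at 4 and 5 hundred hours keep the weight finite, as real, noisy data would.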
According to a study by the International Society of Automation, '70% of equipment failures are due to human error or inadequate maintenance.'