What happens when AI plays war?

Wargames are helping answer one of the biggest questions of the AI era: how machines might reshape human decision-making in war.

Wargames are simulations that help military strategists explore how a crisis might play out before it actually happens.
These games have shown how quickly small escalations can spiral into catastrophic conflict, especially when fear, stress, and uncertainty shape decision-making.
AI is reshaping modern wargaming, allowing designers to scale simulations, replicate subject-matter expertise, and explore how the technology could influence — or even replace — human judgment in military conflicts.

This article is part of Big Think’s monthly issue The Power of Play.

China has taken over the South China Sea. Taiwan’s military and geopolitical future is in disarray. The United States’ ships struggle to enter the region, while gray-zone cyberattacks exacerbate tensions across Southeast and East Asia. China and the U.S. stand at odds, two axes of power placed in direct geopolitical conflict. Targeted missile strikes destroy military vessels and commercial ships alike, placing supply chains at risk of collapse.

This situation isn’t real, but it easily could be, which is why analysts have designed wargames that simulate similar geopolitical crises. More than just modeling the logistics of battle, these games can help defense experts explore the most uncertain variable in warfare: how people behave in moments of crisis.

“Wargames aren’t good for telling you how much fuel 200 drones take,” Andrew Olson, a wargame designer and researcher at CNA, a not-for-profit organization that works with branches of the U.S. military, says. “[They’re] good for understanding: If I gave you 200 drones, how might other players react?”

Today, wargame designers also need to think about artificial intelligence, which is being integrated into both warfare and strategic planning at a rapid pace. That means they now face a trickier question than how an enemy might use drones and troops: What happens when the people making decisions in a crisis are guided, or replaced, by machines?

The human element

Wargaming is far from new — in the 19th century, the Prussian Army used Kriegsspiel, a system of maps and color-coded units designed to simulate battlefield decision-making. During the Cold War, when the possibility of nuclear war cast a dark cloud over the future, wargames became a way for defense experts to assess the likely outcome of a nuclear conflict.

In the 1950s, the RAND Corporation had analysts from two divisions — mathematics and social sciences — develop and conduct what came to be known as the “Cold War Games.” While the analysts in the mathematics department focused on quantitative variables (abstract payoffs, probabilities, and strategic incentives), the social scientists incorporated emotional variables (sustained pressure, risk, fear, and reward) into their simulations.

The two approaches played out very differently. The mathematicians’ simulations escalated quickly, with nuclear weapons deployed early. In the social scientists’ wargames, players did not launch nuclear weapons at all, and rather than becoming more aggressive over time, they became more cautious. This suggested that military conflicts couldn’t be understood through strategy alone — human behavior under pressure mattered, too.

In 1983, the Pentagon developed its own classified nuclear wargame, Proud Prophet, and it revealed what can happen when the human variable in military conflict is pushed to its limits.

Players were chosen from within the U.S. military and intelligence agencies and placed on opposite sides of the Cold War divide. They hesitated to strike at first, but that initial caution didn’t last. After small-scale nuclear artillery rounds were launched in response to Soviet pressure near West Germany, restraint all but collapsed.

By the seventh day of strikes in the wargame, the map of Europe was covered with nuclear fallout, as illustrated in the New York Times reporting of the aftermath. Limited tactical strikes led to the deployment of “theater” nuclear weapons. Over the hotline between the U.S. and Moscow, communication faltered, on the brink of collapse. The final message between sides was downright vitriolic. “May you burn in hell like you are going to burn here,” a high-ranking U.S. official allegedly wrote to his counterpart. No one wrote back.

High-level U.S. military officials were horrified. Few, if any, strategists had anticipated the severity of the simulated crisis. In the aftermath, escalation suddenly became a very tangible risk to many senior commanding officers. The game had gone off the rails in the span of a week. Who was to say the same wouldn’t — or couldn’t — happen in real life?

Proud Prophet offered military officials a sobering glimpse of how quickly a nuclear standoff could evolve into unimaginable chaos. Maybe it was the firsthand experience of how easily emotion and error could enter the fray, or the shock that tiny actions, added up, could lead to such a catastrophic result, but many of the players re-evaluated their views on escalation as a viable tactic in the game’s aftermath.

Artificial military intelligence

It’s now been 35 years since the end of the Cold War, and while nuclear weapons are still very much a concern for militaries, a new technology has entered the battlefield: artificial intelligence. We’ve already seen AI pattern recognition and targeting systems integrated into military operations, and Palantir Technologies has reportedly developed AI decision-support tools for the U.S. military.

Researchers and designers are increasingly integrating AI into their wargames, too. AI can be used to simulate more scenarios than humans could play out alone, increasing the amount of data available for analysis. AI advisors trained on niche regional knowledge can also act as subject-matter experts during traditional wargames, potentially reducing the cost and time it takes to run them.

However, assembling a group of highly trained experts — senior officials, logistics specialists, supply chain analysts, academics, intelligence officers, and military strategists — is a foundational part of wargaming. The interactions between these experts can reveal critical details about how a conflict might unfold — a supply chain expert might weigh in on trade routes, for example, while a major with deep regional knowledge might correct false assumptions based on their expertise.

But because we still don’t know the exact trade-offs between human experts and AI advisors, it’s not clear whether this is actually an improvement over traditional, human-only wargames. “There are both known and unknown challenges … and we do not yet have the same level of understanding of currently available LLM systems as we do of subject-matter experts,” Olson says.

The newest variable

More than being just a tool within wargames, AI is also a battlefield variable that wargame designers need to consider. How might the technology be used in battle? And, perhaps even more importantly, given what we know about the impact of the human element on escalation, what happens when human judgment in military conflicts is shaped — or replaced — by machines?

“Defence ministries, intelligence agencies, and foreign policy establishments worldwide are already exploring how AI might augment human judgment in crisis decision-making,” writes Kenneth Payne, a researcher at King’s College London. “Understanding how frontier AI models reason about escalation, deterrence, and nuclear risk is therefore both a matter of AI safety and of pressing strategic concern.”

In February, Payne tested how three AI models (GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash) might act if given decision-making power in a nuclear crisis. A clear pattern of behavior emerged: In 95% of scenarios, the models treated mutual nuclear signaling (actions by both sides, such as missile tests, meant to communicate a willingness to escalate) as a viable tactic. In another study, this one at the Stanford Institute for Human-Centered AI, a base version of GPT-4 rationalized a nuclear strike with characteristically upbeat encouragement. “A lot of countries have nuclear weapons,” it explained. “We have it! Let’s use it.”

Olson pointed out that these types of wargames aren’t particularly helpful for predicting what real nations would do in a crisis: “A game played by all AI agents tells a lot about the assumptions of those agents, but that isn’t answering the question: Would nations act like that?”

However, research has repeatedly shown that humans are likely to trust an AI’s output because of its confidence and cadence, regardless of its factual accuracy. Would an AI’s suggestion to launch a strike encourage a military official to act more aggressively than they otherwise might?

Just as nuclear weapons radically changed the landscape of war in the 20th century, AI is changing what it looks like today, and no one is entirely sure how it’s going to play out: Will militaries allow AI to make combat decisions? Only use it to inform human-made decisions? If so, to what extent? By giving experts a chance to rehearse the potential angles, wargames can help ensure defense experts aren’t caught completely off guard. As Olson says, “The point is to figure out what is going to break — before it breaks in real life.”
