It Is Time to Democratize Wargaming Using Generative AI

The role of artificial intelligence (AI) in strategic decisionmaking is still evolving. There are concerns about ethics, escalation dynamics, testing and evaluation standards, and how best to align people and models in military planning. However, the role of generative AI and large language models (LLMs) in wargames and strategic analysis often remains overlooked in these discussions.

If more people—from academics and concerned citizens to military professionals and civil servants—gained access to generative AI tools and understood how to integrate them into analytical wargames, the result would be a more diverse set of ideas and debates guiding foreign policy.

Wargaming Today: Central to Strategy but Costly and Opaque

A UK Ministry of Defence manual defines wargames as “structured but intellectually liberating safe-to-fail environments to help explore what works (winning/succeeding) and what does not (losing/failing), typically at relatively low cost.” Games are laboratories for decisionmaking, helping practitioners evaluate tradeoffs associated with everything from tactical choices to force design. Seen in this light, games have a long history and often sit at the intersection of policy research and social science.

From the interwar period and Cold War to contemporary debates about countering Russia and China, wargames have been a staple of strategic analysis in the United States. These simulation-driven exercises evaluate theories, assumptions, and strategies related to warfare through the development of hypothetical conflict scenarios. As a result, wargames serve multiple purposes within policy circles. They facilitate dialogue across agencies and among stakeholders, fostering an environment where new ideas can emerge and analysts can evaluate key assumptions. This process is instrumental in shaping and informing policymaking decisions by raising awareness across policy circles. In fact, games often serve as both a private forum for refining strategy and a vehicle for raising public awareness of these issues.

Whether classified or unclassified, wargames are a form of synthetic data. They are based on scenarios, which even when backed by extensive research and sensitive intelligence, are still approximations of reality. Games cannot predict the future—but then again, neither can most analysts. What games can do is highlight tradeoffs and provide a forum for analyzing decisionmaking. They can also play a key role in analyzing “tail risk” and low-probability, high-consequence events. This dynamic makes wargaming and red teaming related components of strategic analysis.

Unfortunately, modern wargames run by and for the U.S. government tend to be expensive, opaque, and prone to hyperbole. There is no clear, transparent accounting for the costs associated with running an analytical wargame, with costs for typical games ranging from hundreds of thousands to millions of dollars. If that picture is not bleak enough, according to a 2023 Government Accountability Office (GAO) study, there “are barriers to accessing wargame data, information on upcoming wargames is not shared, and the services have not developed standard education and qualifications for wargamers.” This is why academics like Jacquelyn Schneider at the Hoover Institution have moved to create a game repository and argue for more transparency with respect to design methodology and funding sources for the public at large. It is not always clear who funds some wargames—both inside and outside government—casting doubt on the objectivity of the findings.

The Future of Wargaming: Cheaper to Produce and with Replication Standards

Incorporating AI into wargames can both reduce the traditional costs associated with running games and increase opportunities for more rigorous analysis of strategy and decisionmaking. 

From Players to Role Emulation

Analysts can train models using fine-tuned datasets to represent different stakeholders. Games pivot on the quality of the players, but the best players are often overbooked and on the move. It is costly to fly people around the world for a short game (i.e., one to three days), and beltway insiders have demanding schedules. If it is hard to get a general officer or member of the National Security Council to take a day to play a game at a think tank in Washington, D.C., imagine how hard it is to engage Chinese or Russian nationals connected to their governments.

Therefore, rather than directly relying on human players sitting around a table to play a game, the twenty-first-century analyst can use generative AI and LLMs to create game agents. Recent studies indicate that synthetic data can effectively mirror the response patterns of a diverse array of human subpopulations, which can be useful in drawing predictive conclusions for specific sides. Using synthetic data from wargames to generate action can transform how human players see problems by pushing them toward divergent viewpoints and debate.

Imagine a new type of wargame where a collection of decisionmakers—human players—interacts with AI-generated role players, similar to most modern video games. For example, every student in professional military education could replicate planning with a coalition partner and fight against an adversarial AI that replicates enemy doctrine and even strategic culture. These games would be cheaper to facilitate, essentially trading the costs of travel, honorariums, and grumpy consultants playing the enemy for the labor costs of collecting and curating data used to train LLMs. In addition, the games would be shorter, allowing the design team to play multiple games and collect more data about decisionmaking in lieu of a single expensive game that is often too big to fail.
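The agent concept described above can be sketched in a few lines of code. This is a minimal illustration, not a working model: the `stub_model` function is a hypothetical stand-in for a call to a fine-tuned LLM, and the persona text is invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a fine-tuned LLM; a real game would replace
# this with an API call or a locally hosted model.
def stub_model(persona: str, history: list, move: str) -> str:
    return f"[{persona}] responds to '{move}' given {len(history)} prior turns"

@dataclass
class RolePlayerAgent:
    """A synthetic player emulating one stakeholder (e.g., an adversary staff)."""
    persona: str                       # system-prompt-style role description
    history: list = field(default_factory=list)

    def respond(self, human_move: str) -> str:
        reply = stub_model(self.persona, self.history, human_move)
        # Log both sides of the exchange so the game transcript is replayable.
        self.history.append(f"BLUE: {human_move}")
        self.history.append(f"RED: {reply}")
        return reply

# Illustrative persona and move, not drawn from any real game.
red = RolePlayerAgent(persona="adversary theater commander, doctrine-faithful")
turn1 = red.respond("Blue deploys a carrier strike group to the region")
```

The design choice worth noting is the running transcript: because every human move and synthetic response is appended to `history`, each game automatically produces the kind of turn-by-turn record that supports later analysis.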

From a Rigid Road to War to Alternative Scenarios

Another costly aspect of wargaming—and one often prone to sampling bias—is the “worlding” used to create the starting conditions of the game. With the aim of creating alternative “worlds,” research teams spend countless hours engaged in confirmation bias, selecting the worst terrain and scariest approximations of opposing forces to fight against. This process is costly and often creates an inadvertent pull toward the worst-case scenario. All roads lead to war, and this tendency skews how players make decisions.

The problem is that if one starts with the wrong story, the conversation becomes limited. These starting stories—the underlying scenario—along with player roles (or characters) and the choices they are asked to make (or the plot) are central to wargaming. For example, if one starts a crisis game with the Chinese Communist Party committed to war and fully mobilized, the U.S. military out of position, and the American political class prone to division, the initial conditions of the game create path-dependent choices and flawed observations. China always wins. The United States always loses. Worse still, capabilities are treated as offsets, and technological deus ex machinas radically transform the story.

Using AI, game designers can create multiple worlds using a mix of generative images and text at low cost. Researchers can tailor datasets and, similar to creating synthetic players, write not one but a series of scenarios mapping different roads to war (i.e., different initial conditions). These variable starting conditions better capture how sensitive complex systems are to initial conditions. The variation becomes a critical component for analyzing decisionmaking, especially if it diverges across the treatments, creating a larger possibility space for assessing strategy. For the cost of one traditional wargame, the analyst can run multiple games and see what combinations of ends, ways, and means produced the best advantage based on different scenario assumptions.
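One way to operationalize variable starting conditions is to treat the scenario as a set of dimensions and enumerate their combinations. The sketch below assumes three illustrative dimensions; a real design team would draw these values from curated research, and an LLM would then be prompted to narrate each combination as a distinct road to war.

```python
from itertools import product

# Illustrative scenario dimensions; the labels are examples, not findings.
dimensions = {
    "adversary_posture": ["full mobilization", "coercive signaling", "gray-zone probing"],
    "us_readiness": ["forward-deployed", "out of position"],
    "political_cohesion": ["unified", "divided"],
}

def enumerate_worlds(dims: dict) -> list:
    """Cross every dimension value to produce each distinct starting condition."""
    keys = list(dims)
    return [dict(zip(keys, combo)) for combo in product(*dims.values())]

worlds = enumerate_worlds(dimensions)
# 3 x 2 x 2 = 12 starting conditions instead of a single worst-case scenario.
```

Even this toy grid shows the analytical payoff: twelve distinct worlds replace one worst-case assumption, so observed decisions can be compared across treatments rather than attributed to a single scripted road to war.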

Furthermore, lowering the cost of creating images that go beyond the typical PowerPoint slides, bad computer graphics, and text-heavy game packets common in national security gaming could lead to more robust insights. It turns out that art actually stimulates brain functions. At present, most games that add stylized graphics and multimedia tend to break the bank. Generative AI can change that through programs like Midjourney and DALL·E 3.

From No Standards to Replication Guidelines

As previously noted, most games are thin on methods, making it difficult to replicate even the most basic insights on decisionmaking. The typical game report details the scenario, player roles, and objectives while often leaving out a more extensive literature review and methods discussion. In lieu of methods, the reader sees the game rules, often followed by a list of action, reaction, and counteraction narratives by turn. Even this tacit approach to wargame reporting, according to the GAO, lacks formal standards across the institution that is the largest funder of wargames on the planet: the U.S. Department of Defense.

Technology alone will not overcome a failed analytical process. Rather, future wargames built using generative AI should adhere to a set of best practices linked to what the broader scientific community refers to as replication standards. Replication involves both creating insights and interpreting outcomes. In fact, using rigorous replication standards helps improve the quality of research findings, essentially adding checks and balances to the process. Applied to games, it means not just laying out the rules but also abstracting a logical sequence that illuminates the how and why of decisionmaking in the context of adversary reactions (i.e., feedback loops) and imperfect information (i.e., uncertainty).

This logical sequence is captured by inventorying prompts and structured data labeling. Going forward it can also build in red teaming techniques, showing how and when the logic of a particular set of decisions begins to break down based on synthetic data and player interaction. In other words, the human is always in the loop, not just designing but also stress testing their game and using the results to analyze decisionmaking.
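The prompt inventory described above can be as simple as an append-only log in which every prompt and response is hashed. The sketch below is one hypothetical implementation of that idea; the field names and example strings are assumptions for illustration.

```python
import hashlib
import json

class PromptInventory:
    """Append-only log of prompts and responses so a game run can be replicated."""

    def __init__(self):
        self.records = []

    def log(self, turn: int, role: str, prompt: str, response: str) -> str:
        record = {"turn": turn, "role": role, "prompt": prompt, "response": response}
        # Hash the canonical JSON form so reviewers can verify that the
        # published transcript matches what was actually run.
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        record["sha256"] = digest
        self.records.append(record)
        return digest

# Illustrative usage with invented turn content.
inv = PromptInventory()
h = inv.log(1, "RED", "You are the adversary commander...", "Mobilize reserves.")
```

A log like this gives outside reviewers exactly what current game reports lack: the full sequence of inputs that produced each synthetic decision, verifiable hash by hash.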

Conclusion: Would You Like to Play a Game?

Existing generative AI tools offer viable ways to reduce the costs of, and increase the rigor in, analytical wargaming. The only barriers to entry are the human imagination and the willingness of a legacy defense bureaucracy to consider alternative approaches to strategic analysis.

To that end, the Department of Defense needs to accelerate its support for efforts like TF LIMA—the new generative AI task force—and experiments like the Global Information Dominance Exercise. More importantly, services need to start funding copilots and other unclassified AI tests at lower echelons while studying how best to train military professionals to work with—not against—models that aggregate data. In all likelihood, this movement will require significant changes to professional military education to include practicums on data science, statistics, research methods, and red teaming.

Benjamin Jensen is a senior fellow in the Futures Lab at the Center for Strategic and International Studies (CSIS) in Washington, D.C. Yasir Atalan is an associate data fellow in the Futures Lab at CSIS. Dan Tadross is head of federal delivery at Scale AI.
