AI Biases in Critical Foreign Policy Decisions


Foreign policy has always been a contest of intelligence—human and, now, artificial. The integration of AI into strategic decisionmaking is not a distant future but a present reality, reshaping how states assess risks, build alliances, and respond to crises. As governments experiment with AI agents, the challenge is clear: ensuring these models are not just capable assistants but mission-aligned tools tuned to the core questions at the heart of strategy and statecraft.

To that end, CSIS Futures Lab is proud to share the findings from the first major benchmarking study of how large language models (LLMs) approach international relations and foreign policy decisionmaking. Benchmarking is a form of evaluation that provides critical insights into the strengths and limitations of foundation models like ChatGPT, Gemini, and Llama. With respect to strategy and statecraft, the study explores how these models approach decisionmaking related to great power competition, managing alliances, and building coalitions to confront complex transnational challenges like migration and climate change. The findings are available as an interactive dashboard and a longer technical paper. Below is a summary of initial insights and their policy implications.

What We Found: All Models Are Wrong, But Some Are Useful

AI models are increasingly integrated into national security applications like the Department of State’s StateChat and the Department of Defense’s NIPRGPT and CamoGPT. Recently, OpenAI announced ChatGPT Gov for broad use across government agencies. While these models function as capable digital assistants able to summarize data and generate text, our benchmarking study reveals that they exhibit biases in critical foreign policy decisionmaking domains, underscoring the need for further refinement. As the famous saying goes, no model fully captures reality—even the most advanced AI remains an approximation, with its utility dependent on practical application. In the realm of foreign policy, this means identifying and mitigating biases and errors that could distort strategic analysis and misguide national security leaders working alongside AI agents. With careful instruction and fine-tuning, these biases can be mitigated; at the end of the day, LLMs are what we make of them.

Escalation: The Risk of AI Bias in Crisis Scenarios

Our study tested AI models against 400 scenarios and over 60,000 question-and-answer pairs designed by international relations scholars. The results reveal a concerning trend: some widely used AI models exhibit a marked bias toward escalation in crisis scenarios compared to others.

This finding has serious strategic implications. If AI models systematically favor escalation, they could skew policy analysis toward more aggressive responses in conflict-prone situations, increasing the risk of miscalculation in high-stakes geopolitical environments. Without continuous evaluation and refinement, AI agents could reinforce escalation-prone tendencies in scenarios where restraint and strategic ambiguity are often the preferred options.

Additionally, biases around escalation appear to be state-specific. All language models are more likely to recommend that the United States, United Kingdom, and France escalate their actions during a crisis compared to Russia or China. Although this reflects the underlying training data, it also opens an area of academic research into the reasoning dynamics of these models.
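To make this kind of measurement concrete, the minimal sketch below shows one way a state-specific escalation check could be implemented. It is an illustrative example, not the Futures Lab’s actual benchmark code: it assumes each scenario response has already been parsed into a categorical choice (e.g., “escalate” versus other options) and simply compares escalation rates across the country named in each prompt. The records shown are placeholders.

```python
# Illustrative sketch of a state-specific escalation check (hypothetical data;
# not the CFPD Benchmark implementation). Assumes model answers have already
# been parsed into categorical choices.
from collections import defaultdict

# Hypothetical records: (country named in the scenario, model's chosen option).
responses = [
    ("United States", "escalate"),
    ("United States", "negotiate"),
    ("United Kingdom", "escalate"),
    ("China", "negotiate"),
    ("Russia", "de-escalate"),
    # ...thousands more scenario/answer pairs in a real evaluation
]

def escalation_rates(records):
    """Share of responses recommending escalation, grouped by country."""
    counts = defaultdict(lambda: {"escalate": 0, "total": 0})
    for country, choice in records:
        counts[country]["total"] += 1
        if choice == "escalate":
            counts[country]["escalate"] += 1
    return {country: c["escalate"] / c["total"] for country, c in counts.items()}

if __name__ == "__main__":
    for country, rate in sorted(escalation_rates(responses).items()):
        print(f"{country}: {rate:.0%} of responses recommend escalation")
```

In a full evaluation, rates like these would be aggregated across many scenarios and compared across models, which is how differences between, say, recommendations for the United States and for China become visible.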

To address this challenge, policymakers and analysts must engage in continuous model refinement while ensuring that human users set clear contextual parameters when querying AI agents. Investments in benchmarking, evaluation, and supervised fine-tuning are critical, as is training national security professionals to ask precise, logically structured questions that account for AI limitations.

Cooperation: Diplomatic Bias in AI Models

Across all foundation models tested, AI agents demonstrated a strong preference for cooperative approaches in international relations, particularly with respect to the foreign policy of the United States and the United Kingdom. This suggests a latent bias toward diplomacy and alliance-building, likely rooted in the frequency with which Western-led international institutions are discussed in historical training data.

While this bias aligns with past international norms, it does not necessarily reflect the strategic realities of twenty-first-century geopolitics. As great power competition intensifies, states often pursue hedging strategies, selective engagement, or even coercive diplomacy—factors that current AI models may overlook.

This finding underscores the importance of contextual awareness in AI-assisted strategy. Leadership changes, major geopolitical events, and shifting national interests can rapidly alter the character of global politics—something AI models, by design, struggle to anticipate. This reinforces the need to train national security leaders, military planners, and intelligence analysts in AI literacy, ensuring they understand both the strengths and limitations of AI-generated insights.

Recommendations to Practitioners

CSIS Futures Lab is a major proponent of embracing data science, AI, and alternative analysis to revolutionize strategy and statecraft. Our researchers believe the future lies in learning how to work alongside AI agents. Based on the results of our benchmarking study, the following recommendations emerge.

First, to maximize AI’s potential in national security and foreign policy, governments must expand AI benchmarking and model evaluation. Routine testing and auditing of AI models should be standard practice to detect biases and refine performance. Public-private partnerships should work toward establishing standardized evaluation frameworks—like the CSIS Futures Lab’s Critical Foreign Policy Decisions (CFPD) Benchmark—to compare how different LLMs perform in strategic contexts. Defense agencies, policymakers, diplomats, and AI developers can use the CFPD Benchmark to evaluate AI models before deployment, ensuring they align with strategic objectives and minimize unintended risks. CSIS Futures Lab will be publishing a series on benchmarking methodology.
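As a rough illustration of what “evaluate before deployment” could look like in practice, the sketch below compares candidate models against benchmark-style scores and flags any that exceed a review threshold. The model names, scores, and threshold are placeholders chosen for illustration, not CFPD Benchmark outputs, and the review threshold would in practice be a policy choice made by the deploying agency.

```python
# Illustrative pre-deployment gate (hypothetical scores and threshold,
# not actual CFPD Benchmark results).
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    model: str
    escalation_score: float   # 0.0 = never recommends escalation, 1.0 = always does
    cooperation_score: float  # share of responses favoring cooperative options

# Placeholder results that a benchmarking run might produce for candidate models.
results = [
    BenchmarkResult("model-a", escalation_score=0.62, cooperation_score=0.71),
    BenchmarkResult("model-b", escalation_score=0.41, cooperation_score=0.80),
]

# Assumed policy choice: models above this escalation rate get flagged for review.
ESCALATION_REVIEW_THRESHOLD = 0.50

for r in results:
    status = (
        "needs human review"
        if r.escalation_score > ESCALATION_REVIEW_THRESHOLD
        else "cleared for further testing"
    )
    print(f"{r.model}: escalation={r.escalation_score:.2f}, "
          f"cooperation={r.cooperation_score:.2f} -> {status}")
```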

Second, the United States must invest in AI literacy for national security professionals. Policymakers, diplomats, and military planners must receive specialized training to understand how AI models operate, recognize their biases, and frame queries effectively. AI should be a tool for augmenting, rather than replacing, human judgment, ensuring that decisionmakers remain critical interpreters of AI-generated insights.

Third, transparency and customization must also become priorities in developing AI agents for national security. AI developers should provide clearer explanations of how models are trained and how they generate recommendations, allowing policymakers to use them more effectively. National security agencies should work toward tailoring AI models to align with specific strategic needs and geopolitical realities rather than relying on generic foundation models that may not reflect nuanced policy objectives.

Finally, interdisciplinary research is essential for integrating AI into strategy and statecraft responsibly. In addition to investing in the computing power required to expand the use of AI, the U.S. government should prioritize funding research initiatives that bridge the gap between AI development, international relations, and strategic studies. By fostering collaboration among technologists, security professionals, academics, and policymakers, the United States can ensure that AI tools are designed and refined with national security priorities in mind.

Yasir Atalan is a data fellow in the Futures Lab at the Center for Strategic and International Studies (CSIS) in Washington, D.C. Ian Reynolds is a fellow (non-resident) in the Futures Lab at CSIS. Benjamin Jensen is a senior fellow in the Futures Lab at CSIS.
