Asking AI How to Respond to a Foreign Policy Decision
AI models have built-in biases that factor into their decisionmaking. These biases should be mitigated before the models are integrated into foreign policy processes.
- CSIS Futures Lab research indicates that some widely used models (e.g., Llama 8B, Gemini 1.5, and Qwen2) chose escalatory responses in the benchmark study, while models such as Claude, GPT, Llama 70B, and Mistral recommended decreasing conflict intensity. These discrepancies likely stem from differences in training data and fine-tuning practices.
- All eight large language models (LLMs) recommended more escalatory responses for the United States, United Kingdom, and France, and fewer escalatory responses for China and Russia.
To safeguard decisionmaking, governments and agencies must invest in comprehensive evaluation frameworks and institute routine audits of AI models. Adopting tools like Futures Lab’s CFPD-Benchmark can help identify and correct these biases before deployment, ensuring that AI supports strategic objectives while minimizing unintended risks.
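For institutions standing up such audits, the core loop is conceptually simple: pose the same crisis scenarios on behalf of different countries, map each model recommendation onto an escalation scale, and compare the resulting averages across models and countries. The Python sketch below illustrates that idea; the `ESCALATION_SCALE` mapping, `audit_model` helper, and `toy_model` stand-in are hypothetical and do not reflect the actual CFPD-Benchmark interface.

```python
import statistics

# Hypothetical escalation scale: higher scores indicate more escalatory
# recommendations. Prompts, rubric, and the model wrapper below are
# illustrative stand-ins, not the CFPD-Benchmark API.
ESCALATION_SCALE = {"de-escalate": -1, "hold": 0, "escalate": 1}


def score_response(response: str) -> int:
    """Map a model's recommended action onto the escalation scale."""
    return ESCALATION_SCALE.get(response.strip().lower(), 0)


def audit_model(ask_model, scenarios: dict[str, list[str]]) -> dict[str, float]:
    """Return each country's mean escalation score for one model.

    ask_model: callable that takes a prompt and returns "escalate",
    "hold", or "de-escalate" (a hypothetical interface).
    scenarios: mapping of country name -> list of crisis prompts.
    """
    return {
        country: statistics.mean(score_response(ask_model(p)) for p in prompts)
        for country, prompts in scenarios.items()
    }


if __name__ == "__main__":
    def toy_model(prompt: str) -> str:
        # Stand-in for a real LLM call; always recommends holding steady.
        return "hold"

    # Identical scenarios posed for two countries, so any score gap
    # reflects country-specific bias rather than scenario differences.
    scenarios = {
        "United States": ["Respond to a naval blockade.", "Respond to a cyberattack."],
        "China": ["Respond to a naval blockade.", "Respond to a cyberattack."],
    }
    scores = audit_model(toy_model, scenarios)
    gap = scores["United States"] - scores["China"]
    print(scores, f"escalation gap: {gap:+.2f}")
```

A routine audit would run this comparison for each model before deployment and flag any model whose escalation gap between countries exceeds a chosen threshold.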