The Open Foundation Model Debate: A Litmus Test

Emerging civil society debates over AI safety, especially over open foundation models, merit particular attention. Unlike developers of closed models such as GPT-4, developers of open foundation models such as Llama, Mistral, or Qwen publish the models’ underlying parameters (“weights”), allowing them to be inspected, modified, and operated by end users.10 With the performance of open models approaching that of their closed counterparts (Figure 2), some have suggested that open model distribution could pose “extreme risks” of misuse.11 Others, meanwhile, have highlighted open models’ benefits for research, security, and national competitiveness.12 Though outcomes remain uncertain, proposals to limit the distribution of open models, such as California Senate Bill (SB) 1047, have recently gained legislative traction.13

How the open foundation model debate is resolved will have direct implications for the defense industrial base. As detailed in later sections, there are preliminary reasons to believe that a diverse open model ecosystem might benefit the DOD. The widespread availability of high-performance open foundation models could improve the DOD’s ability to (1) competitively source and sustain AI systems, (2) deploy AI securely, and (3) address novel use cases. Considering these impacts, the open model debate represents a test case for how civil society evaluates defense priorities in AI policy decisions.

Outlining these implications might also clarify, in White House Office of Science and Technology Policy director Arati Prabhakar’s words, an often “garbled conversation about the implications, including safety implications, of AI technology.”14 Indeed, in its flagship report on the subject, the Biden administration suggested that “the government should not restrict the wide availability of model weights” but that “extrapolation based on current capabilities and limitations is too difficult to conclude whether open foundation models, overall, pose more marginal risks than benefits.”15 The administration has neither endorsed open model restrictions nor foreclosed future regulation. An accounting of defense industrial benefits might therefore contribute to this ongoing conversation.

Terms of the Debate

Open-source software and standards are already widespread in U.S. national security applications.16 Army smartphones, Navy warships, and Space Force missile-warning satellites run on Linux-derived operating systems.17 AI-powered F-16s run on open-source orchestration frameworks like Kubernetes, which is regularly updated, maintained, and tested by industry and the broader public.18 Open-source software is ubiquitous, permeating over 96 percent of civil and military codebases, and will remain a core piece of defense infrastructure for years to come.19

What constitutes an “open” foundation model is less well defined. Developers can distribute foundation models at different levels of “openness”—from publishing white papers and basic technical information to releasing models entirely, including their underlying weights, training data, and the code used to run them.20 By contrast, developers of closed models, including GPT-4 or Claude, release fewer details or data, only allowing user access through proprietary application programming interfaces.21 In general, this brief defines “open” models as those with widely available weights, consistent with relevant categories in the 2023 AI executive order.22 Many of the risks and benefits discussed here flow from these definitions.

Claims of extraordinary risk have motivated several recent proposals surrounding open-source AI. Analysts have expressed concern that malicious users might modify open foundation models to discover cybersecurity vulnerabilities or instruct users in the creation of chemical and biological weapons.23 Others have argued that public distribution of model weights could aid adversaries in advancing their AI capabilities.24 Given these apprehensions, some observers have proposed export controls, licensing rules, and liability regimes that would limit the distribution of open foundation models.25

A competing school of thought has emphasized the societal benefits of open foundation models.26 Open distribution of weights, some argue, accelerates innovation and adoption: indeed, the key frameworks and innovations underpinning today’s large language models (LLMs), like PyTorch and the transformer architecture itself, were distributed openly.27 Others contend that the public scrutiny of model weights enables rapid discovery and repair of vulnerabilities, improves public transparency, and reduces the concentration of political and economic power as AI systems increase in importance.28

What is most clear, however, is that this risk-benefit assessment remains incomplete. First, the evidence to date is limited. The U.S. Department of Commerce’s initial assessment is inconclusive, and AI safety literature has thus far lacked clear frameworks for identifying relative risks and benefits and for determining whether they are unique to open models.29 Despite concerns over AI models instructing untrained users in biological weapon development, for instance, recent red-teaming exercises concluded that LLM-equipped teams performed similarly to those without LLM access.30 The implications of AI-assisted cyber vulnerability discovery are similarly unclear, with some arguing that enhanced vulnerability detection may benefit cyber defenders over attackers, or that the balance of advantage would be case-dependent.31 Malicious use, meanwhile, continues to take place with closed models.32 In brief, more research is needed to determine where the relative risks and benefits lie.33 The purportedly catastrophic harms of tomorrow’s foundation models have not yet come into clear view.34

Second, the pace of technical change is so uncertain that evaluating future benefits, harms, and policy interventions is challenging.35 Whether a licensing regime is effective, for example, depends on how readily foundation model technologies will diffuse.36 And whether export controls benefit national security hinges on which analogy becomes relevant: Is restricting open models like restricting nuclear weapons exports, or is it akin to Cold War bans (now repealed) on public-key cryptography, a technology which now underpins online banking, e-commerce, and a multi-trillion-dollar digital economy?37 If U.S. open models are absent from the market, will Chinese open models take their place?38

Finally, questions remain on how to implement AI policy. Definitional challenges abound; early AI policy approaches, including the EU AI Act, the AI executive order, and California SB 1047, apply thresholds for “systemic risk” to models exceeding a certain amount of computing power or cost used in their development.39 However, these triggers for government review, such as the 10^26 floating-point-operation threshold in the AI executive order, may incompletely capture the capabilities they aim to regulate.40 How to balance resourcing for AI policy implementation against other cyber and biological threat mitigations, such as material monitoring or new cyberdefense capabilities, remains another open question.41