UK AI Summit Debrief Part 2: The State of International AI Safety Governance

In a landmark effort to shape global artificial intelligence (AI) safety governance, on November 1 and 2, the UK government convened the first international AI Safety Summit. This initiative marks a significant step towards collaborative international efforts in AI governance and safety. Part 1 of this debrief explained how the summit framed AI safety. Part 2 charts the path forward for AI safety governance after the summit.

The primary output of the summit is the Bletchley Declaration, a statement of intent for global AI safety governance. There are five main principles represented in the declaration:

  1. Ensuring human-centric, trustworthy, and responsible AI
  2. Adopting regulatory frameworks that balance risks with benefits
  3. Collaborating across countries and sectors to help strengthen AI diffusion in developing countries
  4. Calling on frontier AI developers to commit to safety testing, evaluations, transparency and accountability
  5. Identifying safety risks of shared concern

In organizing the summit, the UK government envisioned its role as carving out a distinct dialogue that could transcend commercial or strategic competition. Positive reviews of the summit suggest the government achieved that aim in the short term. China’s inclusion in the summit is a boon to international AI safety, and its participation in the Bletchley Declaration establishes a common language with which to discuss shared interests.

In the long term, it is difficult to see the United Kingdom diverging enough from the United States to be able to play the role of an independent arbiter in the global AI governance landscape. The two allies have broad alignment on technology and security policy, deep research ties, and a common commercial interest in AI as hosts of the vast majority of leading labs. While it would be advantageous to both the United Kingdom and the United States to see the former emerge as the preeminent international convener on AI safety issues, the European Union and China could challenge that role given the extensive overlap of American and British interests.

More immediately, the mantle will now pass to South Korea and France.

The South Korea and France Safety Summits

Along with the United Kingdom, South Korea will cohost a “mini virtual summit” in spring 2024. The event is intended to monitor progress on the principles in the Bletchley Declaration, feeding into the subsequent summit in France in the fall. South Korean science and information and communication technology (ICT) minister Lee Jong-ho delivered a speech on the first day of the summit, calling for the establishment of an international AI organization under the United Nations. It will be interesting to see whether there is any convergence over the next six months on the question of whether a new international AI institution is necessary.

In fall 2024, France will host the second in-person AI Safety Summit. France is an intriguing AI governance actor that has so far influenced international governance primarily through its membership in the European Union and G7. However, there are three reasons to anticipate a larger role for France that is independent of those two institutions.

First, France is one of the few countries with a potentially world-leading AI company. Mistral AI, founded by alumni from Google DeepMind and Meta, raised $113 million at a $260 million valuation in its first month of existence. Mistral AI is attempting to differentiate itself from the other labs at that level of capitalization by focusing on smaller models and more efficient training methods, open-sourcing its models, and serving businesses rather than consumers. Second, with the incorporation of Mistral AI, Europe now has a technology national champion, which has been an elusive goal for the French and German governments in recent years. Third, France has been assertive in supporting domestic AI development, with public seed funding available alongside private co-funding as part of the National Strategy for AI. Open-source AI has been a core element of that strategy.

France and South Korea are both robust defenders of global norms and the rules-based international order, making them credible stewards of international AI governance. Just as the United Kingdom had considerable license to develop an agenda consistent with national AI goals, South Korea and France will soon have the opportunity to do the same.

The U.S. and UK AI Safety Institutes

One of the major tangible outcomes of the summit is the launch of two AI safety institutes, one in the United States and another in the United Kingdom. On November 1, Vice President Kamala Harris announced in a speech at the U.S. Embassy in London the establishment of the United States AI Safety Institute (US AISI), housed within the National Institute of Standards and Technology (NIST). The US AISI is tasked with converting the NIST AI Risk Management Framework into tangible evaluation and risk mitigation steps, ultimately developing technical guidance that could directly support federal rulemaking in every agency.

One day later, the United Kingdom announced its own AI Safety Institute (UK AISI) with a white paper outlining its mission and structure. The UK AISI’s mission is, in some respects, bolder: the white paper declares the institute will “advance the world’s knowledge of AI safety by carefully examining, evaluating, and testing new types of AI.” Crucially, seven leading AI labs based in France, the United Kingdom and the United States committed to pre-deployment testing of the next generation of models at the UK AISI. The work of the UK AISI will be public information, intended to be shared globally among friends and competitors alike.

The US AISI and UK AISI will perform different, and complementary, functions. While the US AISI is focused on policy implementation, the UK AISI is intended to be a global knowledge repository and exchange focused on the research frontier. The UK AISI will have priority access to state-of-the-art computing resources to facilitate specialized research. For its part, the US AISI also intends to collaborate with international peers to advance the stock of AI risk knowledge, specifically namechecking the UK AISI. However, that collaboration is not the US institute’s raison d’être, as it is for its British peer.

The G7 Hiroshima Process

On October 30, 2023, another of the most prominent international AI governance dialogues, the G7 Hiroshima AI Process, provided an update on its efforts, consisting of three components:

  1. G7 Leaders’ Statement
  2. International Guiding Principles for Organizations Developing Advanced AI Systems
  3. International Code of Conduct for Organizations Developing Advanced AI Systems

The Code of Conduct contains the most substance on AI governance. It consists of voluntary guidance for developing advanced AI systems, including foundation models and generative AI. Priorities for international governance include collaborating on advancing AI safety, security, and trustworthiness; blocking access to advanced AI systems for hostile non-state actors; and encouraging investment in robust security controls and responsible information sharing. There are also points of divergence from the Bletchley Declaration. As a collection of liberal democracies, the G7 puts emphasis on shared values such as rule of law, human rights, diversity, fairness, and democracy. It is somewhat prescriptive on actions to take to mitigate risk, ensure system trustworthiness, and maintain accountability, calling for measures such as diversity and representation in the AI life cycle, identifying and reporting vulnerabilities, and increasing the public understanding of AI systems’ capabilities and limitations.

One policymaker familiar with the various international AI dialogues expressed their belief that there likely will not be binding international rules on AI anytime soon. Instead, the purpose of the Code of Conduct is to establish basic principles and common language for discussing more ambitious governance proposals in the future, starting with likeminded allies and working outward to eventually include a broader group of countries.

Next Questions for AI Safety

The outlook for AI safety at the international level asks critical questions of domestic regulation. Now that the international community is coalescing around a set of shared principles, governments will throw themselves more forcefully into the hard work of turning those principles into practice. In 2024, governments will seek to answer three main questions through regulatory action.

The Limits of Technical Solutions to AI Problems

Technical AI safety solutions have great but unrealized potential. These solutions would be baked into the model so that no amount of adversarial prompting could enable unsafe output. An example from elsewhere in technology is smartphones’ biometric locks, which require a fingerprint or facial recognition to open the phone. Barring the kind of sophisticated hacking that is the province of nation-state actors, it is technically impossible to open a phone without the correct fingerprint or face scan. Due to the mathematics underpinning large language models (LLMs), that kind of technical solution is proving to be difficult to attain in the model training stage of development. Instead, most efforts to ensure safety have focused on post-training reinforcement learning from human feedback.

Civil society organizations are deeply skeptical of technical-only solutions. High-profile failures of machine learning engineers to anticipate and mitigate harmful model outputs have convinced many AI policy stakeholders that social scientists need a co-equal role in safety. Leading AI policy documents, such as the NIST AI Risk Management Framework, the G7 Hiroshima Process Code of Conduct, and the U.S. Executive Order on AI, enshrine the role for non-technical personnel in AI safety. While there have been promising developments in technical solutions to AI safety, such as provenance for AI-generated images, it is clear that non-technical personnel will play an important role in AI safety processes.

A contingent factor is the idea that LLMs are simply a passing phase en route to a more explainable AI paradigm that could also be safer. Sam Altman has expressed his belief that the era of “bigger is better” in AI models is nearing its end. The Executive Order on AI seemingly reflects the view of industry that smaller models are inherently safer, by exempting models below a certain training compute threshold from reporting requirements. A trend towards more specialized models could also enhance safety. Google DeepMind has made remarkable contributions to scientific research with highly specialized models that are difficult to deploy outside of well-considered contexts. (Those models have their own dangers, such as being able to formulate dangerous pathogens, so they must be kept out of the hands of bad actors.)

Earlier generations of autonomy were more interpretable than generative AI’s neural networks, which suffer from the black box problem more than knowledge-based systems, such as airplane autopilots or tax software. There is also some tension between neural networks and econometrics in the AI research field. The econometrics-led approach to AI dominated neural networks until only recently, in large part because it is easy to directly connect certain undesirable outputs with specific parameters in those models.

The Prospects for AI Safety Legislation

There is broad consensus (with a few notable outliers) that the themes of the UK AI Safety Summit deserved their place in the agenda. One question to consider for the future is the extent to which new legislation is necessary to address those themes. Similar approaches in the United States and United Kingdom are leading to shared realizations about the wisdom of legislation at this stage of AI development. Many regulators in both countries assert they do not need new powers from their respective legislatures in order to effectively police risk in their respective domains.

Gill Whitehead, Group Director for Online Safety at the Office of Communications (Ofcom), the regulatory and competition authority for the ICT industry, acknowledged during the London AI safety dialogues a future role for legislation in filling regulatory gaps. Whitehead also asserted that Ofcom had the power and capabilities necessary to regulate AI in its domain, citing its diverse team, governance independence, evidence-based policymaking structures, and explicit enforcement powers. Referring to the United Kingdom’s Online Safety Act, Whitehead said, “a lot of these [AI] use cases are in scope,” including deepfakes, generative AI in internet search, and algorithmic recommender systems. “The enforcement powers that we have are significant,” such as fines as a share of global turnover, senior management liability, and powers to disrupt services. CSIS has previously written about the U.S. Executive Order on AI and the belief in federal agencies that their existing authorities are sufficient in many cases to regulate AI today.

Nonlegal Forces Governing AI Safety

Nonlegal forces could play a critical role in AI safety. Lawrence Lessig’s theory of technology regulation sees laws as just one of four modalities of regulation, with the other three being coding architecture, market incentives, and norms.

In contrast to the lack of evidence of coding architecture as a regulatory force, market incentives to deliver safe AI appear strong. Prior to ChatGPT’s success, there were numerous examples of racist, misogynist, and generally terrible chatbots. As offensive as their output may have been, it also rendered them useless. It could be argued that one of the primary factors behind ChatGPT’s leading share of the chatbot market is its relatively strong record on producing safe output, a product of major investment into interdisciplinary safety evaluation of its underlying models. Businesses and consumers want AI safety, and people do not pay for technology that diverges from what they want.

The last modality, norms, has great potential to enforce AI safety. Much of the character of the early Internet can be explained by Silicon Valley norms rather than laws. Norms can be particularly powerful, often having the practical enforcement effect of laws. When news came to light that UnitedHealth Group had used algorithms to pressure its staff to terminate healthcare coverage for seriously ill patients in order to increase profits, the backlash was swift. At the time of publication, UnitedHealth was the subject of a class action lawsuit with an unknown outcome. However, the damage to its reputation was already done. Social media abounded with condemnations of UnitedHealth’s conduct, blasted as shameful, grotesque, and the most deplorable use of AI in healthcare. Regardless of the outcome of the lawsuit, healthcare companies will likely think twice before similar deployments in the future, even if they have a positive impact on their bottom line.

Michael Frank is a senior fellow in the Wadhwani Center for AI and Advanced Technologies at the Center for Strategic and International Studies in Washington, D.C.
