Six Questions Every DOD AI and Autonomy Program Manager Needs to Be Prepared to Answer

Introduction

In November 2017, former deputy secretary of defense Bob Work gave his first public speech after leaving the DOD. He told the assembled audience that “AI [artificial intelligence] and autonomy [are] going to be the pacing technological competition between great powers in the twenty-first century . . . it is time for the United States and the Department of Defense to cowboy up.”[1]

Roughly seven months earlier, while still serving as deputy secretary, Work signed a memo launching what would become the DOD’s most visible AI program: the Algorithmic Warfare Cross-Functional Team, better known as Project Maven. Project Maven was originally focused on a narrow task, using machine learning AI to partially automate object detection and classification in drone video surveillance feeds. However, the DOD’s ambitions for Project Maven were far from modest. Lieutenant General Jack Shanahan, who oversaw Project Maven through its first year, said that “Maven is designed to be that pilot project, that pathfinder, that spark that kindles the flame front of artificial intelligence across the rest of the [Defense] Department.”

It has now been six years since the launch of Project Maven. Have the fires of AI and autonomous systems spread across the Department of Defense?

One clue toward an answer comes from a July 2022 news article that tells the story of an army unit directly supported by a pair of AI software engineers. The engineers created a computer vision program to help front-line soldiers find camouflaged enemy targets in an ongoing war. The software development team coordinated directly with soldiers on the ground, rapidly iterated on and improved the AI technology based on user feedback, and delivered a capability that the soldiers loved. One might think that Work’s vision of a military powered by AI has come true.

There’s a twist, though. The army unit in the news article is Ukrainian, and the advanced AI system they are using is not one provided by the U.S. military or even U.S. companies, but one developed by a team of only two volunteer Ukrainian software engineers. Perhaps most shocking, these two engineers developed and fielded the AI system—delivering real value to real warfighters in a real war—in a matter of weeks. DOD software development, even for efforts that do not involve AI, routinely takes years.

For his part, Work is frustrated with the pace of the DOD’s overall progress. In 2021, while serving as the vice chair of the National Security Commission on AI, Work stated at a press conference: “We have not organized ourselves to win the [AI] competition, we do not have a strategy to win the competition, we do not have the resources to implement a strategy, even if we had one.” Work is far from the only former DOD official to express frustration at the pace of the department’s AI transformation.

A great deal has been accomplished in the DOD’s effort to accelerate the adoption of AI since the launch of Project Maven. However, the department still significantly lags behind commercial industry, which continues to deliver remarkable breakthroughs in AI and autonomous systems technology, most recently in generative AI.

This paper, the first in a series of two, builds on months of CSIS research, a private roundtable with experts, and dozens of interviews—many conducted on a not-for-attribution basis—with current and former DOD officials, representatives of allied countries, and members of the private sector involved in AI and autonomous technologies.

The paper examines DOD AI and autonomy adoption as viewed through the eyes of a DOD AI program manager and the challenges they face. In doing so, it provides fresh insights into the question of why Ukrainian volunteers can field impactful military AI software in weeks, while many parts of the DOD often struggle to do so even with far more time and resources.

The aim of this first paper is to honestly and candidly explore the challenges that face any DOD organization or leader who is seeking to solve a problem with AI. The second paper will adapt lessons learned from DOD AI and autonomy efforts of the past six years and make recommendations to policymakers and DOD leaders for how to reduce some of these barriers and accelerate technology adoption.

For any high-quality and impactful AI-enabled capability to be developed and deployed to users in the DOD, six critical inputs must come together effectively: mission, data, computing infrastructure, technical talent, end-user feedback, and budget.

Because potential AI use cases are incredibly diverse—nearly as diverse as the uses of traditional software—this paper does not provide a checklist. Instead, it examines the six critical inputs listed above and provides key questions that DOD AI and autonomy program managers ought to be prepared to answer as they pursue an AI solution to a given problem. Some of these challenges relate to AI technology generally and apply to both commercial and government adoption of AI. However, many of the most significant AI adoption challenges facing the DOD relate to bureaucratic and structural constraints that are specific to the DOD and the government context. This paper argues that without clear and viable answers to these questions, even the most promising and well-resourced DOD AI efforts are likely to encounter significant—perhaps insurmountable—barriers to success.

The questions are as follows:

  • Question 1: Mission

    What problem are you trying to solve, and why is AI the right solution? The DOD should never pursue AI for its own sake. Any proposed AI development effort should have a credible path to making an impact on a problem that matters for warfighting, for DOD enterprise effectiveness, or for both.
  • Question 2: Data

    How are you going to get enough of the right kind of data to develop and operate your AI system? Data is the raw material for modern AI systems that use machine learning. Without a frequently updated training dataset that closely resembles data from the operational environment, the real-world performance of AI systems will be poor.
  • Question 3: Computing Infrastructure and Network Access

    How will you get the AI system approved to reside on and interact with all of the DOD networks required for its development and use? Modern AI has unique computing requirements in terms of both software and hardware, and AI models need to be constantly retrained and redeployed in order for them to maintain adequate performance. The computing infrastructure in which the AI and autonomous capabilities are developed must be tightly linked to the operational network infrastructure and end-user systems on which they will be deployed.
  • Question 4: Technical Talent

    How are you going to attract enough of the right kind of AI talent and put that talent to good use? Demand for AI expertise continues to outstrip supply in the highly compensated commercial sector, and the situation is even more challenging in the DOD and defense industrial base. The DOD must be able not only to recruit AI experts but also to ensure that their skills are put to good use once they join.
  • Question 5: End-User Feedback

    How are you going to ensure that the operational user community is a frequent source of feedback and insight during the development process? High-quality AI capabilities are always developed iteratively, and it is unrealistic to expect that initial system requirements in either the joint requirements process or the contract documents will comprehensively capture everything that end users will require, without the need for future revisions. End users, whether at a combatant command or a DOD agency, must be directly and frequently engaged throughout the AI capability development process.
  • Question 6: Budgeting and Transition

    Are the diverse DOD development, procurement, and operational organizations involved with your AI capability adequately budgeting for their involvement in the program? The DOD budgeting system is slow and cumbersome, and it typically splits responsibility for funding different phases of a given system’s lifecycle among different organizations and operational communities. Even if an organization succeeds in developing an AI-enabled capability beloved by end users, it will likely struggle to secure a place in the DOD budget for scaling and operating that capability over time beyond the development and testing phase. DOD AI program managers should not only argue for the resources they need to manage their own efforts but also secure buy-in from transition partners, who should also adequately budget for the resources required to scale and adopt AI technology.

As previously mentioned, this paper is the first in a pair, and the second paper will focus on impactful solutions that the DOD should pursue to rapidly accelerate the adoption of AI and autonomous systems for mission impact.

Question 1: Mission

What problem are you trying to solve, and why is AI the right solution?

While it is absolutely the case that technological superiority is one of the key foundations of U.S. military strength, the widespread acceptance of this truth makes it easy to forget that not all kinds of technological progress matter for overall strategic objectives.

Consider the following simple thought experiment: A car factory can produce 10 car engines per month and 100 wheels per month. If making a car requires one engine and four wheels, how many cars can the factory produce per month?

The answer is 10, as engines are the rate-limiting factor (or “bottleneck”) of production.

Now, assume the factory installs a new AI-enabled system that triples monthly wheel production to 300. How many cars can the factory produce per month?

Still 10, because engines, not wheels, are the production bottleneck, and the new AI system did nothing to improve engine production.
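The arithmetic is trivial, but it can be made explicit. The short Python sketch below (the function and production constants are purely illustrative, not from the source) captures the bottleneck logic:

```python
def max_cars(engines_per_month: int, wheels_per_month: int) -> int:
    """Monthly output is capped by the scarcest input -- the bottleneck."""
    ENGINES_PER_CAR = 1
    WHEELS_PER_CAR = 4
    return min(engines_per_month // ENGINES_PER_CAR,
               wheels_per_month // WHEELS_PER_CAR)

print(max_cars(10, 100))  # 10: engines are the bottleneck
print(max_cars(10, 300))  # still 10: tripling wheel output changes nothing
print(max_cars(30, 300))  # 30: only improving the bottleneck raises output
```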

What is true in manufacturing is also true in warfare: it’s the productivity improvements at the overall process bottleneck that count. If the battle is likely to be lost because air base runways and fuel tanks are being destroyed by long-range missiles—a common concern in recent RAND wargames carried out on behalf of the Air Force—then even successful technology improvements that make fighter aircraft more stealthy or more lethal in the air may do little to impact battlefield outcomes, as these are determined by vulnerabilities on the ground.

The thought experiment above offers a simple insight for DOD leaders who are interested in potentially adopting AI. Namely, leaders should ensure that the problem they are working on is actually the bottleneck in their overall mission efficacy. When approached from this frame, the problem will essentially never be “we need to adopt AI,” since an AI-enabled system is a possible means to an end—not the end itself. As one DOD leader told CSIS in an interview: “If you build an AI model without a practical use in mind, you may as well be building an AI model with no uses at all.”

Once a program manager has a clear sense of what the problem is, they can begin exploring whether or not an AI-enabled capability deserves to be part of the solution. This requires a realistic understanding of what modern AI can and cannot do, along with what factors have to be in place for an AI system to deliver a desired level of performance. For example, after AlphaGo (an AI system for playing the strategy board game Go) defeated the world champion in 2016, there was excitement in some U.S. and Chinese military circles that an “AI commander” with superhuman strategic thinking might be on the near horizon. This mistaken line of thinking resulted from a failure to appreciate the convenient properties of Go from an AI engineering perspective and why AI’s success in Go—a genuine research breakthrough for the field of AI—would not necessarily generalize to other strategic planning applications in the near term. Among the many differences between Go and warfare, one is especially important: a digital simulator for the game Go is mathematically identical to the real board game—meaning that nearly infinite, high-quality training data can be generated quickly and cheaply by having the system play against itself. Warfare offers no such simulator. Using today’s AI technology, an AI military strategy system that plays against itself in a warfare simulation is as likely to learn how to exploit the limitations of the simulation as it is to learn useful military tactics.

Military simulations have many beneficial uses for training humans, and advanced simulations are also frequently part of developing advanced AI systems, such as autonomous driving.[2] Nevertheless, non-expert readers of news about AI research must be cautious when drawing conclusions. Moreover, AI is not a discrete item but a general-purpose technology, analogous to electricity or computers. The breadth of potential AI use cases is nearly as broad as that of traditional software, which underpins capabilities as radically divergent as word processing and missile guidance systems.

This is not to say that AI does not have extremely promising military use cases. It absolutely does. Take, for example, the case of computer vision AI for satellite image recognition. Maxar Technologies, a major commercial satellite operator, has a large fleet of imagery satellites that collect so much data that Maxar estimates it would take 85 years of human analyst labor to identify all the objects in one day’s worth of imagery collection.

For Maxar, using computer vision AI does not eliminate the need for human imagery analysis, but it does dramatically improve the productivity of human analysts by doing a first-pass analysis of that imagery. The AI system identifies images that are likely to be interesting because they contain recent changes in activity or other noteworthy features. The result of the AI’s analysis helps human analysts prioritize which images to review. The key is that Maxar’s use of AI addresses the company’s overall operational bottleneck (a shortage of imagery analysis capacity) and works on a problem where AI’s strengths can be effectively leveraged (prioritizing human analyst time) while its weaknesses can be mitigated (the not-entirely-reliable AI is either confined to non-mission-critical roles or double-checked by humans). This approach is similar to how many high-performing organizations integrate AI capabilities into their workflows.

Modern AI systems using machine learning are good at processing vast amounts of data and finding patterns in ways helpful for tasks such as detection, classification, prediction, optimization, and automated decisionmaking. AI supports speed and scale in problem-solving where data speeds or sizes may be too extreme for humans to realistically manage. Like any tool, however, AI only works well in certain situations, and AI is not a panacea for all productivity challenges.

The key for DOD AI leaders is to build a pool of AI talent and capability and then match it to problems that are both a high priority for operational communities and best addressed by the capabilities of modern AI. Finding the right overlap will take considerable discussion, experimentation, and trial and error. This brings us to the subsequent questions, which focus on ensuring that the right success factors are in place for a given AI use case.

Question 2: Data

How are you going to get enough of the right kind of data to develop and operate your AI system?

DOD program managers need to have a firm understanding of AI technology before beginning any AI-enabled capability development effort. In the case of modern AI, which is primarily based on machine learning, data is critical to nearly every aspect of program management and system performance. Machine learning software is different from traditional rules-based software in that much of the “intelligence” of the system is not programmed by humans but instead is learned from data. The following figure depicts the two types of data involved: the data used to develop an AI model during the training phase and the data the trained model processes during operational use. The figure assumes that the effort is using the supervised learning family of machine learning, which will be true of the vast majority of—though not all—AI use cases in the DOD.

Simplified Process Diagram for Supervised Machine Learning

Source: Author's Creation
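To make the figure’s two phases concrete, the following minimal sketch—using an off-the-shelf dataset and model from the scikit-learn library as stand-ins, not anything specific to DOD systems—trains a classifier on labeled examples and then applies it to data it has never seen:

```python
# A minimal sketch of the figure's two phases, using scikit-learn.
# The dataset (handwritten digits) and model choice are illustrative.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Training phase: the model's "intelligence" is learned from labeled examples.
images, labels = load_digits(return_X_y=True)
train_images, new_images, train_labels, new_labels = train_test_split(
    images, labels, test_size=0.2, random_state=0)
model = RandomForestClassifier(random_state=0).fit(train_images, train_labels)

# Operational phase: the trained model classifies data it has never seen.
predictions = model.predict(new_images)
print(f"Accuracy on unseen data: {(predictions == new_labels).mean():.1%}")
```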

As the figure above makes clear, data is the foundation of modern AI. Fortunately, the 2020 DOD Data Strategy and the 2021 Deputy Secretary of Defense Memorandum, “Creating Data Advantage,” both strongly emphasize that “data is a strategic asset.”

Despite this, data-related challenges remain some of the most important barriers to AI development and deployment in the DOD. This may seem counterintuitive, as some parts of the DOD—particularly in the intelligence community—have amassed enormous amounts of data over decades. Even in cases where quantity is not the problem, however, a variety of issues concerning how data is organized, stored, enhanced, and accessed present obstacles to AI development and adoption. The following sections will walk through some of these challenges using the example of a satellite image classifier AI system.

First, training data are generally application-specific. Satellite image recognition training data only help build satellite image recognition AI. Moreover, if an organization seeks to build a satellite image recognition AI, then only satellite image training data will do. One cannot magically use labeled facial portrait images or commercial financial data to train a satellite image recognition AI.

Second, modern machine learning systems require a large amount of training data. Conveniently, for many DOD sensor systems, including reconnaissance satellites, the amount of data produced is astonishingly large. However, from an AI perspective, the amount of data in terms of gigabytes or petabytes is only part of the story. In the case of supervised machine learning—which, again, is the most relevant machine learning paradigm for most DOD AI applications—the more important metric is the number of examples in the training dataset. For example, a satellite image recognition AI classification system requires tens of thousands of labeled images for each class (e.g., “vehicle,” “building,” “road”) and sub-class (e.g., “Russian T-72 tank”) of object that the system seeks to include in its computer vision capability. Moreover, there needs to be consistency in how these labels are applied. For AI-enabled computer vision applications, there are at least three major approaches to data labeling that are commonly used, each illustrated in the sketch following the list:

  • Image classification, in which each image is classified as a single entity, even if only a subset of the image is actually relevant to the class (e.g., the image is of a military base with tanks and aircraft and buildings, and the whole image file is labeled “military base”).
  • Object detection, in which the different portions of the image are labeled individually (e.g., a bounding box is placed around each tank, each aircraft, etc., and the subset of the image in each of those boxes receives individual labels, such as “tank”).
  • Image segmentation, in which individual pixels in the image are dynamically grouped for the label (e.g., the pixels that comprise each tank are tightly bounded and labeled “tank”).
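The differences between these approaches are easiest to see in the annotation records themselves. The sketch below shows what one hypothetical satellite image’s labels might look like under each approach; the file names, coordinates, and field names are invented for illustration:

```python
# Illustrative (hypothetical) annotation records for the same satellite image
# under each of the three labeling approaches described above.

# 1. Image classification: one label for the whole image file.
classification_label = {"image": "scene_0042.png", "label": "military_base"}

# 2. Object detection: a bounding box (x, y, width, height in pixels)
#    and a label for each object of interest.
detection_labels = {
    "image": "scene_0042.png",
    "objects": [
        {"bbox": [130, 220, 48, 30], "label": "tank"},
        {"bbox": [410, 95, 160, 140], "label": "aircraft"},
    ],
}

# 3. Image segmentation: labels applied at the pixel level, here represented
#    as a polygon tightly outlining each object.
segmentation_labels = {
    "image": "scene_0042.png",
    "regions": [
        {"polygon": [(130, 220), (178, 221), (176, 250), (131, 249)],
         "label": "tank"},
    ],
}
```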

The above example is not meant to suggest that every reader needs to memorize the three approaches to data labeling for a computer vision AI capability. Programs working with other types of data, such as audio, text, tabular, or RADAR, will each have different approaches. Rather, the key takeaway is that data labeling is a complicated and vital stage of most DOD AI capabilities at both the development and the operational stage. Many of the strategic choices that determine the viability, success, or failure of a given AI project or program are directly related to ensuring that there is a well-defined data labeling program architecture and a clear path to acquiring and continuously updating a large, labeled training dataset for AI models.

Take the case of Tesla, a company that has invested billions of dollars into research and development of AI for autonomous driving applications. In a 2021 interview, the then-director of AI and Autopilot Vision at Tesla, Andrej Karpathy, said the following about Tesla’s approach to data labeling and data curation:

"If I need [the car’s AI] to recognize fire hydrants, it’s absolutely doable, but I need 10,000 examples, 50,000 examples . . . from all the possible rotations and all the possible brightness conditions . . . We actually have an entire data labeling org that we’ve grown inside Tesla because this is so fundamental to what we do . . . We have a highly professional, highly trained workforce that curates our datasets, and we think that this is the right way to go because this is just a new software programming paradigm and these are our new programmers . . . When they’re annotating examples, they’re telling the system how to interpret the scene, and they are quite literally programming the autopilot, so we invest quite a bit into the org and we keep it close, and they collaborate with the engineers very closely."

As Karpathy makes clear, the data and data labeling requirements for high-performance AI systems are massive.

Moreover, the challenge is not only to acquire data of adequate quantity but also of adequate diversity and variety. In the case of AI computer vision applications, images in the training dataset must include sufficient variety across many different image characteristics—for example, viewpoint variation (how the object is oriented with respect to the camera), illumination conditions (daytime/nighttime), and occlusion conditions (the extent to which the object is partially obstructed by other things in the image).
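Data augmentation libraries can synthetically add some of this variety, though augmentation supplements rather than replaces genuinely diverse collection. A brief sketch using the torchvision library (the specific transforms are illustrative; a real remote sensing pipeline would be tuned to the sensor and mission):

```python
# Illustrative data augmentation with torchvision, adding synthetic variety
# along the dimensions described above.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),  # viewpoint variation
    transforms.ColorJitter(brightness=0.5),  # illumination conditions
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.3),         # a crude stand-in for occlusion
])
# augmented = augment(training_image)  # applied per image during training
```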

Project Maven’s industry partners, for their part, also fielded a very large team working on issues similar to those facing Tesla. Colin Carroll, a former Defense Department official involved in Project Maven from the beginning, stated in a December 2022 interview that

"[the budget of Project] Maven is $257 million a year, R D T and E  [of Research Development Testing and Evaluation funding]. A lot of that went to industry. Probably a hundred million of that was on the DevSecOps platform side. So, data acquisition, data curation, data labeling. Maven had a team of 400 data labelers annually, eight-hour shifts, just labeling GEOINT data from all different platforms, all different sensors, unclassified, classified. There’s no other program in the Department that’s even remotely looking like that."

Carroll correctly states that Maven is a significant outlier among DOD initiatives in the seriousness with which it treated the engineering of its data labeling pipeline and its data labeling organization. However, it is also critical to understand that data labeling work is never truly finished. As a result of Project Maven’s work supporting U.S. military operations in Africa and the Middle East since 2017, the Maven organization collected and labeled a massive amount of remote sensing data covering desert environments and the types of vehicles commonly seen in those regions. While the system can deliver useful performance in those regions and environmental conditions, additional work would be required to replicate Maven’s success in other contexts, even using the same AI models and the same sensor platforms. One former DOD official who supported Project Maven told CSIS: “Acquiring sufficient data that is diverse enough to be useful is tough. We have a labeled dataset that is good for desert clear-sky, high-sun conditions, but if you take that AI model to a snowy landscape, the performance drops.”

To understand why, consider the following famous example illustrating the limitations of AI computer vision systems. Researchers at the University of Washington trained an AI imagery classifier to determine whether a given image was of a “wolf” or a “husky.” However, the AI classifier made its determinations not based on the features of the animal but on the presence or absence of snow in the image background. This error occurred because all the labeled images of wolves in the training dataset had snow in the background, while none of the labeled images of huskies had snow in the background. After training was completed, the operational AI model would incorrectly classify any image of a husky on a snowy background as a “wolf” and any image of a wolf on a non-snowy background as a “husky.” As this example should make clear, AI program managers should be cautious in making assessments about what sort of “learning” has actually occurred during the training phase of AI development. What appears to be over 99.9 percent accuracy during poorly designed testing could mask an overall system brittleness that could reveal itself during operational use, perhaps with disastrous consequences.
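One practical safeguard is to slice test results by operating condition rather than scoring them only in aggregate. The toy example below (all records are invented) shows how slicing exposes exactly the wolf/husky failure: the aggregate number obscures that the model is perfect on backgrounds it saw in training and useless on novel ones.

```python
# A toy demonstration (all records invented) of slicing test results by
# operating condition rather than scoring them only in aggregate.
TRAINING_CORRELATION = {"wolf": "snow", "husky": "no-snow"}

# (true label, model prediction, image background) for a hypothetical test set.
test_set = [
    ("wolf", "wolf", "snow"),       # background matches training correlation
    ("husky", "husky", "no-snow"),  # background matches training correlation
    ("husky", "wolf", "snow"),      # novel pairing: model follows the snow
    ("wolf", "husky", "no-snow"),   # novel pairing: model follows the grass
]

def accuracy(rows):
    return sum(truth == pred for truth, pred, _ in rows) / len(rows)

familiar = [r for r in test_set if TRAINING_CORRELATION[r[0]] == r[2]]
novel = [r for r in test_set if TRAINING_CORRELATION[r[0]] != r[2]]

print(f"aggregate accuracy:         {accuracy(test_set):.0%}")  # 50%
print(f"backgrounds as in training: {accuracy(familiar):.0%}")  # 100%
print(f"novel backgrounds:          {accuracy(novel):.0%}")     # 0%
```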

This type of AI failure mode is a key challenge to keep in mind for DOD program managers. The DOD has worldwide operational requirements, and it must be prepared for war to break out at any time—even in geographies that have been peaceful for decades and even under operational conditions that are genuinely unprecedented. Some types of data are easy to collect even in peacetime, such as satellite remote sensing or social media posts, but other types of data might only be collectible during wartime. Russian military officials have even explicitly stated that they view Ukraine’s use of the North Atlantic Treaty Organization (NATO)-provided weapons and equipment as their best opportunity yet to collect operational training data on NATO equipment for use in Russia’s future military AI applications.

There are a variety of AI techniques and approaches that can somewhat reduce overall data needs, and this has been a research area of intense focus over the past few years. Transfer learning, in particular, can potentially allow AI to use data of the same type (e.g., images, audio) but covering different classes (e.g., faces, cars) to reduce overall labeled data needs, but as of this writing, this does not eliminate the need for large quantities of labeled training data in the target class. There has also been important progress in the field of self-supervised learning over the past three years, and continued progress in this area may further lessen data labeling requirements compared with traditional supervised learning for some applications, including image recognition. Thus far, however, that progress has been more modest in working with imagery data compared with text or audio data, and the overall need for significant diversity in training datasets appears unlikely to change any time soon. The same is true of progress in leveraging advances in synthetic data, which is another promising technique, though in most cases immature, for reducing labeled data requirements.
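To illustrate what transfer learning looks like in practice, the sketch below uses the PyTorch and torchvision libraries to reuse visual features learned on everyday photographs and retrain only the final layer on a new, smaller labeled dataset; the class count and target classes are placeholder assumptions:

```python
# A minimal transfer learning sketch: reuse features pretrained on ImageNet
# and retrain only the classification head on the target classes.
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the classification head, sized to the new task.
NUM_TARGET_CLASSES = 5  # placeholder: e.g., the vehicle types of interest
model.fc = nn.Linear(model.fc.in_features, NUM_TARGET_CLASSES)

# Training now updates only the new head, so far fewer labeled examples are
# needed than training from scratch -- though, as noted above, a substantial
# labeled dataset in the target classes is still required.
```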

Thus far, this paper has focused on AI challenges using the example of AI computer vision applications. However, data issues are equally challenging—and sometimes more so—for use cases such as logistics or human resources. As one former service member told CSIS, some DOD maintenance teams still use pen and paper maintenance records, while others use a variety of software-based systems, which may or may not be designed to be easily compatible with each other. Having a lot of data is not the same thing as having a lot of data that is easily used as a training dataset for AI. Addressing data challenges is generally a prerequisite for applying any type of AI-enabled solution at scale.

In short, strategic programmatic complexity for data labeling and data curation manifests itself in many areas, such as:

  • Human labeler expertise: Can non-experts label the data, such as in the case of facial recognition images? Or can only trained experts label the data, as in the case of synthetic aperture radar images or complex maintenance data?
  • Labeling standardization: How can you ensure that all labelers are using a consistent approach and level of quality in the generation of their labels?
  • Data availability and standardization: Do different parts of the DOD performing the same function collect equivalent types of data and store it in the same format?
  • Data pipeline management: How are you going to engineer a system to ensure that recent and relevant operational data is going to constantly be added to your training dataset?
  • Data storage and computing requirements: How are you going to store and process your data in a way that is viable given budget constraints and frequently significant AI computing costs?
  • Data ownership rights: Does the U.S. government own the critical datasets, or do industry partners own them?
  • Data access rights: Even if the U.S. government does own the data, how will you ensure that the AI development organization has access to the operational data (which is likely owned by a different part of the DOD)? Much of the DOD’s data is sequestered across various data silos and different levels of classification.

Given all of these challenges and complexities, it’s no surprise that Craig Martell, the head of the DOD Chief Digital and Artificial Intelligence Office (CDAO), recently said in an interview that his top priority for the DOD is “driving high quality data” and that his vision for the CDAO office was “as a centralized supporter of AI in the department to give you the tools, abilities, and consulting, to be able to build that labeled data so that you can hand it to industry, and they can build a model that works for you.”

One senior executive in the DOD working on AI development efforts told CSIS in an interview that “the number one thing I spend my time on is negotiating data rights and data sharing agreements between different parts of the DOD and with our industry partners.”

This is entirely appropriate and unsurprising. DOD AI program managers should similarly expect that data-related issues will arise again and again in many of their most complex challenges.

Question 3: Computing Infrastructure and Network Access

How will you get your AI system approved to reside on and interact with all the DOD networks required for its development and operational use?

One of the key challenges for the DOD in developing and operating AI-enabled capabilities is the diversity of the DOD’s digital networks and the obstacles to effectively building software systems that access and move data across those networks. According to U.S. Cyber Command, the Department of Defense Information Network (DODIN) comprises:

  • 46 distinct DOD components (e.g., combatant commands, military services, DOD agencies);
  • 15,000 classified and unclassified networked environments; and
  • 23 Cyber Security Service Providers (CSSPs).

This massive landscape of DOD networks is under constant assault from cyber threats, and so understandably, the DOD views ensuring the cybersecurity of its networks as a high priority. Despite its benefits, the current DOD approach to cybersecurity presents an enormous challenge to any DOD organization seeking to develop, operate, and update software-enabled systems, especially those that utilize AI machine learning capabilities. In particular, the current process for granting a specific software system permission to operate on a specific DOD network—known as Authority to Operate (ATO)—is a major barrier to accelerating AI adoption.

Every software system that operates on the DODIN and processes government data must receive an official ATO from a certified DOD authorizing official. Without an ATO, software cannot legally operate on the DODIN. Many types of data—such as confidential or classified secret data—are forbidden from leaving government networks. Even in the case of certain defense contractor networks that are authorized to receive and use classified data, these networks fall under the authority of a DOD CSSP and must meet substantial requirements to connect to the DODIN, including ATO requirements.

The ATO process requires any organization proposing to place a new software system on a DOD component’s network to provide extensive written documentation demonstrating how the software will fully comply with the cybersecurity controls that apply to the relevant security level (e.g., confidential, classified secret, classified top secret) as well as additional controls that apply to specific DOD organizations and systems. Most DOD components have only a single authorizing official with responsibility for authorizing all systems within that component and weighing the benefits and risks of allowing prospective new applications. These officials face enormous demands on their time, and their incentives are generally aligned toward extreme risk aversion. One interviewee told CSIS that as an authorizing official, it’s much easier to damage your career by wrongly saying “yes” to an ATO request than by wrongly saying “no.”

The ATO process takes a long time. For new software or newly updated software, experts at the General Services Administration (GSA) estimate that it routinely takes 6 to 18 months to receive an ATO across the government. Some argue that longer than 18 months is more typical in the case of DOD AI-enabled applications. This long timeframe often applies even if the software development is already finished prior to beginning the ATO process and even if the software has previously received one or more ATOs to operate on other parts of the DODIN.[1]

In practice, this means that if a DOD program manager is seeking to use DOD data as part of an AI capability development effort, development on that AI capability cannot even begin until the development environment and all of the various pieces of software in it have received an ATO. In the commercial and academic sectors (and evidently in the Ukrainian military), an AI developer can easily just download widely used and freely available open-source AI development frameworks such as TensorFlow or PyTorch and immediately load their data into those frameworks to begin developing AI models.
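For comparison, here is roughly what “immediately begin developing” looks like in the commercial world: after a one-line install (pip install torch), a developer can have a complete—if toy—training loop running within minutes. The model architecture and synthetic data below are purely illustrative:

```python
# Illustrative only: a complete, if toy, PyTorch training loop that a
# commercial developer could run minutes after installing the framework.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(64, 10)         # stand-in for real mission data
labels = torch.randint(0, 2, (64,))    # stand-in for real labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```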

By contrast, securing permission to install TensorFlow on DOD networks may require months of work by professionals with experience navigating the ATO process. Additionally, that ATO would most likely not even cover future software updates, since any future changes would be viewed as potentially violating the conditions agreed to in the ATO.

The existing ATO process presents a powerful obstacle to rapid, iterative, data-driven software development. In a conversation with CSIS, one DOD senior leader with years of direct experience on AI development efforts said, “These challenges with ATOs are eating us alive.”

Moreover, more and more DOD leaders are reaching the conclusion that the ATO process does a poor job of ensuring cybersecurity. Aaron Weis, the chief information officer of the Department of the Navy, said in an interview last year that

"the idea of a three-year ATO is wrongheaded—you fill out a giant spreadsheet and do 10,000 pushups, and then you get an ATO that’s good for three years . . . And then what happens? Over the next three years, that system hasn’t evolved or been updated. It’s no longer secured, and it ends up as a high-risk escalation that ends up on my desk. . . . [It] incentivizes bad behavior."

There are several options available for DOD program managers seeking to develop and deploy AI-enabled capabilities when it comes to the ATO barrier.

Option 1: Contractor-Owned or Contractor-Operated Capabilities (aka “Data as a Service”)

The restrictions on sending data to the DODIN are significantly less stringent than those on installing software on the DODIN or moving data from the DODIN to the commercial internet. This means that if the contractor owns the sensor and the computing infrastructure, it can develop advanced AI capabilities relatively quickly using commercial practices and then operate those capabilities on behalf of the government. In this acquisition model, the DOD is not paying companies to develop an AI-enabled software or hardware capability. Instead, it is buying data.

This is the primary business model of commercial satellite remote sensing companies such as Maxar Technologies, which was mentioned earlier in this paper. Maxar builds, owns, and operates a fleet of remote sensing satellites and sells the data to the U.S. government. Maxar makes decisions for itself about whether and how to use AI capabilities to enhance its ability to provide data to the government, but it is financially incentivized to constantly improve the usefulness of its services.

Another example of the “data as a service” model comes from Task Force 59—a task force established by U.S. Naval Forces Central Command (NAVCENT) in 2021 to rapidly integrate unmanned systems and AI into the Fifth Fleet’s operations. Task Force 59 achieved an impressive cadence of iterative technological development and integration into Fifth Fleet naval exercises and operations. Many of its AI-enabled capabilities were procured on a “data as a service” basis—meaning that many of the naval sensors, including unmanned and autonomous naval drones, were owned and operated by Task Force 59’s industry partners, not the U.S. Navy. Moreover, the AI algorithms ran on commercial cloud infrastructure. The data produced by these systems, including the outputs of their AI capabilities, were then moved to the DODIN on a one-way transmission basis: contractor system data go to the DODIN, but the DODIN does not send data back to the contractor systems.

For Task Force 59, this architecture eliminated the need for an ATO and rapidly accelerated the pace of overall technology development and deployment. However, a key downside of this model is that there are severe restrictions on the ability of these commercial systems to directly integrate with military-owned assets, which typically operate on classified data networks. The implications of this will be explored further in a subsequent CSIS paper.

Option 2: Seek to Declassify Your Data

If a DOD AI program manager’s goal is to have industry partners be responsible for AI development on their infrastructure, that is significantly easier if the data in question is unclassified, since there are far fewer restrictions on sharing unclassified data with contractors. As discussed previously, one way to have the data be unclassified is to have the contractor be the originating source of the data; but this is not possible when the data in question is inherently governmental, such as DOD human resources data or data produced by DOD weapon systems in the course of operational use.

Some of the data produced by DOD weapons and sensor systems is inherently classified, but in some cases, there are only certain parts of the data that are required to be classified while other parts are not. In those cases, it may be possible to separate out the unclassified and classified portions and to provide only the unclassified portions for use by the data labeling and AI development organizations.

Option 3: Pursue a Continuous ATO or Partner with a DOD Organization That Has One

The ATO process is not only time-consuming, but it also places severe restrictions on the ability to update software in the future. An ATO is an approval process related to the current architecture of the software and the planned set of future modifications to that software. But what if unforeseen challenges or changing demands from the operational community lead to the need for unplanned modifications? Those changes could put ATO compliance at risk and force the program to begin the lengthy ATO certification process all over again. Even if there are no changes to the software system, most ATOs expire after three years and must be reapproved. All of this incentivizes program offices not to make major changes to their software systems, even if those changes would improve the user experience and increase operational effectiveness.

Recently, there has been some momentum in the DOD to pursue an approach commonly referred to as a “continuous ATO” (cATO), which can help address these issues. For a DOD program, the cATO approach most likely does not shorten the 6 to 18 months required to secure an initial ATO, but it does massively reduce the time required to deploy software updates, as well as the complexity of this task. Typically, this is because the cATO plan establishes the approved conditions and restrictions under which all software updates will be developed, as well as an automated test suite and an internal review process by which software updates will be assessed prior to deploying from commercial infrastructure to the DODIN or from one part of the DODIN to another.
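As a rough illustration of the automated assessment such a plan might include, the sketch below shows the kind of performance gate (run with pytest) that could be executed on every model update before deployment; the thresholds are placeholders, and the two helpers are stubs standing in for real evaluation and timing harnesses:

```python
# A hedged sketch of an automated gate a cATO pipeline might run on every
# software update before deployment. All names and numbers are hypothetical.

def evaluate_candidate_model() -> float:
    """Stub: a real gate would score the updated model on a held-out dataset."""
    return 0.93  # placeholder accuracy

def measure_inference_latency_ms() -> float:
    """Stub: a real gate would time the model on representative inputs."""
    return 180.0  # placeholder latency

def test_accuracy_has_not_regressed():
    assert evaluate_candidate_model() >= 0.90  # placeholder floor

def test_latency_within_operational_budget():
    assert measure_inference_latency_ms() <= 250.0  # placeholder ceiling
```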

In February 2022, David McKeown, then the DOD’s senior information security officer, signed a memo that provided guidance on how DOD systems can gain approval to operate under a cATO state. Of special note is the remark that “DOD CISO [Chief Information Security Officer] approved cATOs do not have an expiration date and will remain in effect as long as the required real time risk posture is maintained.” Two of the key requirements stated in that memo for achieving a cATO are demonstrating that the system embraces the DOD Enterprise DevSecOps Strategy and that it is aligned to an approved DevSecOps Reference Design.

Programs that can either execute their AI development process on a DOD network environment that already possesses a flexible cATO or partner with an organization that has a cATO are in a privileged position, since they may be able to begin development almost immediately. Examples of DOD development environments that possess this sort of flexibility include:

  • The DOD’s Secure Unclassified Network (SUNet). According to its description in the DOD’s budget request, SUNet “provides defense and interagency partners with an accredited platform that enables secure unclassified information sharing, joint analysis, and advanced Research, Development, Test, and Evaluation (RDT&E) in support of critical operational missions on a global scale.” SUNet is owned and accredited by the Irregular Warfare Technical Support Directorate (IWTSD), under the assistant secretary of defense (Special Operations and Low-Intensity Conflict). SUNet’s previous and current sponsors include, among more than a dozen others, the Joint Artificial Intelligence Center and Project Maven. The DOD has collectively invested hundreds of millions of dollars in SUNet, with much of that going toward strengthening its capabilities as an AI DevSecOps capability with a cATO.
  • Air Force Platform One and Navy Black Pearl. Platform One provides a broad set of cloud infrastructure capabilities and tools useful for developing and fielding software systems for military service program offices and other DOD organizations. Platform One is officially approved as a DevSecOps Enterprise Services team for the entire DOD. Organizations that work with Platform One report that they are able to field software updates more than 10 times per day. Moreover, in some cases, Platform One provides the capability of deploying software updates to the operational networks of major weapons systems such as the F-16 and F-35. While Platform One is by far the most mature of the military service offerings, the Navy’s Black Pearl system intends to borrow heavily from Platform One’s technology stack and also provide support related to the Navy’s additional compliance requirements.
  • Advana. Advana is an enterprise data analytics platform owned by the DOD’s CDAO. Advana is principally focused on enterprise data such as financial, logistics, and organizational management use cases, not sensor data or warfighting applications. Advana is not an environment for developing new software and is not—at least currently—focused on the use of AI capabilities. However, its work to provide authoritative enterprise data sources and to enable rapid implementation of mature tools for data analysis and data visualization, such as dashboards, will likely be useful for many DOD communities. Additionally, Advana may continue to evolve and may add capabilities more directly related to AI development in the future.

Thus far, this paper has focused primarily on ATOs for the AI development process. However, most DOD software systems are developed on a different network (and under the authority of a different authorizing official) than the network that the system operates on. For example, the software for a drone surveillance aircraft might be developed on the network of an Air Force program office in the United States, but it will likely be deployed operationally on the networks of multiple combatant commands. This means that, for a cATO to truly make a difference in the speed of warfighter capability improvement, it needs to provide the AI system with authority to operate on both the development and the deployment networks.

Question 4: Technical Talent

How are you going to attract enough of the right kind of AI talent and put that talent to good use?

Recruiting skilled AI talent is difficult everywhere today, but it is especially tough in the government. There are relatively few experts in AI. A recent state-of-the-field report shows that in 2021, there were fewer than 2,000 new computer science PhD graduates, of whom fewer than 0.7 percent entered government.

Salaries in the commercial sector for leading AI developers cannot be matched by the DOD. Salaries for newly minted computer science PhDs with AI-related skillsets are reportedly $300,000 to $500,000 annually, meaning that junior AI technical staff in commercial industry are routinely being paid more than twice what the U.S. military pays four-star flag officers.

Not every AI expert in the DOD needs to have a PhD, and not every AI expert is exclusively motivated by money. Still, the overall problem is significant. The National Security Commission on Artificial Intelligence stated in its final report that “the human talent deficit is the government’s most conspicuous AI deficit and the single greatest inhibitor to buying, building, and fielding AI-enabled technologies for national security purposes.”

Much of the DOD’s recent attention when it comes to improving the government’s AI talent pool has focused on ensuring that the DOD has special hiring authorities related to AI talent. However, there is an additional challenge that is not often discussed, due to its sensitivity: ensuring that the precious time of the precious few AI experts who do serve in government is not wasted. Too often, it is.

In the course of interviewing stakeholders for this project, CSIS learned of many troubling examples of DOD AI talent mismanagement. Some illustrative examples are:

  • One DOD organization had a highly unbalanced ratio of supervisory and staff positions, so a PhD AI expert serving as a supervisor had more than 40 direct reports. DOD supervisors have many mandatory and time-intensive duties that increase in direct proportion to the number of direct reports, including a requirement to write performance reviews every six months.
  • Several DOD organizations had human resources staff who were unfamiliar with the special hiring authorities used to bring AI talent on board and who frequently provided inaccurate guidance about restrictions on the type of work those hires were legally allowed to perform.
  • One DOD organization brought an AI software developer on board to “build AI capabilities” but then forced the individual into an unwanted role overseeing industry partners because “the government doesn’t write code.”
  • One service member noted that the Army Functional Area (job category) in which service members tend to have skills most closely related to data and AI is Operations Research and Systems Analysis. However, when these individuals are involved in AI development programs, they are often assigned to positions related to budgeting and resource planning because those roles have open billets and because organizational leaders often do not actually understand the skills of this community.
  • Multiple interviewees highlighted the security clearance process as a key barrier. Once offered a position, a newly hired AI expert might wait over a year for the security clearances needed to work on the problems they were hired to solve.

These issues are all compounded by the data and network access problems highlighted earlier in this paper. In a competitive talent environment—where the strongest attraction to government service is the opportunity to work on meaningful problems—these challenges can mean that AI experts spend their time struggling against the bureaucracy rather than using their expertise to develop capabilities that can help the DOD fight and win.

The above talent challenges refer specifically to the difficulty of recruiting AI talent into the DOD civilian workforce, which is part of the responsibility of a DOD program manager. In 2022, the DOD did create a new set of job codes for AI- and data-related work roles, taking advantage of the DOD Cyber Workforce Framework. This at least makes it easier for DOD organizations to include authority to hire individuals with these roles in their upcoming budget requests. However, the pipeline of AI talent—particularly in the military services—is nascent. Each of the military services has a system in place for recruiting and training the needed number of pilots, communications specialists, nuclear engineers, and so on. There are job codes and organizational talent pipelines for each of these specialties. The equivalent military talent development pipeline for data, AI, and software development is still in the early stages of being constructed and implemented.

The consequences of this talent shortage and talent mismanagement are enormous, and this is true even if the DOD continues to outsource most of its AI-related technical work to industry. As one military AI program manager told CSIS, “The government staff who have adequate expertise to ask the right questions of industry are pretty much all leaving to go work in industry.”

Question 5: End-User Feedback

How are you going to ensure that the operational user community is a frequent source of feedback and insight during the development process?

To have an impact on national security, military AI capabilities need to be developed in close cooperation between two communities: operational warfighters—which typically means the combatant commands—and the capability development community, which includes both DOD program management staff and their industry partners. The traditional DOD bureaucratic mechanisms for ensuring cooperation between these two communities are the requirements process and the acquisition process. The requirements process is managed by the Joint Staff, and the most influential process for taking combatant command needs and converting them into capability requirements is the Joint Capabilities Integration and Development System (JCIDS). This decades-old system has its virtues. It is good for driving certain kinds of technological interoperability (especially in hardware), preventing unnecessary duplication of effort, increasing fairness in the acquisition process, and providing DOD-wide clarity on what work needs doing and who in the DOD is responsible for doing it.

At the same time, JCIDS is a slow and too-often inflexible system that—particularly in the case of AI and data-driven software development—provides only a fraction of the insights from operational users that developers need. Done correctly, AI development is an iterative process in which each iteration cycle receives a new round of feedback from testers and operational users.

Outside of a handful of government research communities, machine learning AI is still a relatively new undertaking for the DOD, and a clear understanding of what AI can and cannot do is not yet widespread among DOD personnel. Moreover, technological progress in AI is extremely rapid—and accelerating. A requirement-setting process based on the idea that requirements will be accurately set for a desired capability and then remain fixed for several years of development is utterly unrealistic.

Multiple CSIS interviewees with DOD AI program management experience stated that it is common for critical system requirements to be discovered too late—not only after the JCIDS requirement is finalized but also long after the contract is awarded to an industry partner. Requirements, both at the JCIDS stage and in contracting language, need to be sufficiently flexible to allow for DOD organizational learning to take place, and learning must be a partnership between the operational community and the development community. The former must learn about how AI can be impactful for their mission, and the latter needs to learn about the mission needs and how their proposed capability would fit into the operational reality of the warfighting community.

Members of the DOD and the private sector alike noted that there is a user engagement problem when developing AI tools, for several reasons. First, the people developing systems often have only a few chances to talk to system users. A former member of the DOD noted that contractors often talk to users briefly, go away to build a tool, and come back months or years later with a solution that does not actually fit the customer’s needs. Users are also often busy and can view helping with development as yet another task in an already heavy workload, which disincentivizes collaboration. Both a current DOD leader and a contractor further emphasized that it can be difficult for the operational community to express their needs in a language that developers can make use of.

As already discussed, iteration—the basis of modern AI development—is needed to ensure the best outcomes. Such iteration would require changes in how developers and end users engage, and likely changes in the structures overseeing those engagements as well.

To ensure tight linkages between the developer and end-user communities, DOD AI development efforts should be tied to real-world integration exercises. For any capability that is tied to warfighting activities, rather than DOD business enterprise activities, integration exercises require the involvement of one or more combatant commands. One DOD executive told CSIS in an interview that “any [AI capability program management model] that is not Combatant Command-centric is tough to justify” and that “for tech in general, the best practice is to get it into the hands of the user as early as possible.”

Question 6: Budgeting and Transition

Are the diverse DOD development, procurement, and operational organizations involved with your AI capability adequately budgeting for their involvement in the program?

The DOD is such a large organization that fully grasping its scale is incredibly difficult. In 2022, the DOD budget was more than $756 billion, representing roughly 3 percent of U.S. gross domestic product. Even counting only active-duty military service members and government civilians, the DOD employs more than 2 million people. At such extraordinary scale, it is natural that political leaders would desire for DOD organizations to specialize with clear roles and responsibilities, both in order to avoid wasteful duplication of effort and to provide accountability for the effective and efficient use of resources.

The DOD resource allocation system, referred to as the Planning, Programming, Budgeting & Execution Process (PPBE), is designed to address fundamental government challenges, such as ensuring that democratically elected representatives maintain control of defense spending and that appropriated funds are used exclusively for their intended purpose without waste. It is also designed to provide a measure of stability in the development and sustainment of extraordinarily complicated military capabilities—such as nuclear submarines—where technological and organizational excellence is difficult to achieve and would be easy to lose in the event of a multi-year gap in funding. This is no small challenge with a military in which nearly all service members change assignments at least every few years.

In general, the current system does a reasonably good job at achieving these core goals, but it also has many downsides, not least of which is the fact that it is slow and inflexible and requires the separation of many activities that more naturally would be grouped together. Though the architecture of the DOD PPBE process has many implications for AI adoption, this paper will focus on two major challenges that relate to budgeting and transition:

  • The organizational separation of the research and development community, the procurement community, and the operations community; and
  • The budget “color of money” rules, which divide funding into fiscal appropriation categories and restrict which category can legally be used for a given type of activity.

In the case of technology systems development, this division can be summarized (with some oversimplifications and omissions for the sake of brevity) into three broad communities:

  • The research and development community, which includes research-focused organizations such as DARPA and the military service labs, as well as those military service programs of record that are in a development or modernization phase of their lifecycle;
  • The procurement community, which includes military service programs of record and certain DOD agencies; and
  • The operational community, which includes combatant commands and certain DOD agencies.

This division works reasonably well for hardware technologies, since the switch from the development stage to the scaled production stage and the physical transfer of the produced items to operational communities are all natural process points that can be aligned with organizational responsibilities. It also provides a measure of (albeit misleading) simplicity to senior budget decisionmakers: for example, if you want to accelerate technology modernization, simply give more money to the research and development organizations that work on advanced technology.

However, this hardware-centered organizational model struggles to accommodate the reality of software-driven development and operations—and especially of AI. The Defense Innovation Board memorably titled its 2019 study “Software Is Never Done,” and there is a great deal of wisdom packed in this short phrase. Even in traditionally hardware-focused industries, such as car-making, software is increasingly the focus of competitive advantage. New cars wirelessly transmit data back to manufacturers (including data that are used for training AI models), and carmakers continue to supply cars with software updates that provide new features and functionality for decades after they are sold.

For the DOD, this challenges the existing organizational structures and divisions of labor upon which the budgeting process is based. In software, research and development, procurement, and operations are far more tightly linked, both in time (leading tech companies are often able to incorporate newly published AI research results into their customer-facing products in a matter of weeks) and in organizational responsibility (the organizations that sustain and operate customer-facing software platforms are also deeply involved in development and scaled deployment). In some cases, such as at Tesla, customer-facing software developers are also heavily involved in more fundamental research.

There are parts of the DOD that attempt to replicate this model—commonly referred to as DevSecOps—with varying degrees of success. Perhaps the most famous example is the Kessel Run organization within the U.S. Air Force. However, DOD officials interviewed by CSIS stated that their own attempts to implement DevSecOps-type approaches that fuse software development and operations were frequently viewed within the DOD as an inappropriate encroachment on the responsibilities of other organizations and were met with significant bureaucratic resistance.

This leads to the second problem with the DOD budgeting process: the color of money. DOD funding is split into different appropriation categories, and it is generally illegal to use money from one category for another category's purpose. For example, a DOD organization cannot use research, development, test, and evaluation (RDT&E) funding or Procurement funding to undertake major new facilities construction, which must be funded with Military Construction (MILCON) dollars, even if the new facilities are directly related to conducting research and development or supporting procured systems.

Some types of organizations, such as combatant commands, generally have no RDT&E funding. This is part of the reason why the U.S. Navy’s Task Force 59, which is part of U.S. Naval Forces Central Command (NAVCENT), structured nearly all of its efforts under the contractor-owned, contractor-operated (or “data as a service”) acquisition model. Task Force 59 is funded almost exclusively with Operations and Maintenance (O&M) funding; those funds can legally be used to purchase data services, but they generally cannot be used to fund research and development. In practice, however, the companies supporting Task Force 59’s operational technology integration maritime exercises, which are held every 90 days, are engaged in an extraordinarily rapid pace of research and development. One DOD official with direct knowledge of Task Force 59 stated that, between exercises, the companies involved were making “hardware changes every month, and software changes every week.” During the Task Force 59 exercises, in which the companies got to see their systems tested by real combatant command warfighters, the pace of development accelerated to “hardware changes in a week, and software changes in a day or sometimes multiple times per day.” In an interview with CSIS, an executive at one of the companies involved said that “Task Force 59 pushed us to move harder and faster than any DOD R&D program I’ve ever seen.”

That said, the DOD budgeting and requirements-setting process now poses major challenges to scaling and transitioning Task Force 59’s success elsewhere in the DOD or even within NAVCENT. There are many DOD AI use cases that relate to inherently governmental functions where a contractor-owned, contractor-operated model is not appropriate. Moreover, combatant commands are not responsible for procuring technology systems at scale, a job that generally lies with the military services and some Defense Department-wide agencies. No Navy program office—which could have the necessary RDT&E and Procurement funding—has been established to begin purchasing the systems developed as a result of Task Force 59’s exercises.

Creating such a program office, or even revising the plans of an existing program office to undertake this work, would require going through the DOD requirements-setting and PPBE processes, which would most likely take years. That multi-year gap between development success and funding to scale the technology across the DOD is a significant challenge for the technology companies that supported Task Force 59. Every technology company must decide whether to invest in growth to support defense customers or commercial ones, and the fact that a proven capability that warfighters love is no guarantee of the large future purchases needed to provide a financial return on that investment is a significant disincentive to working with the DOD.

CSIS spoke to the American CEO of a major, highly diversified AI technology company with billions of dollars in annual revenue. The CEO said he had specifically told his staff not to pursue work with DOD customers: “I love this country, and I care about national security, but, even if you give the DOD exactly what they want, you never make any money.” He also stated that pursuing DOD contracts and revenue would naturally lead his company to evolve to accommodate the DOD’s bureaucratic structures: “I want [my company] to be world-class at building and scaling AI technology, not world-class at navigating the DOD bureaucratic labyrinth.”

The DOD is working to address at least some of these issues. On the color of money problem, in 2020 the DOD launched a new budget activity category for software and digital technology, intended to be more flexible and to address some of the concerns described above. Congress approved 16 programs to participate in the initial pilot project, but it did not approve any of the subsequent pilots proposed in the FY 2022 and FY 2023 budget requests. Congress, particularly the Senate Defense Appropriations Subcommittee, has expressed concerns about expanding the pilot program until more detailed data is available on how the funds are being used and how the additional funding flexibility affects program management effectiveness. This caution, though well-meaning, is penny-wise and pound-foolish: far more of the taxpayer's money is being lost to the inefficiencies dictated by the existing PPBE system than would be lost to any shortcomings in congressional oversight of the new funding mechanism.

Thus, most DOD AI program managers are unlikely to receive the budget authorities created in the software pilot anytime soon. For their programs to make a true impact on warfighter needs, they should therefore concern themselves with more than appropriately planning, executing, and defending their own budget requests. They should also work to identify the other parts of the DOD that need to budget for their roles in the AI program. If the AI program is developing technology and is not currently aligned explicitly to a DOD program office, then a program office or DOD agency needs to be budgeting to scale the capabilities the program develops, and end-user communities need adequate supporting infrastructure and trained staff to adopt and implement the capability once it is delivered.

Conclusion

The DOD has long recognized and articulated the outsize role that AI and autonomous technologies will play in the future of warfare and U.S. military power. Multiple secretaries of defense, spanning multiple presidential administrations, have stated that AI and autonomous systems are among the DOD’s top technology modernization priorities. However, the DOD faces multiple challenges that slow the adoption of AI and AI-enabled technologies, creating a risk of falling behind as the technology advances. Some of these challenges are inherent in the current state of AI and autonomous systems technology, but by far the most significant barriers relate to the DOD's unique context.

This paper has analyzed six of the most significant of those challenges from the perspective of DOD program management leadership: mission, data, computing infrastructure, technical talent, end-user feedback, and budget. There are additional, and critical, challenges that will confront DOD AI program managers, such as how to ensure that contracts between the government and its industry partners adequately protect the government's interests and how to design a test and evaluation program that can give confidence that a given AI capability will perform as intended across the full range of its operational uses, but these are beyond the scope of this paper.

The six barriers identified above have confronted other technologies as well, and the department has attempted to solve some of them before, with mixed results. The next paper in this CSIS series will offer concrete recommendations for overcoming these obstacles and accelerating AI and autonomous systems adoption in the DOD.

Developing and fielding AI-enabled capabilities in the DOD is hard. Any one of the six challenges discussed in this paper is by itself enough to prevent an AI program from achieving its objectives. Stakeholders across Congress, the media, and the DOD should therefore be cautious about watching a single AI or autonomy program struggle and drawing sweeping conclusions about the viability or desirability of this technological transformation.

There will be many failures on the road to success.

Gregory C. Allen is the director of the Wadhwani Center for AI and Advanced Technology at the Center for Strategic and International Studies in Washington, D.C.

The author would like to thank Jack Shanahan and Richard Danzig for feedback on earlier drafts of this report. The author would also like to thank Conor Chapman for research and writing support.

This report is made possible through generous support from Applied Intuition.
