From Data to Insight: Making Sense out of Data Collected in the Gray Zone
As the United States attempts to identify adversary activity in the gray zone, the assumption is that more data will provide deeper insight and better indications and warning. However, a more likely outcome is that, rather than showing the contours of the forest, expanding the variety of data collected will leave analysts wandering amidst trees of siloed databases. Different sensors create different data streams and different sharing restrictions, forcing analysts to spend time fighting bureaucratic restrictions rather than creating actionable intelligence. The seeds of intelligence failures are planted in such scenarios—policymakers assume that, since we have exquisite sensors and copious data, detection is guaranteed and strategic surprise is impossible. In reality, the vast majority of collected information is left on the cutting room floor.
Operationalizing myriad data sets will become even more challenging as the United States brings new sensing capabilities online, including open source intelligence (OSINT), and expands collection in the cyber domain and at geographical fringes like the Arctic. Rather than allow this data to segregate into siloed databases, this commentary recommends that the U.S. government use its purchasing power to consistently demand interoperability. It also recommends that the intelligence and defense communities lean into unclassified data sets as they seek to move from manpower-intensive sharing mechanisms to a fully integrated, cloud-based, data flow future, such as is envisioned in Joint All-Domain Command and Control. Finally, it maps out necessary contributions from Congress, the executive branch, and industry to achieve an interoperable future.
Finding Gray Zone Activity Requires a Wide Scope of Capabilities
Adversaries like Russia and China are engaging in activities short of war worldwide, capitalizing on gaps in U.S. agencies’ authorities, international blind spots, and seams between agencies and departments. Competition in this gray zone will be global, requiring the United States to collect information globally with a variety of platforms. Stealth and denial and deception efforts are features of gray zone competition, so indications and warning of adversary activity are more likely to appear as subtle oddities than as clear data points. Finally, adversaries hide behind U.S. laws and authorities designed to protect Americans. For example, Russians operating on social media have posed as Americans protected by First Amendment rights to political speech. Similarly, adversaries conducting cyber espionage or cyberattacks hop from foreign networks to infrastructure in the United States to blind foreign-facing intelligence services to their activity. To create a comprehensive picture of this variety of adversary activity, the United States will need surveillance spanning from optical collection of Africa’s littoral regions to automated alerts of anomalous events on domestic network infrastructure.
Much of this data is emerging from commercially available sources. Capabilities such as on-orbit electro-optical and synthetic aperture radar (SAR) sensing are rapidly expanding in the commercial market space. The global SAR market is expected to expand at a compound annual growth rate of 8.5 percent through 2026 to reach $8.5 billion. Combined with other open sources and existing national technical means, the resulting data could increase U.S. sensing capability by orders of magnitude over traditional government-only systems. Interoperability, however, is critical. The U.S. government should incentivize commercial sensing providers to ensure their data is not locked into proprietary systems and can be combined with other data sets and sensing capabilities in common cloud environments.
Managing the Flood of Data
Better data alone is not enough to provide actionable intelligence. New streams must be sorted, processed, identified as critical, and efficiently shared to provide timely indications and warning in an era of gray zone competition. As the U.S. intelligence community (IC) takes in massive quantities of new data in a variety of formats, it can choose between two paths. The easiest, cheapest route to gaining a new capability is often to accept a complete package offered by the manufacturer, including tailored software for processing and storing the data. But pushing new data into silos is a disservice to the mission of providing actionable intelligence and warning.
Alternatively, in a highly integrated and interoperable future, analysts would be able to draw conclusions from multiple combined data streams by assembling clusters of data points in varying patterns to form new insights. New data could be seamlessly integrated in a vast data lake of unclassified and low-classification collection, which artificial intelligence (AI) and machine learning (ML) applications can scrape to identify trends and highlight insights to the human operators. This vision can be thought of as passing a magnet through a fluid data environment, with clusters of data points appearing, then seamlessly dispersing and reassembling to form new insights when tested with a different magnet. This type of capability could flag changes in the pattern of life around a building known to house disinformation operations, suggesting a shift to a target in a new time zone, along with a hiring blitz by a private military corporation. The two flags could warn of gray zone activity in a new location.
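The two-flag scenario above can be illustrated with a minimal sketch. It assumes each data stream has already been reduced to simple anomaly flags tagged with a source, a region, and a date. The `Flag` class, the `correlate` function, the stream names, the dates, and the 14-day window are all hypothetical and chosen only for illustration; they do not describe any fielded system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Flag:
    """One anomaly flag raised by a single (hypothetical) data stream."""
    source: str         # name of the stream that raised the flag
    region: str         # region the anomaly points toward
    seen_at: datetime   # when the anomaly was observed
    note: str           # short human-readable description


def correlate(flags, window=timedelta(days=14)):
    """Pair flags from different sources that point to the same region
    and were observed no more than `window` apart."""
    ordered = sorted(flags, key=lambda f: f.seen_at)
    pairs = []
    # combinations() preserves order, so `a` is never later than `b`.
    for a, b in combinations(ordered, 2):
        same_region = a.region == b.region
        different_source = a.source != b.source
        close_in_time = b.seen_at - a.seen_at <= window
        if same_region and different_source and close_in_time:
            pairs.append((a, b))
    return pairs


# Illustrative, made-up flags from two separate streams.
flags = [
    Flag("pattern_of_life", "Region A", datetime(2021, 9, 1),
         "work shifts moved to a new time zone"),
    Flag("hiring_data", "Region A", datetime(2021, 9, 6),
         "private military company hiring blitz"),
    Flag("hiring_data", "Region B", datetime(2021, 9, 7),
         "routine recruiting"),
]

warnings = correlate(flags)
for a, b in warnings:
    print(f"Possible gray zone activity in {a.region}: {a.note} + {b.note}")
```

Neither flag is alarming on its own; the warning comes only from their co-occurrence. That pairing is possible only if both streams sit in a shared environment with comparable fields, which is the practical argument for interoperability.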
A leading effort to operationalize this vision is the Global Information Dominance Experiments (GIDE), run by U.S. Northern Command (NORTHCOM). Three GIDE exercises demonstrated that the U.S. military could use cloud-based architecture and AI/ML to create an integrated common operating picture for multiple parties in a simulated fight. GIDE broke through the multitude of siloed data sets by using open architecture software to combine sensor data with open data, producing a final picture that offers insights across multiple domains simultaneously.
General VanHerck, who led GIDE, testified in June to the U.S. Senate Committee on Armed Services on the operational power of this ecosystem, saying, “By ingesting data streams into cloud-based architecture, where the power of artificial intelligence and machine learning is unleashed, we can drastically reduce processing time across the globe and rapidly enable information dominance and decision superiority.” He then elaborated in a recent event with Dr. Tom Karako that this approach creates “capabilities to collaborate real time or near-real time across all domains . . . to create that decision space, a single pane of glass, if you will, that gives you domain awareness.”
The first GIDE exercise took place in December 2020 with four combatant commands and proved that AI/ML can help process data in real time to create space for decisionmaking. The second round, in March, incorporated all 11 combatant commands, whose reaction was, “Why don’t we have this now?” The third run took place in July and included allies and partners and expanded on the previous GIDEs to include contested logistics. Gen. VanHerck said that “what we saw is having the data and the ability to collaborate globally across all combatant commands in real time is invaluable.” He went on to describe his vision for the future of this capability:
Imagine all the combatant-command J-2s, which are the intelligence folks, being able to sit down and assess all-domain information in real time and come up with an assessment of what’s going on by any competitor or potential adversary. And imagine they can hand that assessment to the J-3s, now the operations folks, to create options, dilemmas, global dilemmas, de-escalation options. And they can collaborate in near real time across a single pane of glass, all seeing the same picture. And imagine they could hand that off to the logistics experts—the J-4s, if you will—who can assess: [are those] feasible options or are they able to be executed? Is the fuel in the right place? Are the weapons in the right place? Are the platforms in the right place? And then assess, can we execute it? That’s what we did in GIDE 3. That capability exists today. That’s what global integration truly is. It’s an assessment of global risk, a look at global resources, the ability to collaborate in near real time across all domains and all combatant commands.
Today, analysts and operators “hand jam” data into spreadsheets and give verbal updates across operations centers, which is slow but has been sufficient against a low-tech, slow-moving adversary. Military colleagues joke ruefully about the impossibility of holding a Microsoft Teams meeting with participants from both the .mil and .gov domains. The Air Force–led operations center tasked with the logistics of receiving Afghan refugees and moving them to safe havens had a stand-down every four hours to review disparate spreadsheets to ensure numbers matched. In the context of a great power conflict that begins with gray zone warfare, yelling across operations centers and printing spreadsheets will impede U.S. efforts with potentially disastrous effects. While analysts are assembling data and attempting to communicate, an adversary could be in the late stages of conducting a cyberattack that severs communications with far-flung forces, preventing a response. A complete data picture for early, effective warning will be critical.
The Intersection of Data and People
Succeeding in this free-flowing data environment will take both data science professionals and a workforce that can be its own engine of innovation. Just as after 9/11 the available pool of experts on terrorism was spread dangerously thin, today the United States is making unreasonable demands on its data workforce. Staffing up will take time and innovative approaches to attracting talent. Meanwhile, adding digital literacy to the onboarding of new employees will help seed talent who can continue to innovate in the workforce.
Two models of cultivating digital talent in the military illustrate divergent approaches to staffing up in this area. The U.S. Army’s approach is to train specialists, curating a smaller cadre of highly skilled professionals. To widen the pool of talent, soldiers could be allowed to take advantage of the partnership between the AI Task Force and Carnegie Mellon, as well as the Pentagon’s Training with Industry program, to take free or low-cost courses. On the other hand, the U.S. Air Force is creating foundational competency for all airmen through commercial training. MITRE suggested modifying ROTC into ROTC+ to create an AI talent pipeline and awarding AI-related scholarships.
On the civilian side, the Senate in June passed the U.S. Innovation and Competition Act of 2021, which provides for programs to attract and retain employees from STEM fields and to “reskill” existing employees. Other ideas include creating central accountability points for recruiting computer science talent, such as “chief digital recruiting officers” at the Department of Defense (DOD), Department of Energy (DOE), and in the IC. Incentivizing new AI and digital talent to join out of universities should be another focus of civilian efforts, for example with tuition reimbursement or loan forgiveness for those who serve in government. Finally, the National Security Commission on Artificial Intelligence (NSCAI) made six recommendations in Chapter Six of its final report alone on bolstering the civilian tech workforce, including creating both a National Reserve Digital Corps and a United States Digital Service Academy.
As the United States ramps up collection on its adversaries’ activities in the gray zone, its departments and agencies should insist on an ability to collect, process, and share data at scale. The following recommendations will help create an interoperable, fluid data environment that a tech-savvy workforce can use to further a national security mission:
- Demand broad authorities to share data. Contracts are often structured around the number of licenses available to use a particular software system, including those designed to process and exploit data. The government should use its purchasing power to shift those contracts to open use. Further, executive branch agencies should build extra funds into budget requests (and Congress should support those requests) for the necessary but painstaking work of transitioning legacy data holdings into interoperable formats. Deputy Secretary of Defense Kathleen Hicks is already moving in this direction: in her May memo to the Department, she said DOD will “maximize data sharing and rights for data use: all DOD data is an enterprise resource.”
- Purchase flexible infrastructure. The U.S. government spent an estimated $6.6 billion on cloud capabilities in fiscal year 2020, with defense agencies responsible for $2.1 billion of that spending. Spending is likely to rise 9 to 10 percent a year, by one estimate. The recent, public flak over the JEDI contract drew considerable attention, but Congress should be engaging in intensive oversight of those purchases to ensure that agencies’ perceived needs for particular requirements are not preventing interoperability.
- Write flexibility into contracts. Contracts for software and integration services should list end-state capabilities, not specific requirements, and should allow for iteration. The ecosystem of data and exploitation tools will grow organically, and restrictive contracts should not prevent the U.S. government from pivoting to the latest capabilities.
- Sprint on hiring tech talent. Despite the expense, pulling in strong technical talent and those willing to learn will pay dividends in the long run. The U.S. government should partner senior tech talent with senior mission managers to construct flexible guidelines for the hiring pipeline. Lessons learned from the post-9/11 hiring surge should inform this new cycle of hiring.
Recommendations for Specific Stakeholders
Congress, the executive branch, and industry each need to play a role to create a comprehensive set of rules, guidelines, and technology acquisition that will make up a healthy data ecosystem. The following chart breaks down six key factors. Each factor receives a red, yellow, or green rating, based on the following definitions:
Green: Mechanisms in place, along with funding, with at least initial operating capability.
Yellow: Pilot programs active or programs mandated, if not yet operable, even if implementation is mixed.
Red: Little to no forward progress on a program.
Emily Harding is deputy director and senior fellow with the International Security Program (ISP) at the Center for Strategic and International Studies (CSIS) in Washington, D.C. McKenzie Richardson is an intern with ISP at CSIS. Col. Matt Strohmeyer is an Air Force military fellow with ISP at CSIS.
This commentary is made possible by support from General Atomics and general support to CSIS.
Commentary is produced by the Center for Strategic and International Studies (CSIS), a private, tax-exempt institution focusing on international public policy issues. Its research is nonpartisan and nonproprietary. CSIS does not take specific policy positions. Accordingly, all views, positions, and conclusions expressed in this publication should be understood to be solely those of the author(s).
© 2021 by the Center for Strategic and International Studies. All rights reserved.