Blackout Scorecard
Everyone has been in an electrical power blackout. Blackouts are a good way to think about what happened on July 19, when a widespread network outage disrupted business operations worldwide. The chief difference is that blackouts tend to be localized, while this outage was global in its reach. Nor was it a total blackout, as electrical outages usually are. Microsoft estimates that 8.5 million computers were affected out of a global total of well over a billion. The outage highlighted key problems for the globally connected digital networks that businesses depend upon. Most companies know how reliant they are on computers and networks, but they may not have realized the degree of interconnection and interdependence they have with third parties. The world’s networks are connected in unexpected ways, and this creates unpredictable risk and vulnerability.
Cybersecurity
Initially, there was some concern that the July 19 online blackout was a cyberattack. One precedent for this incident is the SolarWinds hack, which was a cyberattack conducted for intelligence purposes. One commentator went as far as to suggest that “there would be deaths” as a result of the most recent outage. There have been no deaths, and the media outlets that ran that line should have put an emoji appropriate for disbelief next to it. Other commentators said that this was a foretaste of what a cyberattack could look like. This is partially true, except that a cyberattack will be directed at targets of strategic value, not a random collection of devices, and will be continuous and recurring rather than a single event.
Cost
A little perspective helps. The IMF estimates that the global economy will reach $110 trillion in 2024. This means that a single event costing even $100 billion would amount to less than 0.1 percent of world income. While outages are annoying for consumers and potentially very expensive for individual companies, at the global scale this is basically a rounding error. The issue is how to ensure these outages do not become routine events, where a mistake by a big software provider leads to widespread disruption. This event will change how contracts and insurance policies are written, and much will depend on how courts ultimately assign liability. Given the increasing pace of digitization and the attention criminals and hostile states are giving to compromising third-party software providers, the chance of similar incidents cannot be discounted.
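The proportions cited in this section and in the introduction can be checked with a quick calculation. The sketch below uses only figures from the text. The $100 billion cost is the hypothetical figure used above for illustration, not an estimate of actual losses, and the total device count is assumed to be exactly one billion, the lower bound of "well over a billion," so the device share it prints is an upper bound.

```python
# Figures from the text. The cost is a hypothetical used for scale, not an estimate.
GLOBAL_GDP_USD = 110e12         # IMF estimate of the 2024 global economy
HYPOTHETICAL_COST_USD = 100e9   # illustrative cost of a single event
AFFECTED_DEVICES = 8.5e6        # Microsoft's estimate of affected computers
TOTAL_DEVICES = 1e9             # assumption: lower bound of "well over a billion"


def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


cost_share = share_percent(HYPOTHETICAL_COST_USD, GLOBAL_GDP_USD)
device_share = share_percent(AFFECTED_DEVICES, TOTAL_DEVICES)

print(f"Cost as a share of world income: {cost_share:.2f}%")
print(f"Share of devices affected: at most {device_share:.2f}%")
```

Under these assumptions the cost comes to about 0.09 percent of world income, and fewer than 1 percent of devices were affected.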
Resiliency
Many national cybersecurity strategies highlight the need for resilience, but apparently, there has not been enough progress. The contours of the incident suggest how to build resilience. The coding error that caused the blackout did not affect all devices or even most devices. Millions were affected, but many millions more were not. Systems running Linux software or using Macs were unaffected. China, which bans Western software and cloud services, was unaffected. This means that the event was to some extent avoidable. To continue the electric power analogy, critical facilities like hospitals or airports have installed generators to provide temporary electrical power in the event of a blackout. Despite the attention given to resilience, the victims of this most recent network outage had not done that for their computer networks. This is a business decision—redundant networks are expensive to install and maintain. However, so are generators, and one lesson from this incident is that critical facilities must build redundancy with networks running different software and with different configurations for them to achieve digital resilience. The business decision to not spend on redundancy is, essentially, a decision to accept the risk of a network outage. Thinking about risk has not caught up to the increased dependencies involved in building and running a network.
Liability
Decisions to accept risk point to another key consideration: liability. One key effort of the Biden administration has been to shift liability from consumers to producers. In this incident, assigning liability will be contentious and dependent on how contracts were written. One journalist, speaking to the author, quipped that this incident would serve as the “lawyer full employment act of 2024,” as victims will naturally seek to sue their service providers (though their contracts will likely preclude that). Another lesson from this event (and from the federal effort to improve cloud security) is that attention to service contracts will become even more important: are contingencies spelled out, is liability assigned, and are there escape clauses? Scrutiny of contract language is central to debates on improving cloud security, and resiliency will become part of this.
It is also worth watching the market share of affected companies, which will determine revenue and stock prices. Both cybersecurity and cloud services are highly competitive markets, and in this case, a competitor’s marketing pitch practically writes itself. This may not be fair, as the utility of a product is not determined by one errant update, but it may be unavoidable. Anyone who has worked with software knows these kinds of glitches are inevitable, but the degree of our dependency on these fragile interconnected networks may not have been fully appreciated.
Monoculture and Regulation
Monoculture is a problem for cybersecurity and digital robustness, but it is not a new issue. Monoculture refers to a reliance on a single company’s operating systems or services. This problem first appeared in the late 1990s, as the result of market concentration and the hierarchical provision of digital technologies. The solution then was to require standards that allow a degree of interoperability (at least for software applications connecting to the dominant operating system). Setting such standards is so technically complex that legislatures and agencies cannot undertake it themselves (one shudders at the thought of EU regulation), but they can incentivize the private sector to move. Something similar will be required now, and there will be renewed attention around the world to antitrust regulation, tech competition, and cloud services standards.
Critical Infrastructure
Last week’s failure ties into the debate on whether cloud services have become critical infrastructure and should be regulated accordingly. The answer after this incident is clearly yes, but designing effective regulations will be hard. There is already a complex effort underway to improve the security of cloud services, but it will need to be extended to address resiliency. Moving to on-premises solutions (where a company manages its cloud internally rather than outsourcing it) would not have helped companies that were customers of the security software provider. Another lesson is to look at the companies that were not affected and generalize from their experience and practices to help create standards for resilience. An entire, largely invisible, infrastructure based on digital connectivity and cloud computing now underpins economic activity.
Next Steps
Some of the steps to address the network outage will occur naturally: lawsuits are unavoidable, and market share will respond to any changes in customer preferences. Others, like building redundant systems using different software, cost money, and some companies may choose to accept the risk again rather than pay. Government incentives (at least for public services like healthcare and transport) can change this. The longer process of developing the standards for interoperability that can dilute monocultures will take time, effort, and expertise. The best parallel remains financial networks, which are globally interconnected and not entirely subject to national control. As with finance, international cooperation will be necessary if there is to be stability and greater assurance, but this is not on the international agenda and will need to be added. The world will not be growing less interconnected.
James A. Lewis is senior vice president and director of the Strategic Technologies Program at the Center for Strategic and International Studies in Washington, D.C.