Protecting Data Privacy as a Baseline for Responsible AI
Artificial intelligence (AI) has already impacted all industries and sectors, but the United States still lacks uniform nationwide rules on how companies process personal information for algorithmic development and deployment. Congress has considered updating U.S. commercial data privacy standards with varying degrees of momentum for over a decade, with increased activity around the Obama administration’s 2012 Consumer Privacy Bill of Rights and the 2018 Cambridge Analytica revelations. The recent acceleration of AI development should similarly spur discussion around privacy safeguards, particularly as major technology companies amend their user agreements to explicitly permit the scraping of personal information to train algorithms. Meanwhile, the European Union has already enacted sweeping laws that implicate data governance, including the General Data Protection Regulation (GDPR), Digital Services Act (DSA), and Artificial Intelligence Act (AI Act). As U.S. policymakers consider new rules to give individuals greater agency over the processing of their personal information and to safeguard the public from the risks of AI-enhanced surveillance, understanding the European Union’s approach could both promote regulatory consistency for businesses and reaffirm shared transatlantic values on privacy and digital rights.
Q1: How does the development and deployment of AI create privacy risks?
A1: Algorithms require vast amounts of data to train. For example, the language models underlying ChatGPT grew from 1.5 billion parameters in 2019 to 175 billion parameters in 2020, with a corresponding jump in the volume of training data required—and these figures have only grown since then. In general, developers can aggregate training datasets from numerous sources, both public and private, including websites, news articles, online search histories, smartphone geolocation, consumer transactions, and more. Because the sheer scope of the datasets required to train algorithms creates an enormous demand for personal and nonpersonal information, technology platforms face stronger incentives to collect, share, and store detailed datasets for longer periods of time.
Going further, algorithms search for patterns within these data points, drawing inferences and predicting relationships between them. In this manner, algorithms can profile individuals based on otherwise unconnected data points, revealing private information in their outputs. For example, a data broker could infer private details about a person’s income, religion, relationship status, or political affiliation by analyzing their shopping history, internet browsing activity, and precise geolocation. Even if the datasets are anonymized, algorithms can sometimes deduce the identities of specific individuals by combining multiple sources or tracking a single data point over an extended period. For instance, by monitoring mobile location data over months, one predictive analytics company has traced patterns of life that could identify specific people from otherwise anonymous devices.
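To illustrate the underlying mechanism, the brief Python sketch below joins a hypothetical "anonymized" dataset to a public record using shared quasi-identifiers (zip code, birth year, and gender). Every name, column, and record is invented for illustration; real linkage attacks operate on the same principle at far larger scale.

```python
# Illustrative sketch of a linkage attack: an "anonymized" dataset (direct
# identifiers removed) is joined to a public dataset on shared quasi-identifiers,
# re-attaching names to supposedly anonymous rows. All data here is hypothetical.
import pandas as pd

# "Anonymized" records: names stripped, but quasi-identifiers retained.
anonymized = pd.DataFrame({
    "zip": ["20002", "20002", "22201"],
    "birth_year": [1984, 1991, 1984],
    "gender": ["F", "M", "F"],
    "sensitive_attribute": ["condition_a", "condition_b", "condition_c"],
})

# Public records (e.g., a voter roll) that pair names with the same quasi-identifiers.
public = pd.DataFrame({
    "name": ["Alice Example", "Carol Example"],
    "zip": ["20002", "22201"],
    "birth_year": [1984, 1984],
    "gender": ["F", "F"],
})

# A simple inner join on the quasi-identifiers is enough to re-identify two of
# the three "anonymous" individuals.
reidentified = anonymized.merge(public, on=["zip", "birth_year", "gender"], how="inner")
print(reidentified[["name", "sensitive_attribute"]])
```

In other words, removing direct identifiers alone does not guarantee anonymity once auxiliary datasets are available to link against.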
In addition to the potential to unveil details about a person’s life without their knowledge or consent, algorithmic privacy violations come with concrete economic, security, and reputational harms. For example, AI could facilitate targeted phishing attacks or other scams based on personal information, allowing bad actors to impersonate victims using synthetic media or tailor deceptive messaging to specific people. In addition, companies could use AI to charge consumers different prices for products or services based on their predicted needs, interests, or risks. Until March, for instance, General Motors sold information about its customers’ trip lengths, speeds, and other driving habits to data brokers, and that data factored into the cost of customers’ insurance premiums. Any errors, false assumptions, or biases in the training data could result in disparate impact on a mass scale—particularly in high-stakes contexts like credit scoring or loan decisions in the private sector, and counterterrorism analysis or benefits eligibility in the public sector. AI also expands the reach of existing surveillance practices—introducing new capabilities like biometric identification and predictive social media analytics—which could disproportionately affect the privacy of communities that have historically been subject to enhanced policing based on factors like their zip code, income, race, country of origin, or religion.
Q2: What major policy actions has the United States taken to address privacy risks associated with AI?
A2: In October 2023, the White House issued the Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, one of the first binding actions for federal agencies specifically tailored to AI. The executive order (EO) directs the Office of Management and Budget (OMB) to assess the federal procurement, use, and transfer of commercially available information for purposes outside national security and to consider guidance on mitigating any associated risks to privacy. It also tasks OMB with seeking stakeholder input on updating guidance for federal agencies to conduct privacy impact assessments that mitigate risks posed by AI. In addition, the EO directs federal agencies to use privacy-enhancing technologies (PETs) to protect personal information, recognizing that AI “is making it easier to extract, re-identify, link, infer, and act on sensitive information about people’s identities, locations, habits, and desires.” The National Institute of Standards and Technology (NIST) has also published a voluntary AI Risk Management Framework that encourages the development of PETs, which might include de-identification, differential privacy, or federated learning.
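As an illustration of one PET named above, the sketch below applies differential privacy to a simple counting query using the standard Laplace mechanism. The records, query, and epsilon value are assumptions chosen for demonstration, not drawn from any system referenced in this piece.

```python
# Minimal sketch of differential privacy: answer a counting query with Laplace
# noise calibrated so that adding or removing any single person changes the
# output distribution only slightly. Records, query, and epsilon are illustrative.
import numpy as np

def dp_count(records, predicate, epsilon: float) -> float:
    """Return an epsilon-differentially private count of records matching predicate.

    A counting query has sensitivity 1 (one person changes the count by at most 1),
    so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for record in records if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical attribute: user ages in a training dataset.
ages = [23, 35, 41, 29, 52, 37, 61, 45]

print("true count of users over 40:", sum(age > 40 for age in ages))
print("private count (epsilon=0.5):", round(dp_count(ages, lambda age: age > 40, epsilon=0.5), 1))
```

A smaller epsilon adds more noise and therefore stronger privacy, at the cost of less accurate answers.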
In addition to the EO, the White House released a nonbinding Blueprint for an AI Bill of Rights in 2022 with five principles to govern the development and deployment of AI in both the public and private sectors. One of these five pillars is data privacy. Among other guardrails, the blueprint calls for AI developers to incorporate data minimization into their products by design, stating that data collection should mirror “reasonable expectations and that only data strictly necessary for the specific context is collected.” In addition, it recognizes the role of individual rights and user control over the collection, processing, transfer, and deletion of personal data—which aligns with legal principles within the GDPR. Although it acknowledges the role of user consent in data processing, it also states that consent must be “appropriately and meaningfully given” and that consent requests must be “brief” and “understandable in plain language.” Finally, the blueprint calls for stronger privacy protections in high-risk contexts—including criminal justice and employment—and states that “surveillance technologies should be subject to heightened oversight that includes at least pre-deployment assessment of their potential harms and scope limits to protect privacy and civil liberties.”
While the AI blueprint puts forward principles for a commercial privacy framework, Congress would need to enact legislation to impose mandatory requirements on companies nationwide. In May, a bipartisan Senate working group released a road map for AI governance that supports the ongoing development of federal commercial privacy legislation. In addition to protecting privacy, the Senate road map acknowledged that a comprehensive privacy law could increase regulatory certainty for AI developers and draw a contrast with authoritarian governments that have expanded their surveillance regimes as a form of repression. However, it also recognized that commercial privacy standards must balance the need to support cross-border data flows for digital commerce, communications, and innovation. In the past few legislative sessions, Congress has introduced multiple bills or draft proposals that would require companies to minimize their data collection and usage, allow users to opt out of automated decisions in significant contexts, and assess the impacts of their algorithms. However, in late June, the House Energy and Commerce Committee canceled a scheduled markup of its most recent proposal, the American Privacy Rights Act, after removing provisions designed to prevent data-driven discrimination and allow individuals to opt out of “consequential” AI-enabled decisions by private companies. In the absence of federal legislation, multiple state and local governments have already enacted laws to regulate facial recognition, mitigate algorithmic bias in hiring, and allow opt-outs for automated profiling—but these are not consistent across the entire nation.
Q3: What major policy actions has the European Union taken to address privacy risks associated with AI?
A3: On July 12, the European Union’s AI Act officially became law following signature by the presidents of the European Parliament and Council and publication in the Official Journal of the European Union. The law classifies algorithmic systems based on their level of risk, banning the most harmful—classified as an “unacceptable” risk—such as predictive policing and emotion recognition systems in employment or educational contexts. Under this risk-based framework, the AI Act prohibits law enforcement authorities from using real-time remote biometric systems to identify people in public places, with limited exceptions such as aiding searches for missing persons or preventing terrorist attacks. In contrast, systems classified as “high” but not “unacceptable” risk—such as automated systems in migration and border control—are conditionally permitted but carry stricter obligations to ensure human oversight, the accuracy and quality of datasets, and robust cybersecurity measures. Finally, general-purpose AI models that fall below the unacceptable- and high-risk thresholds carry baseline transparency and quality requirements, such as publicly explaining the data used to train the models and mitigating systemic risks to privacy and fairness.
Article 22 of the GDPR allows individuals to opt out of automated decisionmaking or predictive profiling that “produces legal effects concerning him or her or similarly significantly affects him or her,” which could include eligibility for public benefits, job applications, and credit decisions. In addition, Articles 13 and 35 of the GDPR require entities to be transparent about their legal purpose for processing data, including to inform automated decisions, and to regularly assess their potential impacts on affected individuals. In the past two years, OpenAI has faced regulatory complaints and investigations in several EU member states—including Austria, France, Germany, Italy, Spain, and Poland—under the GDPR, over concerns including its legal basis for processing personal data and its potential to generate inaccurate content about specific individuals, which could result in privacy or reputational harms. In April 2023, the European Data Protection Board established a task force to harmonize potential enforcement actions against ChatGPT under the GDPR.
Finally, the European Union’s DSA, which came into full effect in February, bans targeted advertising to minors under 18 years old based on their personal information. It also prohibits targeted advertising to all individuals based on sensitive characteristics such as their sexual orientation, political affiliation, and religion. In this manner, the European Union has identified and attempted to mitigate the potential privacy risks associated with AI development and deployment through regulation. While it still permits public and private entities to use personal information to develop AI in general or lower-risk contexts, they must have a legitimate basis to do so and comply with other regulatory requirements including individual rights, transparency, and data quality standards under the GDPR, DSA, AI Act, and other measures.
Q4: Are there opportunities for EU and U.S. regulatory alignment on AI and privacy?
A4: During the sixth meeting of the Trade and Technology Council (TTC) in April, the European Union and the United States emphasized their shared “commitment to a risk-based approach to artificial intelligence” that prioritizes transparency and safety. The TTC also released a joint road map that proposes to develop common AI terminologies, risk benchmarks, and standards in the long term. While these dialogues demonstrate shared values across the Atlantic, each jurisdiction is at a different point in the regulatory process: the European Union has already enacted the GDPR, AI Act, and DSA, whereas the United States currently relies on voluntary guiding principles like the NIST AI Risk Management Framework and the Blueprint for an AI Bill of Rights.
In the absence of binding U.S. legislation, many U.S. technology companies have adopted new policies to comply with EU law. It remains to be seen whether the AI Act will produce a “Brussels effect,” a phenomenon in which major EU laws—like the GDPR—have shaped international business practices and served as a guidepost for other governments. U.S. technology companies could more easily implement some elements of the AI Act, such as transparency requirements, in regions outside the European Union. Other provisions of the AI Act and GDPR, however, could more significantly restrict how AI developers and users operate—such as requirements for entities to minimize the processing of personal information for legal or legitimate purposes, obtain user consent to handle sensitive data, and allow individuals to opt out of significant automated decisions. The GDPR does not automatically consider AI development a legitimate purpose for processing data, although the AI Act clarifies that entities may use sensitive information to identify and mitigate algorithmic bias.
As both U.S. and EU approaches are generally rooted in similar fair information practice principles, privacy presents a natural opportunity to align transatlantic regulatory approaches. In particular, U.S. federal legislation could prioritize four key components. First, it could define the legal responsibilities of AI developers and users to understand and mitigate both the potential privacy risks and the actual impacts of their algorithms. Second, it could require AI developers and users to provide transparency into their processing of personal information as well as their algorithmic outcomes. Third, it could delineate the purposes for which AI-driven surveillance should be categorically allowed or prohibited, and under what conditions or safeguards. Fourth, it could grant individuals the right to opt out of automated decisionmaking and request human review when reasonably feasible. In doing so, the United States could reduce compliance uncertainties for both large and small AI developers, increase trust in U.S. technological innovation, and promote safer outcomes of algorithmic processes.
Caitlin Chin-Rothmann is a fellow with the Strategic Technologies Program at the Center for Strategic and International Studies in Washington, D.C.