The Future of Democracy in the Age of Deepfakes 

AI-generated content has already begun to work against us, rather than for us. To ensure this technology brings benefits rather than harms, we must institute immediate changes.

Figure 1: An AI-generated image of Donald Trump shaking hands with George Washington.

According to most surveys, some 1 in 5 Americans dismiss or deny the effects of global climate change. At the tail end of the global COVID pandemic, 1 in 5 Americans believed the statement “Bill Gates plans to use COVID-19 to implement a mandatory vaccine program with microchips to track people.” And after Joe Biden won the 2020 presidential race, more than half of Republican voters believed that Donald Trump had rightfully won. 

Basic facts regarding our planet’s health, our public health, and the foundations of our democracy are being denied by a significant number of citizens. This prevalent alternate reality is not unique to the United States; this plague of lies, conspiracies, misinformation, and disinformation is global in nature. 

These and other baseless claims are spread through traditional channels and social media, and amplified by famous personalities, influencers, politicians, and some billionaires. What happens when the landscape that has allowed widespread and dangerous conspiracies to take hold is super-charged by generative AI? 

Making Deepfakes 

Before the less-objectionable term “generative AI” took root, AI-generated content was referred to as “deepfakes,” a term derived from the username of a Reddit user who, in 2017, used this nascent technology to create non-consensual intimate imagery (NCII), often referred to by the misnomer “revenge porn,” which wrongly suggests that the women depicted inflicted a harm deserving of revenge. 

Today, generative AI is capable of creating hyper-realistic images, voices, and videos of people saying or doing just about anything. These technologies promise to revolutionize many industries while also super-charging the spread of, and belief in, dangerous lies and conspiracies. 

Text-to-image AI models are trained on billions of images, each with an accompanying descriptive caption. During training, each image is progressively corrupted until only visual noise remains, and the model learns to denoise the image by reversing this corruption. Once trained, the model can be conditioned to generate an image that is semantically consistent with any text prompt, such as “Please generate an image of the great Egyptian Sphinx and pyramids during a snowstorm.” (As a side note, fellow AI researcher Sarah Barrington advised me always to say “please” and “thank you” when speaking with AI models so that if—or when—they take over the world, they will remember you were nice to them.)  
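For readers who want to see the mechanics, here is a minimal sketch of this denoising objective, assuming Python and the PyTorch library. The tiny network and the random “images” are hypothetical stand-ins; real systems use far larger networks and also condition on the corruption step and on an embedding of the text caption.

```python
# A minimal sketch of the denoising objective behind text-to-image
# diffusion models. The tiny network and random "images" are
# hypothetical stand-ins, not any production system.
import torch
import torch.nn as nn

T = 1000                                    # number of corruption steps
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, 0)   # cumulative signal fraction

denoiser = nn.Sequential(                   # toy model that predicts noise
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

for step in range(100):                     # toy training loop
    x0 = torch.rand(8, 3, 32, 32)           # stand-in for real images
    t = torch.randint(0, T, (8,))           # random corruption level
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise  # corrupted image
    loss = ((denoiser(x_t) - noise) ** 2).mean()    # learn to predict noise
    opt.zero_grad(); loss.backward(); opt.step()
```

Generation then runs this learned denoiser in reverse: starting from pure noise and guided by the caption, the model removes a little noise at each step until an image emerges.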

Video deepfakes fall into two broad categories: text-to-video and impersonation. Text-to-video deepfakes are the natural extension of text-to-image models where an AI model is trained to generate a video to be semantically consistent with a text prompt. These models have become significantly more convincing over the past 12 months. A year ago, the systems tasked with creating short video clips from a text prompt like “Will Smith eating spaghetti” yielded obviously fake videos of which nightmares are made.

The videos of today, while not perfect, are stunning in their realism and temporal consistency and are quickly becoming difficult to distinguish from reality; the updated version of Will Smith enjoying a bowl of spaghetti is evidence of this progress. 

Although there are several different incarnations of impersonation deepfakes, two of the most popular are lip-syncs and face-swaps. Given a source video of a person talking and a new audio track (either AI-generated or impersonated), a lip-sync deepfake generates a new video track in which the person’s mouth is automatically modified to be consistent with the new audio track. And, because it is relatively easy to clone a person’s voice from as little as 30 seconds of recorded speech, lip-sync deepfakes are a common tool used to co-opt the identity of celebrities or politicians to push various scams and disinformation campaigns. 

A face-swap deepfake is a modified video in which one person’s identity, from eyebrows to chin and cheek to cheek, is replaced with another identity. This type of deepfake is most common in the creation of non-consensual intimate imagery. Face-swap deepfakes can also be created in real time, meaning that soon you will not know for sure if the person at the other end of a video call is real or not. 

The trend of the past few years has been that all forms of image, audio, and video deepfakes continue on a steep upward trajectory in terms of realism, ease of use, accessibility, and weaponization. 

Weaponizing Deepfakes in the 2024 U.S. Election 

It is difficult to quantify the extent to which deepfakes impacted the outcome of the 2024 U.S. presidential election. There is no question, however, that deepfakes were present in many different forms, and—regardless of their impact—their use in this recent election is a warning for future elections around the world. 

The use of deepfakes in the election ranged from outright attempts at voter suppression to disinformation campaigns designed to confuse voters or cast doubt on the eventual outcome of the election. 

In January 2024, for example, tens of thousands of Democratic voters received a robocall in the voice of President Biden instructing them not to vote in the upcoming New Hampshire primary. The voice was AI-generated. The perpetrators of this attempted election interference were Steven Kramer (a political consultant), Paul Carpenter (a magician and hypnotist who was paid $150 to create the fake audio), and a telecommunications company called Lingo Telecom. Carpenter used ElevenLabs, a platform offering instant voice cloning for as little as $5 a month. 

Throughout the campaign, it was common to see viral AI-generated images of black people embracing and supporting Donald Trump chalk up millions of views on social media. Cliff Albright, the co-founder of Black Voters Matter, a group encouraging black people to vote, said the manipulated images were pushing a “strategic narrative” designed to show Trump as popular in the black community. “There have been documented attempts to target disinformation to black communities again, especially younger black voters,” Albright said. 

Presumably in an attempt to cast doubt on the fairness of the election, countless fake videos—linked back to Russia—circulated online purporting to show an election official destroying ballots marked for Trump. An endless stream of viral AI-generated images and videos polluted social media, ranging from fake images pushing the narrative that Kamala Harris is a socialist or communist to a fake image of Taylor Swift endorsing Donald Trump. 

While the threats from deepfakes are already with us, perhaps their more pernicious effect is this: when we enter a world in which anything we see or hear can be fake, then nothing has to be real. In the era of deepfakes, a liar is equipped with a double-fisted weapon: spreading lies and, by invoking the specter of deepfakes, casting doubt on the veracity of any inconvenient truth—the so-called liar’s dividend. 

Trump, for example, publicly accused the Harris-Walz campaign of posting AI-generated images of large rally crowds. This claim was baseless. It could be argued that denying crowd size is simple pettiness, but there could also be something more nefarious at play. Trump publicly stated that he would deny the results of the election if he lost, so denying large crowd sizes prior to the election would give him ammunition to claim voter fraud after the election. As the violent January 6 insurrection that followed the 2020 election showed us, the stakes for our democracy are quite high. As deepfakes continue to improve in realism and sophistication, it will become increasingly easy to wield the liar’s dividend. 

Figure 2: An authentic photo of a Harris-Walz rally that, during the 2024 U.S. national election, Donald Trump claimed was fake. 

Protecting Democracy from Deepfakes 

If we have learned anything from the past two decades of the technology revolution (and the disastrous outcomes regarding invasions of privacy and toxic social media), it is that things will not end well if we ignore or downplay the malicious uses of generative AI and deepfakes. 

I contend that reasonable and proportional interventions, spanning creation through distribution and cutting across academia, government, and the private sector, are both necessary and, in the long term, beneficial for everyone. Below, I enumerate a range of interventions that are practical and that, when deployed properly, can keep us safe while allowing innovation to flourish. 

Creation. There are three main phases in the life cycle of online content: creation, distribution, and consumption. The Coalition for Content Provenance and Authenticity (C2PA) is a multi-stakeholder, open initiative aimed at establishing trust in digital audio, images, and video. The C2PA has created standards to ensure the authenticity and provenance of digital content at the point of recording or creation. The standard includes adding metadata, embedding an imperceptible watermark into the content, and extracting a distinct digital signature from the content that can be used for identification even if the attached credentials are stripped out. All AI services should be required to implement this standard to make it easier to identify content as AI-generated. 
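To make these layers concrete, here is a toy sketch in Python. It is emphatically not the C2PA specification: the function names are illustrative, the watermark layer is omitted, and the simple average-hash fingerprint stands in for the far more robust fingerprints real systems extract. A real implementation would also cryptographically sign the credential bundle.

```python
# A toy illustration of content credentials: provenance metadata, an
# exact cryptographic hash, and a perceptual fingerprint that can
# re-identify content even if its metadata is stripped. This is NOT
# the C2PA specification; names and methods are stand-ins.
import hashlib
import json
from PIL import Image  # assumes the Pillow imaging library

def exact_hash(path: str) -> str:
    """Cryptographic hash of the file bytes; breaks if any byte changes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def robust_fingerprint(path: str) -> str:
    """Average-hash: a 64-bit perceptual fingerprint that survives
    re-encoding and resizing, unlike the exact hash above."""
    pixels = list(Image.open(path).convert("L").resize((8, 8)).getdata())
    avg = sum(pixels) / len(pixels)
    return "".join("1" if p > avg else "0" for p in pixels)

def make_credentials(path: str, generator: str) -> str:
    """Bundle provenance metadata; a real system would sign this bundle
    so that tampering with it is detectable."""
    return json.dumps({
        "generator": generator,          # e.g., which AI service made it
        "sha256": exact_hash(path),
        "fingerprint": robust_fingerprint(path),
    })
```

The point of the fingerprint is that it can be recomputed from the pixels alone and matched against a registry, which is what allows identification even after the attached credentials are stripped out.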

Distribution. Social media platforms need to take more responsibility for their role in sharing content, from the unlawful to the lawful-but-awful items that are shared on their platforms and amplified by their own recommendation algorithms. However, while it is easy to single out social media for its failure to rein in the worst abuses, the platforms are not uniquely culpable. Social media operates within a larger online ecosystem powered by advertisers, financial services, and hosting and network services. Each of these often-hidden institutions must also take responsibility for how its services enable a plethora of online harms. 

Consumption. When discussing deepfakes, the most common question I am asked is: “What can the average consumer do to distinguish the real from the fake?” My answer is always the same: “Very little.” I then explain that the artifacts in today’s deepfakes—seven fingers, incoherent text, mismatched earrings—will be gone tomorrow, and that my instructions would have provided the consumer with a false sense of security. The space of generative AI is moving too fast, and the forensic examination of an image is too complex, to empower the average consumer to be an armchair detective. Instead, we require a massive investment in primary and secondary education to equip consumers with the skills to understand how, and from where, to obtain reliable news and information. 

Authentication. The process by which qualified experts identify manipulated content is partitioned into two broad categories: active and reactive. Active approaches rely on the type of C2PA content credentials described above, while reactive techniques operate in the absence of such credentials. Within the reactive category there is a multitude of techniques for detecting manipulated or AI-generated content. Collectively, these techniques can be effective, but they share a major limitation: by the time malicious content is uploaded online, flagged as suspicious, analyzed for authenticity, and fact-checked, it can easily have racked up millions of views. This type of authentication is therefore appropriate for post-mortems, but it does not address the billions of daily uploads. 
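As a sketch of how the active and reactive approaches fit together in practice, consider the following Python pseudo-pipeline. The two helper functions are hypothetical stubs: a real system would validate signed C2PA manifests and combine many forensic signals rather than a single score.

```python
# A minimal sketch of active-then-reactive triage. The helpers
# read_credentials and ai_likelihood are hypothetical stubs, not a
# real C2PA validator or a real forensic classifier.
from typing import Optional

def read_credentials(path: str) -> Optional[dict]:
    """Hypothetical stub: return attached content credentials, if any."""
    return None  # most uploads arrive with credentials stripped

def ai_likelihood(path: str) -> float:
    """Hypothetical stub for a reactive forensic classifier in [0, 1]."""
    return 0.5

def triage(path: str) -> str:
    creds = read_credentials(path)
    if creds is not None:              # active path: trust the manifest
        return "ai-generated" if creds.get("generator") else "authentic"
    score = ai_likelihood(path)        # reactive path: forensic analysis
    if score > 0.9:
        return "likely ai-generated: queue for expert review"
    return "inconclusive: no credentials, no strong forensic signal"
```

The design point is that credentials, when present, give a fast and cheap answer, while the slower reactive analysis is reserved for the uncredentialed majority of uploads.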

Legislation. To date, only a handful of nations and a few U.S. states have moved to mitigate the harms from deepfakes. While I applaud individual U.S. states for their efforts, internet regulation cannot be effective as a patchwork of local laws; a coordinated national and international effort is required. In this regard, the European Union’s Digital Services Act, the United Kingdom’s Online Safety Act, and Australia’s Online Safety Act provide a road map for the United States. While regulation at a global scale will not be easy, some common ground can surely be found among the United States and its allies, serving as a template for other nations to customize and adopt. 

Academe. In the 1993 blockbuster movie Jurassic Park, Jeff Goldblum’s character, Dr. Ian Malcolm, criticized the reckless use of technological advancements in the absence of ethical considerations: “Your scientists were so preoccupied with whether they could, they didn’t stop to think if they should.” I am, of course, not equating advances in AI with the fictional resurrection of dinosaurs some 66 million years after their extinction. The spirit of the sentiment, however, is one all scientists should absorb. 

Many of today’s generative-AI systems used to create harmful content are derived from academic research. For example, researchers at the University of California, Berkeley developed pix2pix, an open-source program that transforms the appearance or features of an image (for example, turning a day-time scene into a night-time scene). Shortly after its release, this software was used to create DeepNude, an application that transforms an image of a clothed woman into a simulated nude image of her. The creators of pix2pix could, and should, have foreseen this weaponization of their technology and developed and deployed their software with more care. This was not the first case of such abuse, nor will it be the last. From inception to creation and deployment, researchers need to give more thought to how to develop technologies safely and, in some cases, to whether a technology should be created in the first place. 

There is much to be excited about in this latest wave of the technology revolution. But if the past few technology waves have taught us anything, it is that, left unchecked, technology will begin to work against us and our democracy. We need not repeat the mistakes of the past. We are nearing a fork in the road that will determine what role technology plays in the future we want. If we maintain the status quo, technology will continue to be weaponized against individuals, societies, and democracies. A change in corporate accountability, regulation, liability, and education, however, can yield a world in which technology and AI work with and for us. 

Famed actor and filmmaker Jordan Peele’s 2018 public service announcement on the dangers of fake news and the then-nascent field of deepfakes offers a word of relevant advice. The PSA concludes with a Peele-controlled President Obama stating: “How we move forward in the age of information is gonna be the difference between whether we survive or whether we become some kind of f***ed up dystopia.” I couldn’t agree more. 
