Will Machine Learning Supercharge Online Disinformation?

Posted: September 2nd, 2020

By Matthew O’Shaughnessy

Matthew O’Shaughnessy is a Ph.D. student in the School of Electrical & Computer Engineering at Georgia Tech and a 2019-2020 Sam Nunn Security Program Fellow.

This Academic Incubator column is part of a partnership with the Nunn School of International Affairs at the Georgia Institute of Technology, presenting regular expert commentary on global security-related issues by faculty, fellows, and students.

Once heralded as vehicles for promoting democratic values abroad, social media platforms now serve as vectors for homegrown and foreign disinformation. By dictating the information consumed by hundreds of millions of Americans, the machine learning (ML) algorithms employed by these platforms are an integral part of the spread of disinformation. Moreover, by improving and automating the generation and targeting of disinformation, emerging ML capabilities have the potential to significantly enhance the effectiveness of disinformation campaigns.

This brief summarizes the results of a recent analysis that critically evaluates how ML tools could affect the creation, spread, and effectiveness of disinformation. We focus on three capabilities that are either available currently or have the potential to be developed within several years.

Capability 1: Precision targeting of individuals. Effective disinformation exploits weaknesses in the methods humans use to synthesize complex and contradictory information. Psychological research has shown that these so-called heuristics cause individuals to ascribe undue credibility to messages that are simple, negative, and repeated, and to subconsciously prefer information that conforms to existing beliefs or reinforces identity-based group membership. For propaganda campaigns to exploit these heuristics, however, their content must be precisely targeted, directing the “right” messages to the “right” people.

Machine learning tools are ideally suited to perform this targeting. Using preference learning algorithms originally developed for personalized advertising, authors of disinformation could infer traits such as ideological leaning, personal biases, and proclivity for conspiratorial thinking from activity on social media, web browsing habits, or location history. In turn, these inferred traits allow disinformation campaigns to predict which messages particular individuals will be receptive to. In fact, social media platforms automatically perform similar types of inference for advertisers, potentially making these capabilities available to a broad range of actors at a low financial and technical cost.

We should expect these targeting capabilities to improve as the complexity of ML algorithms and the amount of personal data available online increase. However, social media companies can limit the effectiveness of ML-based disinformation targeting. To perform effective targeting, ML algorithms require fine-grained data about individual users, data that most social media platforms do not provide directly to advertisers. Platforms can deprive disinformation campaigns of pre-built targeting tools by refusing to sell advertising products to political or foreign actors, and they can prevent nefarious actors from building their own targeting capabilities by providing only coarse and aggregated metrics of user interaction to advertisers.
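As a rough illustration of what "coarse and aggregated metrics" could mean in practice, the sketch below rolls per-user ad events up into per-region click rates and withholds any group with too few distinct users. The function name `aggregate_clicks`, the event format, and the threshold `MIN_GROUP_SIZE` are all assumptions made for this example; they do not describe any real platform's reporting system.

```python
from collections import defaultdict

# Hypothetical threshold: groups with fewer distinct users are not reported.
MIN_GROUP_SIZE = 3


def aggregate_clicks(events, min_group_size=MIN_GROUP_SIZE):
    """Summarize per-user events into per-region click rates.

    events: iterable of (user_id, region, clicked) tuples.
    Returns {region: click_rate} only for regions that contain at least
    min_group_size distinct users; smaller groups are suppressed so that
    the report cannot be traced back to individuals.
    """
    users = defaultdict(set)
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for user_id, region, clicked in events:
        users[region].add(user_id)
        shown[region] += 1
        clicks[region] += int(clicked)
    return {
        region: round(clicks[region] / shown[region], 2)
        for region in shown
        if len(users[region]) >= min_group_size
    }


# Invented sample data: four users in "north", a single user in "south".
events = [
    ("u1", "north", True),
    ("u2", "north", False),
    ("u3", "north", True),
    ("u4", "north", False),
    ("u5", "south", True),
]
report = aggregate_clicks(events)
print(report)
```

The advertiser in this sketch sees only a regional click rate, and the single-user "south" group is dropped entirely, so the report gives no per-individual signal on which to train a targeting model.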

Capability 2: Automatically generated propaganda. Today, most disinformation is human-generated, with groups such as Russia’s Internet Research Agency and China’s “50-cent army” manually creating and disseminating disinformation based on loose guidelines. These types of operations require a large workforce knowledgeable about foreign language and culture, limiting the potential scope of personalized disinformation.

Emerging ML capabilities could overcome this limitation by automating the creation of disinformation. Recent advances in natural language processing have produced powerful systems capable of generating fabricated news articles that humans have difficulty distinguishing from real ones. Current text generation models, however, have significant technical limitations that limit their usefulness for creating disinformation. The supervised learning frameworks used by these models do not “learn” in the same way humans do: while they can generate content that looks human-generated, their inability to understand underlying concepts in text makes them less like a conversant adult and more like a child who blurts out a relevant phrase they overheard the previous day. Thus, while the most advanced language generation systems available today can generate high-quality fabricated content broadly relevant to a given topic, they cannot yet reliably generate cogent text that could achieve a specific propaganda aim. Further, since many modern ML techniques used for text generation are similar to the techniques used to detect generated text, advancements in fabricated disinformation generation are likely to be accompanied by similar advancements in its detection.

It is unlikely that a disinformation generation system that overcomes these limitations will be developed without warning or by an unexpected adversary. Though these capabilities could proliferate quite easily once they are created, their initial development and training require enormous computational resources and technical expertise in a broad range of areas. Moreover, the rapid pace of the ML research community coupled with its open culture makes it unlikely that major new capabilities will be developed without warning.

Many of the tools best suited for generating computational propaganda are similar to those used to generate other types of convincing fabricated media (e.g., so-called “deepfakes”). We should expect the quality of all types of fabricated media to continue to improve, and because these tools require relatively little technical expertise to use once they are developed, we should also expect rapid dissemination of new capabilities developed by the research community.

These ML tools have the potential to make specific disinformation campaigns more effective. But perhaps most worrying is their ability to undermine faith in the existence of a single objective truth — a key aspect of the Russian approach to domestic information operations. This type of confusion makes it easier for individuals to justify incorrect beliefs in the presence of countervailing evidence, making them even more vulnerable to disinformation and undermining the democratic exchange of ideas.

Capability 3: Selective exposure and personalized content. On many platforms, ML algorithms dictate what information is presented to users. Algorithms deployed to maximize user engagement often do so by selecting content that has many of the traits of effective disinformation: simplistic and negative, appealing to emotion over fact, and matching pre-existing beliefs. Even in the absence of automatic content selection, platforms that encourage interaction with exclusively like-minded users can diminish societal shared truth and create fertile ground for disinformation campaigns.
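A toy sketch can make the engagement-ranking dynamic concrete. It assumes a feed ranked purely by a predicted engagement score that rewards negativity, emotional appeal, and agreement with the user's existing beliefs. The `Post` fields, the `WEIGHTS` values, and the sample posts are invented for illustration; a real system would learn its scoring model from interaction data rather than use fixed weights.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    negativity: float    # 0..1, hypothetical feature
    emotionality: float  # 0..1, hypothetical feature
    belief_match: float  # 0..1, agreement with the user's prior views


# Invented weights standing in for a learned engagement model.
WEIGHTS = {"negativity": 0.4, "emotionality": 0.3, "belief_match": 0.3}


def predicted_engagement(post):
    """Weighted sum of the three features; higher means more predicted clicks."""
    return (
        WEIGHTS["negativity"] * post.negativity
        + WEIGHTS["emotionality"] * post.emotionality
        + WEIGHTS["belief_match"] * post.belief_match
    )


def rank_feed(posts):
    """Order posts from highest to lowest predicted engagement."""
    return sorted(posts, key=predicted_engagement, reverse=True)


posts = [
    Post("Measured policy analysis", 0.1, 0.2, 0.5),
    Post("Outrage headline confirming your views", 0.9, 0.9, 0.9),
    Post("Neutral fact check", 0.2, 0.1, 0.3),
]
feed = rank_feed(posts)
for post in feed:
    print(f"{predicted_engagement(post):.2f}  {post.title}")
```

Under these assumed weights, the outrage post ranks first and the fact check last, even though nothing in the ranking rule mentions disinformation: optimizing for engagement alone is enough to favor content with those traits.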

Protecting against these effects requires the careful design of online platforms, a task complicated by technical challenges and misaligned economic incentives. For example, limiting the scope of ML-based content selection algorithms would reduce platforms’ advertising revenue and deprive users of individually tailored content. Increased cooperation with researchers would put social media companies at risk of negative publicity or government regulation. And the black-box nature of many ML algorithms used for content selection makes it difficult even for the companies themselves to understand how their products could affect users.

How can we protect ourselves? As the capabilities outlined above bolster the reach and influence of disinformation, it is critical that we take steps to ensure that the flood of information we interact with online strengthens democracy rather than undermines it.

Internet and social media platforms (and their regulators) must develop the institutional norms and ethical structures befitting critical democratic infrastructure. Platforms should provide transparency to users and researchers to help individuals understand who is trying to manipulate them and how. Regulators must minimize the alignment in economic interests between social media platforms and actors spreading disinformation; carefully designed regulation can benefit both users and companies with democratic values by removing the competitive advantage of acting outside of societal interests. It is particularly important that consumers choose social media products run by companies that keep democratic values in mind: internet platforms operating in the democratic interest can engage in the arms race of ML-based disinformation detection in an attempt to protect their users from disinformation, while platforms operating outside of societal interests will simply expose their users to ever more sophisticated ML-generated and targeted propaganda.

The most effective defenses, however, are likely to be nontechnical. Research has indicated that education teaching information literacy and emphasizing the importance of shared democratic values can be effective in mitigating the psychological effects of disinformation. Alarmingly, domestic actors in the U.S. and Europe are undermining the very institutions — academia, independent press, and government — that create the shared truth critical to blunting the impact of disinformation. Most fundamentally, the increasing sophistication of disinformation will require that citizens in liberal democracies become more sophisticated consumers of information.

Margaret E. Kosal is The Academic Incubator’s Georgia Tech liaison and an associate professor at the Sam Nunn School of International Affairs, Georgia Institute of Technology. She is the editor of and a contributor to Disruptive and Game-Changing Technologies in Modern Warfare: Development, Use, and Proliferation.

The Cipher Brief’s Academic Partnership Program was created to highlight the work and thought leadership of the next generation of national security leaders. If your school is interested in participating, send an email to [email protected].
