Jinyan Zang is a researcher at the Data Privacy Lab and a Ph.D. candidate in Government at Harvard University.
Latanya Sweeney is a professor of government and technology in residence at Harvard University’s Department of Government, editor-in-chief of Technology Science and the founding director of the Technology Science Initiative and the Data Privacy Lab at the Institute for Quantitative Social Science at Harvard.
Max Weiss is a senior at Harvard University and the student who implemented the Deepfake Text experiment.
As federal agencies take increasingly stringent actions to try to limit the spread of the novel coronavirus pandemic within the U.S., how can individual Americans and U.S. companies affected by these rules weigh in with their opinions and experiences? Because many of the new rules, such as travel restrictions and increased surveillance, require expansions of federal power beyond normal circumstances, our laws require the federal government to post these rules publicly and allow the public to contribute their comments to the proposed rules online. But are federal public comment websites — a vital institution for American democracy — secure in this time of crisis? Or are they vulnerable to bot attack?
In December 2019, we published a new study to see firsthand just how vulnerable the public comment process is to an automated attack. Using publicly available artificial intelligence (AI) methods, we successfully generated 1,001 comments of deepfake text, computer-generated text that closely mimics human speech, and submitted them to the Centers for Medicare & Medicaid Services’ (CMS) website for a proposed federal rule that would institute mandatory work reporting requirements for citizens on Medicaid in Idaho.
The comments we produced using deepfake text constituted over 55% of the 1,810 total comments submitted during the federal public comment period. In a follow-up study, we asked people to identify whether comments were from a bot or a human. Respondents were only correct half of the time — the same probability as random guessing.
[Image: an example of deepfake text generated by the bot, which all survey respondents thought was written by a human.]
We ultimately informed CMS of our deepfake comments and withdrew them from the public record. But a malicious attacker would likely not do the same.
Large-scale fake comment attacks on federal websites have occurred before, such as the 2017 attack on the FCC website regarding the proposed rule to end net neutrality regulations.
During the net neutrality comment period, firms hired by industry group Broadband for America used bots to create comments expressing support for the repeal of net neutrality. They then submitted millions of comments, sometimes even using the stolen identities of deceased voters and the names of fictional characters, to distort the appearance of public opinion.
A retrospective text analysis of the comments found that 96-97% of the more than 22 million comments on the FCC’s proposal to repeal net neutrality were likely part of coordinated bot campaigns. These campaigns used relatively unsophisticated and conspicuous search-and-replace methods that were easily detectable even at this scale. But even after investigations revealed that the comments were fraudulent, the FCC still accepted them as part of the public comment process.
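To illustrate why search-and-replace campaigns are conspicuous, here is a minimal sketch of near-duplicate detection. It is not the method used in the analysis cited above; it assumes a simple approach that compares overlapping three-word sequences ("shingles") between comments using Jaccard similarity. The function names, the 0.5 threshold and the sample comments are all invented for illustration and are not drawn from the FCC docket.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences in a lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two comments have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(comments, threshold=0.5, k=3):
    """Return index pairs of comments whose similarity meets the threshold.

    The threshold is an assumed value chosen for this example only.
    """
    sets = [shingles(c, k) for c in comments]
    return [
        (i, j)
        for i, j in combinations(range(len(comments)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Invented sample comments: the first two differ by a single swapped word.
comments = [
    "I strongly urge the FCC to repeal the burdensome Title II rules "
    "imposed on internet providers.",
    "I strongly urge the FCC to repeal the harmful Title II rules "
    "imposed on internet providers.",
    "My rural clinic depends on a reliable connection, so please keep "
    "the existing protections in place.",
]

flagged = near_duplicate_pairs(comments)
print(flagged)
```

Swapping one word changes only a few shingles, so templated comments still overlap heavily and get flagged. Comparing every pair is only practical for small samples; a docket with millions of comments would need an approximate technique. Text produced by a language model would not share long word sequences in this way, which is why this kind of check does little against deepfake text.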
Even these relatively unsophisticated campaigns were able to affect a federal policy outcome. However, our demonstration of the threat from bots submitting deepfake text shows that future attacks can be far more sophisticated and much harder to detect.
The laws and politics of public comments
Let’s be clear: The ability to communicate our needs and have them considered is the cornerstone of the democratic model. As enshrined in the Constitution and defended fiercely by civil liberties organizations, each American is guaranteed a role in participating in government through voting, through self-expression and through dissent.
When it comes to new rules from federal agencies that can have sweeping impacts across America, public comment periods are the legally required method for members of the public, advocacy groups and corporations that would be most affected by proposed rules to express their concerns to the agency, which must consider these comments before deciding on the final version of the rule. This requirement for public comments has been in place since the passage of the Administrative Procedure Act of 1946. In 2002, the E-Government Act required the federal government to create an online tool to receive public comments. Over the years, multiple court rulings have required federal agencies to demonstrate that they actually examined the submitted comments and to publish any analysis of relevant materials and justification of decisions made in light of public comments [see Citizens to Preserve Overton Park, Inc. v. Volpe, 401 U.S. 402, 416 (1971); Home Box Office, supra, 567 F.2d at 36 (1977); Thompson v. Clark, 741 F.2d 401, 408 (CADC 1984)].
In fact, we only had a public comment website from CMS to test for vulnerability to deepfake text submissions in our study, because in June 2019, the U.S. Supreme Court ruled in a 7-1 decision that CMS could not skip the public comment requirements of the Administrative Procedure Act in reviewing proposals from state governments to add work reporting requirements to Medicaid eligibility rules within their state.
Political science research suggests that public comments can have a substantial impact on a federal agency’s final rule. For example, in 2018, Harvard University researchers found that banks that commented on Dodd-Frank-related rules by the Federal Reserve obtained $7 billion in excess returns compared to non-participants. When they examined the submitted comments to the “Volcker Rule” and the debit card interchange rule, they found significant influence from submitted comments by different banks during the “sausage-making process” from the initial proposed rule to the final rule.
Beyond companies commenting directly under their official corporate names, we’ve also seen how an industry group, Broadband for America, submitted millions of fake comments in 2017 in support of the FCC’s rule to end net neutrality in order to create the false perception of broad political support for the rule among the American public.
Technology solutions to deepfake text on public comments
While our study highlights the threat that deepfake text poses to public comment websites, this doesn’t mean we should end this long-standing institution of American democracy. Rather, we need to identify how technology can be used for innovative solutions that accept public comments from real humans while rejecting deepfake text from bots.
There are two stages in the public comment process where technology can offer potential solutions: (1) comment submission and (2) comment acceptance.
In the first stage of comment submission, technology can be used to prevent bots from submitting deepfake comments in the first place, raising the cost of an attack by forcing the attacker to recruit large numbers of humans instead. One technological solution that many are already familiar with is the CAPTCHA box that we see at the bottom of internet forms, which asks us to identify a word (either visually or audibly) before we can click submit. CAPTCHAs add an extra step that makes the submission process more difficult for a bot. While these tools can be improved for accessibility for disabled individuals, they would be a step in the right direction.
However, CAPTCHAs would not prevent an attacker willing to pay for low-cost labor abroad to solve any CAPTCHA tests in order to submit deepfake comments. One way to get around that may be to require strict identification to be provided along with every submission, but that would remove the possibility for anonymous comments that are currently accepted by agencies such as CMS and the Food and Drug Administration (FDA). Anonymous comments serve as a method of privacy protection for individuals who may be significantly affected by a proposed rule on a sensitive topic such as healthcare without needing to disclose their identity. Thus, the technological challenge would be to build a system that can separate the user authentication step from the comment submission step so only authenticated individuals can submit a comment anonymously.
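One way to picture the separation of authentication from submission is a one-time token scheme. The sketch below is a simplified illustration under stated assumptions, not a design proposed in our study or used by any agency. The class names `TokenIssuer` and `CommentPortal` are hypothetical, and the identity check itself is assumed to happen elsewhere before `issue` is called.

```python
import hashlib
import secrets


class TokenIssuer:
    """Hypothetical authentication service that hands out one-time tokens."""

    def __init__(self):
        self.issued_to = set()     # identities that already received a token
        self.valid_hashes = set()  # hashes of unspent tokens, stored without identity

    def issue(self, verified_identity):
        """Give one token per verified person for a given comment period."""
        if verified_identity in self.issued_to:
            raise ValueError("token already issued for this identity")
        self.issued_to.add(verified_identity)
        token = secrets.token_urlsafe(32)
        # Only the hash is kept, and it is not stored next to the identity.
        self.valid_hashes.add(hashlib.sha256(token.encode()).hexdigest())
        return token

    def redeem(self, token):
        """Return True once per valid token, then invalidate it."""
        digest = hashlib.sha256(token.encode()).hexdigest()
        if digest in self.valid_hashes:
            self.valid_hashes.remove(digest)
            return True
        return False


class CommentPortal:
    """Hypothetical comment site that sees tokens but never identities."""

    def __init__(self, issuer):
        self.issuer = issuer
        self.comments = []

    def submit(self, token, text):
        """Accept a comment only if it arrives with an unspent token."""
        if not self.issuer.redeem(token):
            return False
        self.comments.append(text)
        return True


issuer = TokenIssuer()
portal = CommentPortal(issuer)

token = issuer.issue("verified-person-001")  # placeholder identity
first = portal.submit(token, "This rule would affect my coverage.")
second = portal.submit(token, "A second comment with the same token.")
forged = portal.submit("made-up-token", "A comment from a bot.")
print(first, second, forged)
```

In this sketch the portal stores comments without names, and each verified person can comment only once. It still requires trusting the issuer not to record which token went to which person. Stronger designs rely on cryptographic techniques such as blind signatures so that even the issuer cannot link a token to an identity.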
Finally, in the second stage of comment acceptance, better technology can be used to distinguish between deepfake text and human submissions. While our study found that the more than 100 people we surveyed were not able to identify the deepfake text examples, more sophisticated spam detection algorithms in the future may be more successful. As machine learning methods advance over time, we may see an arms race between deepfake text generation and deepfake text identification algorithms.
The challenge today
While future technologies may offer more comprehensive solutions, the threat of deepfake text to our American democracy is real and present today. Thus, we recommend that all federal public comment websites adopt state-of-the-art CAPTCHAs as an interim measure of security, a position that is also supported by the 2019 U.S. Senate Permanent Subcommittee on Investigations’ Report on Abuses of the Federal Notice-and-Comment Rulemaking Process.
To develop more robust technological solutions in the future, we will need to build a collaborative effort between the government, researchers and our innovators in the private sector. That’s why we at Harvard University have joined the Public Interest Technology University Network along with 20 other educational institutions, New America, the Ford Foundation and the Hewlett Foundation. Collectively, we are dedicated to helping inspire a new generation of civic-minded technologists and policy leaders. Through curriculum, research and experiential learning programs, we hope to build the field of public interest technology and a future where technology is made and regulated with the public in mind from the beginning.
While COVID-19 has disrupted many parts of American society, it hasn’t stopped federal agencies under the Trump administration from continuing to propose new deregulatory rules whose effects will be felt long after the current pandemic has ended. For example, on March 18, 2020, the Environmental Protection Agency (EPA) proposed new rules limiting which research studies can be used to support EPA regulations; the proposal had received over 610,000 comments as of April 6, 2020. On April 2, 2020, the Department of Education proposed new rules permanently relaxing regulations for online education and distance learning. On February 19, 2020, after a federal court ruled that the FCC had ignored how ending net neutrality would affect public safety and cellphone access programs for low-income Americans, the FCC reopened public comments on its net neutrality rules, which in 2017 drew more than 22 million comments, most of them likely submitted by bots.
Federal public comment websites offer the only way for the American public and organizations to express their concerns to the federal agency before the final rules are determined. We must adopt better technological defenses to ensure that deepfake text doesn’t further threaten American democracy during a time of crisis.