As social media platforms have become ever more intrinsic to how we live our lives, evolving into the primary medium through which we communicate with and listen to the rest of the world, their rise has also handed a megaphone to the world’s hate and vitriol. Indeed, it was Twitter that initially stepped forward to staunchly defend the right of terrorists and their sympathizers to communicate via its platform, before abruptly reversing itself in the face of fierce public criticism. Yet, despite myriad programs and policies designed on paper to fight abuse, in reality the platforms have done very little to curb the spread of hate speech, harassment and violent threats. This raises the question of whether deep learning-powered “bots” could offer a powerful countermeasure to online hate speech, deployed en masse to report, counter and overwhelm hateful posts in real time.
Over the last few years, deep learning algorithms have made enormous advances in processing human text and imagery, at times approaching human levels of sophistication and accuracy, while even simple ELIZA-style bots have managed to carry on fairly convincing chats for half a century.
While far from HAL 9000 levels of comprehension, the current state of the art in deep learning and heuristic-powered chat bots is quite capable of the kind of “linguistic legerdemain” (to use the words of Spock) required to identify the simplistic, overt threats of violence and hate speech readily found on social platforms like Twitter. Nuanced attacks on women’s rights would likely escape such algorithms, but simple calls promoting the beating of women or violence against public figures, or the use of racial or other charged epithets to denigrate ethnic, religious or other minorities, could be identified with quite high accuracy.
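To make this concrete, here is a minimal sketch of the kind of classifier such a bot might use: a character n-gram TF-IDF model with logistic regression, a deliberately simple stand-in for the deep learning models described above. The labeled training file (posts.csv, with text and label columns) is hypothetical; a real system would need a large, carefully curated corpus.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical labeled corpus: one post per row, label 1 = overt hate speech.
data = pd.read_csv("posts.csv")
train_text, test_text, train_y, test_y = train_test_split(
    data["text"], data["label"], test_size=0.2, random_state=42)

# Character n-grams are robust to the deliberate misspellings and
# character substitutions common in abusive posts.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=5),
    LogisticRegression(max_iter=1000))
model.fit(train_text, train_y)

# Overt threats and slurs are lexically distinctive, which is why even a
# model this simple can score well on them; nuanced or coded attacks are
# precisely where it will fail.
print(classification_report(test_y, model.predict(test_text)))
```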
So what is a bot to do when it finds hate speech online? At the simplest level, one could imagine a small set of bots that scour social media platforms for any matching posts and automatically report them via the platforms’ native abuse reporting tools. Companies like Twitter or Facebook often respond to high-profile cases of abusive behavior on their platforms with generic statements that any reported abuse is removed, but nearly always stop short of confirming or denying whether anyone actually reported the posts in question as abuse. Auto-reporting bots like these would not only ensure that every overt hate speech post is reported to the platforms, but would also create an electronic evidence trail recording the precise timestamp when each post was reported, so that response times for different kinds of content can be measured. Does Twitter take down threats against some minorities more quickly than others? Does Facebook have a higher takedown rate for harassment of people in the news compared with those not being discussed at the moment?
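What might that evidence trail look like in practice? The sketch below records the exact moment each report is filed in a local SQLite database. Since the platforms expose no public API for filing abuse reports, report_post() below is only a placeholder for whatever submission mechanism a real bot would use; the durable part is the timestamped, append-only log that later audits would depend on.

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect("abuse_reports.db")
db.execute("""CREATE TABLE IF NOT EXISTS reports (
    post_id     TEXT PRIMARY KEY,  -- platform's ID for the offending post
    platform    TEXT,              -- e.g. 'twitter' or 'facebook'
    category    TEXT,              -- classifier label, e.g. 'violent_threat'
    reported_at TEXT,              -- UTC timestamp the report was filed
    removed_at  TEXT               -- filled in later if the post disappears
)""")

def report_post(post_id, platform, category):
    """File an abuse report (stubbed) and log exactly when it was filed."""
    # ... submit the report through the platform's abuse channel here ...
    db.execute(
        "INSERT OR IGNORE INTO reports VALUES (?, ?, ?, ?, NULL)",
        (post_id, platform, category,
         datetime.now(timezone.utc).isoformat()))
    db.commit()
```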
By flooding the platforms with reports of every single hateful post and offering an evidence trail proving they were aware of those posts, such bots would at the very least force social media companies to acknowledge the scale and scope of hate speech on their platforms. By conducting follow-up tracking of all reported posts to see which ones the companies take down and which they leave up, the companies’ internal abuse guidelines could also be precisely quantified.
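Building on the hypothetical report log above, that follow-up tracking might look something like this: periodically re-check every reported post, record when it disappears, and aggregate takedown latency by abuse category. The post_still_exists() argument is a stand-in for a real lookup, such as an HTTP GET of the post’s URL that checks for a 404.

```python
from datetime import datetime, timezone

def audit_takedowns(db, post_still_exists):
    """Mark reported posts that have since been removed."""
    pending = db.execute(
        "SELECT post_id, platform FROM reports WHERE removed_at IS NULL")
    for post_id, platform in pending.fetchall():
        if not post_still_exists(platform, post_id):
            db.execute(
                "UPDATE reports SET removed_at = ? WHERE post_id = ?",
                (datetime.now(timezone.utc).isoformat(), post_id))
    db.commit()

# Average hours from report to removal, broken out by abuse category:
# exactly the per-category response time question posed above.
AUDIT_QUERY = """
SELECT category,
       COUNT(*)          AS reported,
       COUNT(removed_at) AS removed,
       AVG((julianday(removed_at) - julianday(reported_at)) * 24)
                         AS avg_hours_to_removal
FROM reports
GROUP BY category
"""

def print_audit(db):
    for row in db.execute(AUDIT_QUERY):
        print(row)
```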
At a minimum, this would force Twitter, Facebook and others to publicly codify their acceptable content guidelines in response to the community audits such data would enable. If Twitter publicly prohibits threats of violence, yet the data shows a single meme with thousands of reported overt threats of violence against women that the company declined to remove, it would face immense pressure to either step up its enforcement activity or clarify its policies on threats against women.
It would also force the companies to confront and codify cultural and religious exceptions to their policies. In some countries, members of certain racial, ethnic, religious or other minorities have very limited rights, and local laws may permit what Americans would consider overt and clear hate speech or threats of violence. In the absence of robust quantitative data on how the companies handle such exceptions, it is unclear how they reconcile differing global concepts of acceptable speech. A running post-by-post log of the companies’ responses to reported abuse would make this much clearer.
One could even imagine such bots publishing a daily list of all flagged abusive posts along with a daily leaderboard of the most egregious offenders. Such a list would likely violate the platforms’ provisions on data use and privacy, but if the companies began shutting down accounts posting these lists or turning to the courts to stop such lists, the backlash would likely be intense.
Such bots would be strictly passive in nature, recording suspected abuse and reporting it through the platforms’ existing abuse channels, but doing nothing further to counter it. Imagine if things went a step further and the bots were given authority to actually counter hate speech themselves. What might this look like?
Today it is relatively straightforward to buy access to armies of millions of fake social media accounts that can be used for everything from buying followers or likes to artificially elevating a topic by posting millions of positive or negative comments about someone or something. Instead of a small handful of bots scanning Twitter and flagging abusive posts, what if an army of millions of bots were given control over millions of Twitter accounts and unlimited authority to counter all hate speech they encountered?
Under one model, the bots would coordinate with each other, and when any of them finds an account posting a hateful comment, that account would be sent a single response warning the owner that the comment could be deeply offensive and asking that they refrain from such posts in the future. The account would then be added to a list, and future posts by the same account would result in referrals to the platform’s abuse line, but no further automated responses from the bots, or perhaps at most one response per day.
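A minimal sketch of that warn-once logic follows, with shared state ensuring each account receives exactly one warning before further posts are merely referred to the platform, throttled to one referral per day. The send_warning() and refer_to_platform() callables are hypothetical stand-ins for the actual posting and reporting machinery.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime.min.replace(tzinfo=timezone.utc)  # "never referred" sentinel

warned = set()      # accounts that have already received their one warning
last_referral = {}  # account -> time of most recent referral to the platform

def handle_hateful_post(account, post_id, send_warning, refer_to_platform):
    """Warn an account once; afterwards refer posts at most once per day."""
    now = datetime.now(timezone.utc)
    if account not in warned:
        # First offense: the single polite warning described above.
        send_warning(account, post_id)
        warned.add(account)
        return
    # Repeat offense: refer to the platform's abuse line, throttled.
    if now - last_referral.get(account, EPOCH) >= timedelta(days=1):
        refer_to_platform(account, post_id)
        last_referral[account] = now
```

In a real deployment, the warned set and referral log would live in shared storage so the whole bot fleet coordinates, rather than each bot independently warning the same account.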
Yet, if someone tweets encouragement to beat women, it is unlikely that a single chastising tweet from an anonymous account is going to change their behavior. Countering such hate speech requires more than chastisement; it requires responses that encourage self-censorship. The Chinese government learned long ago that merely deleting posts does little to change behavior, but that flooding a person with posts attacking them and publicly shaming them will likely make them think twice about future posts. You won’t change their viewpoints, but you will cause them to self-censor and no longer overtly share those views with others.
Imagine the Chinese censorship model replicated using an army of AI-powered bots. Someone posts a tweet encouraging unprovoked violent attacks against a minority group for its religious beliefs. A few thousand of the automated bots flag the post to Twitter and then begin a relentless campaign of counter posts criticizing the poster, flooding his or her Twitter account in the process. The bots might also scan the account’s entire posting history, identify other accounts it frequently corresponds with, and retweet their counter posts to those accounts. At sufficiently high volume, this bot army would bury hate speech posts in a flood of anti-hate-speech discussion and toxify hate-speech-posting accounts to the point that many of their followers flee the barrage of incoming posts and unfollow.
Twitter would almost certainly begin blocking these bot accounts as quickly as it could, but the ease of registering new accounts means it would be relatively trivial to stay ahead of those bans. The ensuing media coverage and public dialog would place the platforms under intense pressure to finally devote real resources to combating hate speech. Moreover, the tens or even hundreds of millions of daily posts from these bots would render the platforms almost unusable, leaving them no choice but to adopt technological measures both to combat hate speech and to address the issue of robotic accounts. Yet even if robotic accounts were finally eliminated, the success of such a bot army would likely inspire a human volunteer army to replace it.
In fact, perhaps the most surprising part of this story is that no one has actually done this at scale. From counterterrorism to counterfeiting, human trafficking to hate speech, illegal activity to threats of violence, nearly any issue imaginable could be combated through such bot warfare.
Of course, the opposite is also true: once bot warfare is used to fight hate speech online, it is entirely likely that those who promote hate speech would respond with bot armies of their own. In many ways the Islamic State has proven the success of this model with a human-based equivalent, leveraging its army of global supporters to post content from myriad accounts and moving from account to account as they are shut down, illustrating how hard it is to fight such networks. But elevating the world of online censorship from humans to bots would profoundly reshape the landscape of free speech, in that the sheer scalability of bots means a bot army could instantly overwhelm organic human discourse, much as automated trading has begun to overwhelm human influence in the financial markets.
One could imagine that governments like China and Russia are already investing heavily in experimenting with such “bot idea armies” and deploying prototypes to augment their vast human propaganda armies.
Indeed, it is an interesting commentary on Silicon Valley’s priorities that Facebook’s founder Mark Zuckerberg chose to devote 100 hours of his time to building a robotic butler that can make him toast, rather than spending those 100 hours personally building a chatbot to fight hate speech online.
When I reached out to Twitter and Facebook for comment, Facebook did not respond by publication time, while Twitter responded with a link to its “automation rules and best practices” guide but said it had no comment on the specific applications outlined here, or on whether they would be deemed in violation of those practices or might be permitted in limited fashion to help the platform combat hate speech.
In the end, as the simplistic heuristic-based chatbots of the past half century give way to sophisticated deep learning-powered algorithms capable of intricately emulating human conversation, it is only a matter of time until we see those chatbots weaponized and deployed as “free speech” and “counter speech” armies that will forever reshape the online world.