Hate speech in social media: How platforms can do better

With all of the resources, power and influence they possess, social media platforms could and should do more to detect hate speech, says a University of Michigan researcher.

In a report from the Anti-Defamation League, Libby Hemphill, an associate research professor at U-M’s Institute for Social Research and an ADL Belfer Fellow, explores social media platforms’ shortcomings when it comes to white supremacist speech and how it differs from general or nonextremist speech, and recommends ways to improve automated hate speech identification methods. “We also sought to determine whether and how white supremacists adapt their speech to avoid detection,” said Hemphill, who is also a professor at U-M’s School of Information. “We found that platforms often miss discussions of conspiracy theories about white genocide and Jewish power and malicious grievances against Jews and people of color. Platforms also let decorous but defamatory speech persist.”

How platforms can do better?

White supremacist speech is readily detectable, Hemphill says, detailing the ways it is distinguishable from commonplace speech in social media, including:

Frequently referencing racial and ethnic groups using plural noun forms (whites, etc.)
Appending “white” to otherwise unmarked terms (e.g., power)
Using less profanity than is common in social media to elude detection based on “offensive” language
Being congruent on both extremist and mainstream platforms
Keeping complaints and messaging consistent from year to year
Describing Jews in racial, rather than religious, terms

“Given the identifiable linguistic markers and consistency across platforms, social media companies should be able to recognize white supremacist speech and distinguish it from general, nontoxic speech,” Hemphill said.

The research team used commonly available computing resources, existing algorithms from machine learning and dynamic topic modeling to conduct the study.

“We needed data from both extremist and mainstream platforms,” said Hemphill, noting that mainstream user data comes from Reddit and extremist website user data comes from Stormfront.