How to build a hate speech detector in class

Online hate speech is omnipresent, and it’s very likely that people you know have already encountered offensive comments on social media. This makes it a good subject for an interdisciplinary project. A 2020 study collected data from 320 students aged 14 to 19 and found that 57% had been exposed to online hate speech in the two months before the survey. This delicate topic can be covered in various ways. On the one hand, it’s a great opportunity to learn more about AI in math and CS classes. On the other hand, and probably even more importantly, the problem has to be discussed in depth from a societal perspective. Here are some ideas:

What are the experiences of your students? Have they been exposed to online hate speech?
In which ways can victims be helped? What should happen to writers of offensive comments?
Is the internet a bad place? Should the platforms be held accountable?

The Tech behind it

Building an AI forces you to be more specific about certain aspects. The following questions can be discussed by all students, whether they lean more towards math or towards languages:

Which words are offensive and which ones are allowed?
How do we differentiate between hate and satire?
In order to process comments and train an AI model, we have to represent comments with numbers. How could we do this?

In the following, I will train a very simple hate speech detection model using Machine Learning and Natural Language Processing. There are plenty of datasets in different languages available online. This site seems to be a great collection. I opted for a German dataset containing 5009 tweets that are labelled as either OFFENSIVE or OTHER (which means non-offensive here).

The goal is to train an AI model that can read a new tweet and decide whether it’s offensive or not. We want to use the above dataset to train the AI. The following steps are necessary:

Process the tweets by removing unnecessary information and by representing the tweets in a numerical way
Pick a Machine Learning algorithm that will train an AI model
Train and test an AI model

Process the tweets

Each tweet is a string of characters. First, we loop through all the tweets in the dataset and remove hyperlinks, Twitter markers and styling. We then lowercase and tokenize the strings, i.e. we convert them into lists of words, punctuation marks, emojis, etc. Next, we remove punctuation and stop words such as “not”, “and”, “an”. These are very common words that add little meaning to the tweet on their own. As an example, the tweet

“Deutsche Medien, Halbwahrheiten und einseitige Betrachtung, wie bei allen vom Staat finanzierten ‘billigen’ Propagandainstitutionen 😜”

now looks like this:

[‘deutsche’, ‘medien’, ‘halbwahrheiten’, ‘einseitige’, ‘betrachtung’, ‘staat’, ‘finanzierten’, ‘billigen’, ‘propagandainstitutionen’, ‘😜’]
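The cleaning steps above can be sketched in a few lines of Python. This is only a minimal sketch and not necessarily the code used for this article: the stop-word set is a tiny hypothetical list chosen for illustration (a real project would use a fuller German list), and the function name `process_tweet` and the regular expressions are my own assumptions.

```python
import re
import unicodedata

# Hypothetical, tiny German stop-word list for illustration only.
STOP_WORDS = {"und", "wie", "bei", "allen", "vom", "der", "die", "das", "nicht", "ein", "eine"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# A token is either a run of word characters or one single symbol (emoji, punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def process_tweet(tweet: str) -> list[str]:
    """Clean a tweet, lowercase and tokenize it, then drop stop words and punctuation."""
    text = URL_RE.sub("", tweet)        # remove hyperlinks
    text = MENTION_RE.sub("", text)     # remove @mentions
    text = re.sub(r"^RT\s+", "", text)  # remove an old-style retweet marker
    text = text.replace("#", "")        # keep the hashtag word, drop the sign
    tokens = TOKEN_RE.findall(text.lower())

    kept = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        # Unicode categories starting with "P" are punctuation; emojis are symbols and stay.
        if len(token) == 1 and unicodedata.category(token).startswith("P"):
            continue
        kept.append(token)
    return kept


example = ("Deutsche Medien, Halbwahrheiten und einseitige Betrachtung, wie bei allen "
           "vom Staat finanzierten ‘billigen’ Propagandainstitutionen 😜")
print(process_tweet(example))
```

Letting students extend the stop-word set themselves is a nice way to discuss which words really carry no meaning.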

Finally, we want to convert the processed tweet into a numerical representation that we can feed into mathematical functions. There are sophisticated approaches that I may cover in another article but for this project, we use a very basic way to do so:

We look at every word in the entire dataset and count how many times it appears in offensive tweets and how many times it appears in non-offensive tweets. Consequently, every word in the dataset has two counts: the number of times it appears in offensive tweets (let’s call it count1) and the number of times it appears in non-offensive tweets (count2).

Now we look at a processed tweet and sum up the count1 values of all its tokens and, separately, their count2 values. This results in two numbers that represent a single tweet.
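Here is a small sketch of this counting idea. The function names `build_counts` and `tweet_features` are hypothetical, the tokens "a", "b", "c", "d" are placeholders rather than real data, and I assume that a word occurring twice in one tweet is counted twice.

```python
from collections import Counter


def build_counts(tokenized_tweets, labels):
    """Count how often each token occurs in OFFENSIVE tweets and in OTHER tweets."""
    offensive_counts = Counter()  # count1 for every token
    other_counts = Counter()      # count2 for every token
    for tokens, label in zip(tokenized_tweets, labels):
        if label == "OFFENSIVE":
            offensive_counts.update(tokens)
        else:
            other_counts.update(tokens)
    return offensive_counts, other_counts


def tweet_features(tokens, offensive_counts, other_counts):
    """Represent one tweet as (sum of count1, sum of count2) over its tokens."""
    sum_count1 = sum(offensive_counts[token] for token in tokens)
    sum_count2 = sum(other_counts[token] for token in tokens)
    return sum_count1, sum_count2


# Placeholder tokens instead of real tweets.
toy_tweets = [["a", "b"], ["b", "c"], ["a", "c"]]
toy_labels = ["OFFENSIVE", "OTHER", "OFFENSIVE"]

count1, count2 = build_counts(toy_tweets, toy_labels)
# "d" never occurs in the toy data, so it contributes 0 to both sums.
print(tweet_features(["a", "b", "d"], count1, count2))  # (3, 1)
```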

Pick a Machine Learning algorithm

An algorithm that is simple but powerful is Logistic Regression. You can think of it as an S-shaped curve whose values lie between 0 and 1. Tweets that lead to an output greater than 0.5 are then interpreted as OFFENSIVE, all others as OTHER. Here, training the AI means stretching and moving the curve horizontally until we obtain the most accurate results.
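To make the S-shaped curve concrete, here is a minimal sketch. For a single input x the curve is sigmoid(w * x + b), where w stretches the curve and b moves it. With the two numbers per tweet from above, I assume one weight per number plus a shift; the function `predict` and the example weights are made up for illustration and are not trained values.

```python
import math


def sigmoid(z: float) -> float:
    """S-shaped curve with values between 0 and 1 (written to avoid overflow)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, weights, bias, threshold=0.5):
    """Return the label and the curve's output for one tweet."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = "OFFENSIVE" if probability > threshold else "OTHER"
    return label, probability


# Made-up weights, only to show how the curve turns two numbers into a label.
print(predict((3, 1), weights=(1.0, -1.0), bias=0.0))
```

Students can change the weights and the bias by hand and watch how the label flips. That is exactly what training will later do automatically.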

Train and test an AI model

As mentioned above, we want to stretch and move the curve, i.e. we want to choose these two values, let’s call them parameters, optimally. First of all, we split the dataset into a training set and a test set. We choose 80% of the tweets as our training set and use the remaining 20% to test our final AI model. To train the model, we define a so-called cost function that gives us an error metric for every potential pair of parameters. Using gradient descent, we can find the pair of parameters that minimizes the cost function; these are therefore the optimal stretching and moving parameters for the above curve.
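The split and the training loop could look roughly like the sketch below. It rests on several assumptions of mine: the cost function is the average log-loss, labels are encoded as 1 (OFFENSIVE) and 0 (OTHER), there is one weight per input number plus a bias, and the inputs have been scaled to a small range. The learning rate, the number of steps and the toy rows are arbitrary illustration values.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the data and hold back a fraction of it for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_fraction)

    def pick(selected):
        return [rows[i] for i in selected], [labels[i] for i in selected]

    return pick(indices[n_test:]), pick(indices[:n_test])


def cost(rows, labels, weights, bias):
    """Average log-loss: small when predictions agree with the labels."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(rows)


def gradient_descent(rows, labels, learning_rate=0.1, steps=1000):
    """Repeatedly nudge the parameters in the direction that lowers the cost."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up, already scaled toy features; 1 = OFFENSIVE, 0 = OTHER.
toy_rows = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
toy_labels = [1, 1, 0, 0]
trained_weights, trained_bias = gradient_descent(toy_rows, toy_labels)
print(cost(toy_rows, toy_labels, [0.0, 0.0], 0.0), "->",
      cost(toy_rows, toy_labels, trained_weights, trained_bias))
```

Printing the cost before and after training shows students that gradient descent really does reduce the error.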

We can now use the test set to see how our AI model performs. The following are two important performance metrics:

Precision score: Of the tweets the AI model predicted as OFFENSIVE, how many really were OFFENSIVE? We obtained almost 70%.
Recall score: Of the tweets that are actually OFFENSIVE, how many did the AI model detect? We obtained about 40%.
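Both scores are easy to compute by hand, which makes them a good exercise. The sketch below uses a hypothetical function `precision_recall` and ten made-up labels; they are not the results of the model above.

```python
def precision_recall(true_labels, predicted_labels, positive="OFFENSIVE"):
    """Compute precision and recall for the positive class."""
    true_pos = false_pos = false_neg = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if prediction == positive and truth == positive:
            true_pos += 1    # correctly flagged
        elif prediction == positive:
            false_pos += 1   # flagged, but actually harmless
        elif truth == positive:
            false_neg += 1   # offensive, but missed
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Made-up labels: 5 offensive tweets, of which the model finds 2,
# plus 1 harmless tweet that is wrongly flagged.
truth = ["OFFENSIVE"] * 5 + ["OTHER"] * 5
predicted = ["OFFENSIVE", "OFFENSIVE", "OTHER", "OTHER", "OTHER",
             "OFFENSIVE", "OTHER", "OTHER", "OTHER", "OTHER"]
precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.67, recall=0.40
```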

The results are not bad for such a basic model, but there is of course much room for improvement. We could now choose different numerical representations for the tweets, try different algorithms to train the model, find a larger dataset, etc. The real work begins now, but the approach we used is a good way to let your students play with real data and learn more about AI.

