Tweet Categorization and Sentiment Analysis of Tweets

Dublin Core

Title

Tweet Categorization and Sentiment Analysis of Tweets

Author

Vedad Fejzagić

Abstract

In today’s era, using internet platforms to convey information to others, whether family, friends, or strangers, has become the norm. One of the leading social platforms in that regard is “Twitter” (now “X”). The effectiveness of communication on such platforms can be analyzed through sentiment analysis, which is commonly treated as a classification problem: determining whether an input is positive or negative.

The research aimed to show to what extent certain machine learning models outperform others on a given subset of data, depending on the choice of preprocessing steps. This aim was divided into two goals. The first was to show how one pipeline of preprocessing steps affects each machine learning model compared to a second pipeline. The second was to assess the viability of several machine learning models for sentiment analysis of tweets by comparing their accuracies. For that purpose, a single subset was taken from the data and duplicated, and a different set of preprocessing steps was applied to each copy. Afterward, both copies were fed to several machine learning models in order to gauge their performance.
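The comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tweets, the stopword list, the two pipelines (`pipeline_a`, `pipeline_b`), and the hand-written multinomial Naïve Bayes classifier are hypothetical stand-ins, not the dataset, preprocessing steps, or implementation used in the research. Only one model is shown for brevity, whereas the research compared several.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy tweets, for illustration only: 1 = positive, 0 = negative.
TRAIN = [
    ("I love this phone, it is great!", 1),
    ("What a wonderful day :)", 1),
    ("Great service, very happy", 1),
    ("I hate waiting, this is terrible", 0),
    ("Worst experience ever!!!", 0),
    ("So sad and disappointed", 0),
]
TEST = [
    ("Happy with this great phone", 1),
    ("Terrible, I hate it", 0),
]

# Assumed stopword list; a real study would use a standard one.
STOPWORDS = {"i", "a", "is", "it", "this", "and", "so", "what", "with", "very"}


def pipeline_a(text):
    """Light preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def pipeline_b(text):
    """Heavier preprocessing: lowercase, keep letter runs only, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def train_nb(samples, preprocess):
    """Collect the class and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in preprocess(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(preprocess):
    """Train and evaluate on the same data, preprocessed by one pipeline."""
    model = train_nb(TRAIN, preprocess)
    hits = sum(predict_nb(model, preprocess(t)) == y for t, y in TEST)
    return hits / len(TEST)


# Same data, two preprocessing pipelines, one accuracy per pipeline.
results = {"pipeline_a": accuracy(pipeline_a), "pipeline_b": accuracy(pipeline_b)}
print(results)
```

The point of the structure is that the data and the model are held fixed while only the preprocessing function changes, so any difference in accuracy can be attributed to the pipeline. On a toy set this small the numbers carry no weight; they only show the shape of the experiment.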

Finally, the paper found that the Naïve Bayes machine learning model achieved the best accuracy, while the choice of preprocessing steps had an almost negligible effect on overall model accuracy.

Keywords

Sentiment Analysis, Machine Learning, Preprocessing, Natural Language Processing, Twitter
