Dublin Core
Title
Sentiment Analysis of YouTube Comments and Comparison of Different Machine Learning Models
Abstract
Every day, large volumes of media are shared on social media websites, and users post their opinions and sentiments in response. The main goal of this project was to examine how well traditional machine learning algorithms can classify the sentiment of YouTube comments as positive, negative, or neutral. Because comments are generally brief and full of slang, emojis, and occasional misspellings, this task can be difficult.
To address this, a dataset of YouTube comments labeled for sentiment was acquired. The data was preprocessed by removing stopwords and punctuation, and the text was converted into numerical features using the TF-IDF method. Several models were then trained, including Logistic Regression, Naive Bayes, Decision Tree, and Support Vector Machine (SVM). The performance of each model was compared on metrics such as accuracy, precision, recall, and F1-score.
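The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn as the library, uses a tiny made-up set of comments in place of the real dataset, and the helper name compare_models, the split ratio, and the macro averaging of the metrics are all choices made here for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the labeled YouTube comments.
comments = [
    "love this video, amazing work",
    "great tutorial, super helpful",
    "best channel ever, awesome content",
    "fantastic editing, really enjoyed it",
    "terrible audio, waste of time",
    "worst video, totally boring",
    "awful clickbait, really disappointed",
    "hate the ads, horrible quality",
    "uploaded on tuesday",
    "video length is ten minutes",
    "part two comes next week",
    "recorded in the studio",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4


def compare_models(texts, targets, test_size=0.25, seed=0):
    """Train four classifiers on TF-IDF features and collect their metrics."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, targets, test_size=test_size, random_state=seed, stratify=targets
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "SVM": LinearSVC(),
    }
    results = {}
    for name, clf in models.items():
        # The vectorizer lowercases text, drops English stopwords, and its
        # default tokenizer ignores punctuation before computing TF-IDF.
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
        pipeline.fit(x_train, y_train)
        predicted = pipeline.predict(x_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, predicted, average="macro", zero_division=0
        )
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


results = compare_models(comments, labels)
for model_name, metrics in results.items():
    print(model_name, {k: round(v, 2) for k, v in metrics.items()})
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is learned only from the training split, so the test comments do not leak into the features. Scores on a toy set this small are not meaningful; the sketch only shows how the comparison could be organized.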
Overall, the project demonstrated that, with appropriate preprocessing in place, even some of the older machine learning algorithms can be effective at determining sentiment in noisy real-world text such as YouTube comments. It also gave a better understanding of how important preprocessing is when working with text data.
Keywords
Sentiment analysis, YouTube comments, natural language processing, machine learning, logistic regression, support vector machine, random forest, SMOTE, BERT, deep learning, text classification, feature extraction, TF-IDF, data preprocessing, model comparison, class imbalance, evaluation metrics, confusion matrix
Language
English language