Textual Cyberbullying Detection Using an Ensemble of Machine Learning Models
Paper ID: 5633
Gull Bano Anwar
Muhammad Waqas Anwar
Cyberbullying on social media has become a major problem. It can have serious negative mental, emotional, and physical effects on a person’s life. However, cyberbullying leaves a record that can serve as evidence and help stop digital abuse. Early detection of cyberbullying on social media is therefore crucial to reducing its effect on users. To this end, many studies have been conducted to detect cyberbullying content automatically. A major gap in cyberbullying detection strategies is the lack of linguistic resources, especially for newly evolved languages. Roman Urdu is a newly emerged language that is widely used on social networking sites in Asian countries. A promising strategy for preventing cyberbullying is to detect it automatically using Machine Learning or Deep Learning with Natural Language Processing (NLP) tools. This research develops an efficient framework for detecting cyberbullying using NLP tools with Machine Learning models. The proposed framework is validated on the roman-Urdu-abusive-comment-detector (RUACD) dataset using different preprocessing techniques. Five machine learning models, Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT), are evaluated on the RUACD dataset. The experiments show that SVM, LR, and DT outperform the other models, achieving test accuracies of 96.19%, 94.91%, and 94.01%, respectively. Finally, an ensemble of these three best-performing models is formed, which achieves a test accuracy of 95.92%.
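The abstract does not say how the ensemble combines its three members. As a minimal sketch, assuming hard (majority) voting over the labels predicted by SVM, LR, and DT, the combination step could look like the following. The function names `majority_vote` and `accuracy` and the toy predictions are hypothetical illustrations, not taken from the paper or from the RUACD dataset.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by hard (majority) voting.

    predictions[m][i] is the label that model m assigns to comment i.
    Assumption: the paper's ensemble may use a different combination rule.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of comments")
    combined = []
    for votes in zip(*predictions):
        # Pick the label with the most votes; with three voters and
        # binary labels a tie cannot occur.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of comments whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy predictions (1 = abusive, 0 = not abusive).
svm_pred = [1, 0, 1, 1, 0]
lr_pred = [1, 0, 0, 1, 0]
dt_pred = [0, 0, 1, 1, 1]
y_true = [1, 0, 1, 1, 0]

ensemble_pred = majority_vote([svm_pred, lr_pred, dt_pred])
print("ensemble labels:", ensemble_pred)
print("ensemble accuracy:", accuracy(y_true, ensemble_pred))
```

In this toy case each single-model error is outvoted by the other two models. On real data the ensemble need not beat its strongest member, which is consistent with the reported 95.92% for the ensemble against 96.19% for SVM alone.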