Web-based Profanity Detection Using a Combination of Lexicon and Support Vector Machine

Ainandita Riwipapusa; Wiyli Yustanti

doi:10.26740/jeisbi.v6i4.71408

Authors

Ainandita Riwipapusa Universitas Negeri Surabaya
Wiyli Yustanti Universitas Negeri Surabaya

DOI:

https://doi.org/10.26740/jeisbi.v6i4.71408

Keywords:

Support Vector Machine, Lexicon-Based, Detection, Profanity, Knowledge Discovery in Database, Prototype

Abstract

Advances in information and communication technology, particularly the internet and social media, have made it easier for people to express opinions openly, but they have also increased the spread of profanity and hate speech. This study proposes a web-based profanity detection system that combines a lexicon-based method with a Support Vector Machine (SVM). The Knowledge Discovery in Database (KDD) process guided data extraction and analysis, from Twitter data collection through preprocessing (cleaning, case folding, tokenizing, stemming) and TF-IDF transformation to manual labeling. The SVM model was trained with 3-fold cross-validation and evaluated using a classification report and a confusion matrix. The model reached 93% accuracy on the test data with an average F1-score of 0.93 and performed well in detecting sentences categorized as profanity. Black box testing confirmed that the web application prototype ran all of its profanity word detection and sensing features automatically. In the analysis test, 10 sentences containing profanity words were classified with 100% accuracy, and 10 sentences without profanity words with 95% accuracy. The system is expected to contribute to a more positive digital space through adaptive and accurate profanity detection.
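
The front half of the pipeline described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The lexicon entries, the cleaning rules, and the function names `preprocess`, `lexicon_hits`, and `tfidf` are assumptions made for illustration, stemming is left out, and the SVM classifier is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder lexicon; the study's actual word list is not given here.
LEXICON = {"badword", "swearword"}


def preprocess(text):
    """Cleaning, case folding, and tokenizing (stemming is omitted in this sketch)."""
    text = text.lower()                                  # case folding
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)    # drop URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # drop digits and punctuation
    return text.split()                                  # whitespace tokenizing


def lexicon_hits(tokens, lexicon=LEXICON):
    """Return the tokens that appear in the profanity lexicon."""
    return [t for t in tokens if t in lexicon]


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    using tf = count / document length and idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


tweets = [
    "You are a BADWORD!!! http://t.co/x",
    "Have a nice day @user",
    "nice weather today",
]
tokenized = [preprocess(t) for t in tweets]
flags = [bool(lexicon_hits(tokens)) for tokens in tokenized]
weights = tfidf(tokenized)
```

In a combined system, weight vectors like these would typically be passed to an SVM classifier, with the lexicon flag used alongside the classifier's prediction. How the study merges the two signals is not stated in the abstract.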

