Web-based Profanity Detection Using a Combination of Lexicon and Support Vector Machine
DOI: https://doi.org/10.26740/jeisbi.v6i4.71408
Keywords: Support Vector Machine, Lexicon-Based, Detection, Profanity, Knowledge Discovery in Database, Prototype
Abstract
Advances in information and communication technology, particularly the internet and social media, have made it easier for people to express their opinions openly, but they have also increased the potential for profanity and hate speech to spread. This study proposes a web-based profanity detection solution that combines a lexicon-based method with a Support Vector Machine (SVM). The Knowledge Discovery in Database (KDD) process guided data extraction and analysis, covering Twitter data collection, preprocessing (cleaning, case folding, tokenizing, and stemming), TF-IDF transformation, and manual labeling. The SVM model was trained with a 3-fold cross-validation scheme and evaluated with a classification report and a confusion matrix. The model achieved 93% accuracy on the test data with an average F1-score of 0.93, and it performed strongly in detecting sentences labeled as profanity. Black-box testing confirmed that the web application prototype automatically executed all of its profanity detection and censoring features. In the analysis test, the system achieved 100% accuracy on 10 sentences containing profanity and 95% accuracy on 10 sentences without profanity. The system is expected to help create a more positive digital space through adaptive and accurate profanity detection.
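The preprocessing stage (cleaning, case folding, tokenizing, stemming) and a lexicon lookup of the kind described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word list `PROFANITY_LEXICON` and the suffix-stripping `stem` function are hypothetical placeholders (a real system would use a curated lexicon and a stemmer for the target language), and the cleaning rules (removing URLs, @mentions, `#` signs, digits and punctuation) are assumed typical steps for tweets rather than rules taken from the paper.

```python
import re

# HYPOTHETICAL placeholder lexicon; the paper's actual word list is not shown.
PROFANITY_LEXICON = {"damn", "idiot", "stupid"}

# HYPOTHETICAL suffix list standing in for a real, language-specific stemmer.
SUFFIXES = ("ing", "ed", "s")


def clean(text: str) -> str:
    """Cleaning: drop URLs, @mentions, '#' signs, digits and punctuation.

    Assumed rules for tweets. The final pattern also drops non-ASCII
    letters, so it only suits Latin-script text.
    """
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.sub(r"[^A-Za-z\s]", " ", text)


def case_fold(text: str) -> str:
    """Case folding: lower-case everything."""
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Tokenizing: split on runs of whitespace."""
    return text.split()


def stem(token: str) -> str:
    """Naive suffix stripping that keeps a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Cleaning -> case folding -> tokenizing -> stemming."""
    return [stem(tok) for tok in tokenize(case_fold(clean(text)))]


def lexicon_hits(tokens: list[str]) -> list[str]:
    """Lexicon-based check: return the tokens found in the lexicon."""
    return [tok for tok in tokens if tok in PROFANITY_LEXICON]
```

For instance, `preprocess("@user You are SO stupid!!! http://t.co/x")` yields `["you", "are", "so", "stupid"]`, and `lexicon_hits` on that list returns `["stupid"]`. The naive stemmer over-strips some words (for example, `this` becomes `thi`), which is one reason dedicated stemmers are used in practice.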
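The TF-IDF transformation, SVM training and 3-fold cross-validation named in the abstract can be sketched with the Python standard library alone. This is a hedged sketch, not the paper's setup: the abstract does not state the SVM kernel, solver or hyperparameters, so the code swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method (hinge loss, no bias term). The values `lam=0.01` and `iters=2000`, the smoothed IDF formula, the whitespace `tokenize` stand-in, and the label convention (`1` = profanity, `-1` = not profanity) are all assumptions, and the names `ProfanitySVM` and `cross_val_accuracy` are hypothetical. In practice, scikit-learn's `TfidfVectorizer`, `LinearSVC` and `cross_val_score` cover the same ground.

```python
import math
import random
from collections import Counter

Vector = dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> list[str]:
    """Stand-in for the full preprocessing pipeline (assumption)."""
    return text.lower().split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> Vector:
    """Term count x IDF, L2-normalised; terms unseen in training are dropped."""
    counts = Counter(term for term in doc if term in idf)
    vec = {term: c * idf[term] for term, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {term: v / norm for term, v in vec.items()} if norm > 0 else {}


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(term, 0.0) * v for term, v in x.items())


def train_pegasos(xs: list[Vector], ys: list[int], lam: float = 0.01,
                  iters: int = 2000, seed: int = 0) -> Vector:
    """Linear SVM (hinge loss, L2 penalty, no bias) via Pegasos SGD."""
    rng = random.Random(seed)
    w: Vector = {}
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)
        violated = ys[i] * dot(w, xs[i]) < 1  # margin check on the old w
        w = {term: (1 - eta * lam) * v for term, v in w.items()}  # shrink step
        if violated:  # hinge-loss sub-gradient step
            for term, v in xs[i].items():
                w[term] = w.get(term, 0.0) + eta * ys[i] * v
    return w


class ProfanitySVM:
    """HYPOTHETICAL wrapper: labels are 1 (profanity) or -1 (not profanity)."""

    def fit(self, texts: list[str], labels: list[int]) -> "ProfanitySVM":
        docs = [tokenize(text) for text in texts]
        self.idf = fit_idf(docs)
        self.w = train_pegasos([tfidf(d, self.idf) for d in docs], labels)
        return self

    def predict(self, text: str) -> int:
        score = dot(self.w, tfidf(tokenize(text), self.idf))
        return 1 if score > 0 else -1


def cross_val_accuracy(texts: list[str], labels: list[int], k: int = 3,
                       seed: int = 0) -> list[float]:
    """k-fold cross-validation (k=3 as in the abstract); one accuracy per fold."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # IDF and weights are fitted on the training folds only.
        model = ProfanitySVM().fit([texts[i] for i in train_idx],
                                   [labels[i] for i in train_idx])
        correct = sum(model.predict(texts[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores
```

A caller supplies parallel lists of sentences and labels, then calls `cross_val_accuracy(texts, labels, k=3)` to obtain one held-out accuracy per fold. The IDF weights are fitted inside each training fold rather than on the full data, so the held-out fold does not leak into the vocabulary.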