Comparative Analysis of the Effectiveness of Enhanced Confix Stripping and IN-Idris Stemming in Multi-Class Naïve Bayes Classification of Online News

Ramadhani, Doni (2025) Analisis Perbandingan Efektivitas Stemming Enhanced Confix Stripping dan IN-Idris dalam Klasifikasi Multi Kelas Naïve Bayes Terhadap Berita Online. Undergraduate thesis, Universitas Muhammadiyah Malang.

Files:
PENDAHULUAN.pdf (921kB)
BAB I.pdf (309kB)
BAB II.pdf (501kB)
BAB III.pdf (357kB), restricted to registered users only
BAB IV.pdf (538kB), restricted to registered users only
BAB V.pdf (229kB), restricted to registered users only
LAMPIRAN.pdf (442kB), restricted to registered users only
POSTER.pdf (190kB), restricted to registered users only

Abstract

Stemming is a crucial stage in Indonesian text preprocessing: it reduces inflected words to their root forms, which significantly affects the quality of the feature representation used for classification. This study analyzes and compares the effectiveness of two rule-based stemming algorithms, Enhanced Confix Stripping (ECS) and IN-Idris, in improving multi-class classification of news articles. The dataset consists of 2,896 KompasTV news articles across various categories. The pipeline applies standard preprocessing (case folding, cleansing, stopword removal, and tokenizing), then stemming with ECS or IN-Idris, feature extraction with the Bag of Words (BoW) model, and classification with the Naïve Bayes algorithm. The test results show that ECS produced slightly more correct root words (14,588) than IN-Idris (14,579). In terms of model performance, however, IN-Idris performed better, with a classification accuracy of 65.11% against 63.73% for ECS. The discussion highlights that ECS handles the overstemming problem more effectively through its suffix restoration mechanism, while IN-Idris excels at handling double affixes by repeating its algorithm. The study concludes that the quality of a stemming algorithm is determined not solely by the number of correct root words it produces, but primarily by its ability to minimize the error types (overstemming and understemming) that directly affect the classification model's accuracy. The IN-Idris mechanism therefore proved more effective in this multi-class text classification setting.
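
The pipeline named in the abstract (preprocessing, stemming, Bag of Words, Naïve Bayes) can be sketched in plain Python. This is only an illustrative sketch, not the thesis implementation: the stopword list, the four training sentences, and the category labels are invented for the example, and `identity_stem` is a hypothetical placeholder where an ECS or IN-Idris stemmer would be plugged in. The classifier is a multinomial Naïve Bayes with add-one smoothing, which is an assumption, since the abstract does not state which variant was used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list; a real run would use a full Indonesian list.
STOPWORDS = {"yang", "di", "dan", "ke", "dari", "untuk"}


def identity_stem(token):
    """Placeholder stemmer (assumption): returns the token unchanged.
    An ECS or IN-Idris implementation would replace this function."""
    return token


def preprocess(text, stem=identity_stem):
    text = text.lower()                                  # case folding
    text = re.sub(r"[^a-z\s]", " ", text)                # cleansing
    tokens = text.split()                                # tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [stem(t) for t in tokens]                     # stemming


def train_naive_bayes(docs, labels):
    """Collect per-class Bag of Words counts and class frequencies."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_docs": Counter(labels),
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(labels),
    }


def predict(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_class in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_class / model["n_docs"])
        for t in tokens:
            if t in model["vocab"]:  # words unseen in training are skipped
                score += math.log((counts[t] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy training data, invented for illustration only.
train = [
    ("Timnas menang di laga final", "olahraga"),
    ("Pelatih memuji pemain timnas", "olahraga"),
    ("Harga saham naik di bursa", "ekonomi"),
    ("Bank menaikkan suku bunga", "ekonomi"),
]
model = train_naive_bayes(
    [preprocess(text) for text, _ in train],
    [label for _, label in train],
)

test_texts = ["Pemain timnas menang", "Harga saham naik"]
predictions = [predict(model, preprocess(t)) for t in test_texts]
```

Comparing two stemmers in this setup would amount to calling `preprocess` with a different `stem` argument, retraining, and comparing `accuracy` on the same held-out articles.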

Item Type: Thesis (Undergraduate)
Student ID: 202010370311315
Keywords: Stemming, Enhanced Confix Stripping, IN-Idris, Multi-Class Classification, Naive Bayes Accuracy
Subjects: Q Science > Q Science (General)
T Technology > T Technology (General)
Divisions: Faculty of Engineering > Department of Informatics (55201)
Depositing User: 202010370311315 doniexca01
Date Deposited: 04 Feb 2026 06:01
Last Modified: 04 Feb 2026 06:01
URI: https://eprints.umm.ac.id/id/eprint/27113
