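Ramadhani, Doni (2025) Analisis Perbandingan Efektivitas Stemming Enhanced Confix Stripping dan IN-Idris dalam Klasifikasi Multi Kelas Naïve Bayes Terhadap Berita Online [Comparative Analysis of the Effectiveness of Enhanced Confix Stripping and IN-Idris Stemming in Multi-Class Naïve Bayes Classification of Online News]. Undergraduate thesis, Universitas Muhammadiyah Malang.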
Abstract
Stemming is a crucial stage in Indonesian text preprocessing. It reduces affixed words to their root forms, which significantly affects the quality of the feature representation used for classification. This study analyzes and compares the effectiveness of two rule-based stemming algorithms, Enhanced Confix Stripping (ECS) and IN-Idris, in improving multi-class classification of news articles. The dataset consists of 2,896 KompasTV news articles across several categories. The pipeline applies standard preprocessing (case folding, cleansing, stopword removal, and tokenizing), followed by stemming with ECS or IN-Idris, feature extraction with the Bag of Words (BoW) model, and classification with the Naïve Bayes algorithm. The results show that ECS produced slightly more correct root words (14,588) than IN-Idris (14,579). In terms of model performance, however, IN-Idris achieved a classification accuracy of 65.11%, higher than the 63.73% obtained with ECS. The discussion highlights that ECS handles the overstemming problem more effectively through its suffix restoration mechanism, while IN-Idris handles double affixes better because it repeats its stripping procedure. The study concludes that the quality of a stemming algorithm is determined not only by the number of correct root words it produces, but primarily by how well it minimizes the error types (overstemming and understemming) that directly affect the accuracy of the classification model. In this multi-class text classification setting, the IN-Idris mechanism proved more effective.
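The pipeline described in the abstract (preprocessing, stemming, Bag of Words, Naïve Bayes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_stem` is a hypothetical affix stripper that stands in for the stemming step and is neither ECS nor IN-Idris, the stopword list and the two training sentences are invented, and the classifier is a plain multinomial Naïve Bayes with Laplace smoothing rather than the thesis's actual implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword and affix lists, chosen only for this example.
STOPWORDS = {"yang", "di", "dan", "ke", "dari"}
PREFIXES = ("meng", "mem", "men", "me", "ber", "di")
SUFFIXES = ("kan", "nya", "an", "i")

def toy_stem(word):
    """Strip at most one prefix and one suffix (NOT ECS or IN-Idris)."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 4:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 4:
            word = word[:-len(s)]
            break
    return word

def preprocess(text, stem=None):
    """Case folding, cleansing, tokenizing, stopword removal, optional stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens] if stem else tokens

def train_nb(docs, labels):
    """Collect the Bag of Words counts per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented two-document training set.
train = [
    ("Tim bermain di stadion", "olahraga"),
    ("Harga saham naik", "ekonomi"),
]
docs = [preprocess(text, toy_stem) for text, _ in train]
labels = [label for _, label in train]
model = train_nb(docs, labels)

# "dimainkan" and "bermain" both reduce to "main", so the test
# document shares a feature with the sports training document.
print(predict_nb(model, preprocess("Bola dimainkan", toy_stem)))  # olahraga
```

Without the stemming step, "dimainkan" would not match any training feature and the prediction would rest on the class priors alone. This is the mechanism by which stemming errors, rather than the raw count of correct root words, can change classification accuracy.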
| Item Type: | Thesis (Undergraduate) |
|---|---|
| Student ID: | 202010370311315 |
| Keywords: | Stemming, Enhanced Confix Stripping, IN-Idris, Multi-Class Classification, Naive Bayes Accuracy |
| Subjects: | Q Science > Q Science (General); T Technology > T Technology (General) |
| Divisions: | Faculty of Engineering > Department of Informatics (55201) |
| Depositing User: | 202010370311315 doniexca01 |
| Date Deposited: | 04 Feb 2026 06:01 |
| Last Modified: | 04 Feb 2026 06:01 |
| URI: | https://eprints.umm.ac.id/id/eprint/27113 |
