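Arief, Muhammad Eka Nur (2026) Comparative Analysis of Transformer-Based Topic Modeling Pipelines for Scientific Literature (original title: Analisis Perbandingan Pipelines Pemodelan Topik Berbasis Transformer untuk Literatur Ilmiah). Undergraduate thesis, Universitas Muhammadiyah Malang.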
Abstract
The exponential growth of scientific literature makes it impractical to identify thematic trends manually, which motivates automated analysis methods. This study compares two transformer-based topic modeling pipelines to determine which yields the more coherent topics from scientific research. Both pipelines were implemented and evaluated on a corpus of 20,972 scientific article abstracts: a custom pipeline combining SBERT, UMAP, and HDBSCAN, and the integrated BERTopic model. Measured by the C_v coherence score, the integrated BERTopic model achieved the higher score of 0.7012, exceeding the custom SBERT-UMAP-HDBSCAN pipeline, which scored 0.6079, by 0.0933. The findings indicate that an integrated, purpose-built model such as BERTopic generates more coherent and interpretable thematic structures from scientific text. For researchers analyzing large-scale literature, the results support integrated models as a more robust and efficient choice than custom modular pipelines.
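
To give a sense of what a coherence score measures, the following is a minimal sketch of a simplified, NPMI-based coherence computed from document-level co-occurrence. It is not the thesis's evaluation code and it is not the full C_v measure, which additionally uses a sliding window over the text and an indirect cosine-similarity step. The function name `npmi_coherence` and the toy corpus in the usage note are assumptions made for illustration only.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic.

    Simplified illustration: co-occurrence is counted per document,
    not per sliding window as in the full C_v measure.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: lowest NPMI by convention
        elif p12 == 1.0:
            scores.append(1.0)   # co-occur everywhere; avoids dividing by log(1) = 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

On a hypothetical tokenized corpus, words that always appear together score near 1 and words that never appear together score -1, which is the intuition behind ranking one pipeline's topics above another's:

```python
docs = [
    ["topic", "model", "cluster"],
    ["topic", "model"],
    ["protein", "cell"],
    ["protein", "cell", "gene"],
]
coherent = npmi_coherence(["topic", "model"], docs)
mixed = npmi_coherence(["topic", "protein"], docs)
```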
| Item Type: | Thesis (Undergraduate) |
|---|---|
| Student ID: | 202210370311299 |
| Keywords: | BERTopic, Coherence Score, HDBSCAN, Topic Modeling, UMAP |
| Subjects: | Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T Technology (General); Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources |
| Divisions: | Faculty of Engineering > Department of Informatics (55201) |
| Depositing User: | 202210370311299 kknurarief |
| Date Deposited: | 11 May 2026 04:06 |
| Last Modified: | 11 May 2026 04:06 |
| URI: | https://eprints.umm.ac.id/id/eprint/29794 |
