Comparative Analysis of Transformer-Based Topic Modeling Pipelines for Scientific Literature

Arief, Muhammad Eka Nur (2026) Analisis Perbandingan Pipelines Pemodelan Topik Berbasis Transformer untuk Literatur Ilmiah. Undergraduate thesis, Universitas Muhammadiyah Malang.

Files:
PENDAHULUAN.pdf (Text, 8MB)
BAB I.pdf (Text, 1MB)
BAB II.pdf (Text, 1MB)
BAB III.pdf (Text, 1MB, restricted to registered users)
BAB IV.pdf (Text, 1MB, restricted to registered users)
BAB V.pdf (Text, 265kB, restricted to registered users)
LAMPIRAN.pdf (Text, 59kB, restricted to registered users)
POSTER.pdf (Text, 308kB, restricted to registered users)

Abstract

The exponential growth of scientific literature makes manual identification of thematic trends impractical and calls for automated analysis. This study compares two transformer-based topic modeling pipelines to determine which one yields the most coherent topics from scientific research text. Both pipelines were implemented and evaluated on a corpus of 20,972 scientific article abstracts: a custom pipeline combining SBERT, UMAP, and HDBSCAN, and the integrated BERTopic model. Measured by the C_v coherence score, the integrated BERTopic model scored 0.7012, exceeding the custom SBERT-UMAP-HDBSCAN pipeline (0.6079) by 0.0933. On this corpus, an integrated, purpose-built model such as BERTopic therefore produced more coherent and interpretable thematic structures than the custom modular pipeline. The results give researchers an empirical basis for choosing a pipeline and suggest that integrated models offer a more robust and efficient solution than custom modular pipelines for uncovering thematic structure in large-scale literature analysis.
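
To give a feel for what a topic coherence score measures, the sketch below computes a simplified coherence value: the mean normalized pointwise mutual information (NPMI) over all word pairs of a topic, based on how often the words share a document. This is an illustration only and not the C_v measure reported in the abstract; C_v builds on NPMI but additionally uses a sliding window and an indirect cosine similarity between context vectors. The function name `npmi_coherence` and the toy documents are assumptions made for this example and do not come from the thesis.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic (simplified, document-level).

    topic_words: list of at least two top words of a topic.
    documents:   list of tokenized documents (each a list of strings).
    Returns a value in [-1, 1]; higher means the words co-occur more often.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")
    if not documents:
        raise ValueError("need at least one document")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def count(*words):
        # Number of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets)

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        c1, c2, c12 = count(w1), count(w2), count(w1, w2)
        if c12 == 0:
            # The pair never co-occurs: NPMI takes its lower limit.
            scores.append(-1.0)
        elif c12 == n_docs:
            # Both words appear in every document: upper limit case.
            scores.append(1.0)
        else:
            p1, p2, p12 = c1 / n_docs, c2 / n_docs, c12 / n_docs
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: two documents per theme.
toy_docs = [
    ["topic", "model", "cluster"],
    ["topic", "model", "embedding"],
    ["protein", "gene", "cell"],
    ["gene", "cell", "expression"],
]

coherent = npmi_coherence(["topic", "model"], toy_docs)
mixed = npmi_coherence(["topic", "gene"], toy_docs)
print(coherent > mixed)
```

A topic whose words always appear together scores near 1, while a topic mixing words from unrelated documents scores near -1. For real evaluations, an established implementation of C_v is preferable to this sketch.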

Item Type: Thesis (Undergraduate)
Student ID: 202210370311299
Keywords: BERTopic, Coherence Score, HDBSCAN, Topic Modeling, UMAP
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > T Technology (General)
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Divisions: Faculty of Engineering > Department of Informatics (55201)
Depositing User: 202210370311299 kknurarief
Date Deposited: 11 May 2026 04:06
Last Modified: 11 May 2026 04:06
URI: https://eprints.umm.ac.id/id/eprint/29794
