INTEGRATION OF TABULAR DATA AND TEXT REPRESENTATIONS FOR CLINICAL RISK PREDICTION USING MACHINE LEARNING AND LARGE LANGUAGE MODELS

M. Rafly Rahman, Rahman (2025) INTEGRASI DATA TABULAR DAN REPRESENTASI TEKS UNTUK PREDIKSI RISIKO KLINIS MENGGUNAKAN MACHINE LEARNING DAN LARGE LANGUAGE MODELS. Undergraduate thesis, Universitas Muhammadiyah Malang.

Abstract

Global health currently faces serious challenges from the growing number of patients with chronic diseases such as heart failure, diabetes, and cancer. The challenge is compounded by the limitations of electronic health record (EHR) systems, which cannot yet fully ensure accurate clinical diagnoses because of potential data entry errors and delays in symptom identification by medical personnel. In response, this study focuses on integrating medical tabular data with classification approaches based on classical machine learning (ML) and large language models (LLMs) to improve the accuracy of patient diagnosis predictions. It develops and compares the performance of several ML models (XGBoost, SVM, and Logistic Regression) and LLMs (Gemini, LLaMA, and Qwen) in fine-tuning, few-shot, and zero-shot scenarios. The results show that LLaMA combined with the few-shot approach (250 shots) achieved the highest accuracy, up to 96.0%, in predicting heart failure risk. The main finding of this study is that representing tabular data as narrative text and processing it with an LLM significantly improves contextual understanding and classification accuracy, making this approach highly promising for AI-based clinical decision-making.
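
The abstract does not give the serialization template or the prompt format, so the following is only a minimal sketch of the general idea: each tabular record is turned into a narrative sentence, and labelled records are placed ahead of the query record to form a few-shot prompt. The function names, the sentence template, the column names, and the sample values are all hypothetical illustrations, not taken from the thesis.

```python
from typing import Mapping, Sequence, Tuple

Row = Mapping[str, object]


def serialize_row(row: Row) -> str:
    """Turn one tabular record into a single narrative sentence.

    The template ("The patient's <column> is <value>, ...") is an assumed
    format chosen for illustration only.
    """
    parts = [f"{name.replace('_', ' ')} is {value}" for name, value in row.items()]
    return "The patient's " + ", ".join(parts) + "."


def build_few_shot_prompt(examples: Sequence[Tuple[Row, str]], query: Row) -> str:
    """Build a few-shot classification prompt from labelled rows plus one query row."""
    lines = ["Classify the patient's heart failure risk as 'yes' or 'no'.", ""]
    for row, label in examples:
        lines.append(serialize_row(row))
        lines.append(f"Risk: {label}")
        lines.append("")
    # The query record is serialized the same way and left unlabelled,
    # so the model is expected to complete the final "Risk:" line.
    lines.append(serialize_row(query))
    lines.append("Risk:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up records with hypothetical column names, for demonstration only.
    shots = [
        ({"age": 50, "resting_blood_pressure": 120}, "no"),
        ({"age": 67, "resting_blood_pressure": 160}, "yes"),
    ]
    print(build_few_shot_prompt(shots, {"age": 63, "resting_blood_pressure": 145}))
```

In this sketch a zero-shot prompt is simply the case where `examples` is empty, while a setting such as the 250-shot one mentioned above would pass 250 labelled records. The resulting string would then be sent to whichever LLM is being evaluated.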

Item Type:	Thesis (Undergraduate)
Student ID:	202110370311159
Keywords:	Medical Tabular Data, Large Language Models (LLM), Clinical Risk Prediction, Data Serialization, Few-shot Learning
Subjects:	T Technology > T Technology (General)
Divisions:	Faculty of Engineering > Department of Informatics (55201)
Depositing User:	202110370311159 raflyrahmanr060902
Date Deposited:	07 Feb 2026 04:43
Last Modified:	07 Feb 2026 04:43
URI:	https://eprints.umm.ac.id/id/eprint/26988