PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS (Application of Machine Learning Fairness to Tabular Data: A Case Study on Medical Data)

MU'ALIM, MUHAMAD MINAN NUR (2025) PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS. Undergraduate thesis, Universitas Muhammadiyah Malang.

There is a more recent version of this item available.

Files

PENDAHULUAN.pdf (1MB)
BAB I.pdf (240kB)
BAB II.pdf (442kB)
BAB III.pdf (467kB, restricted to registered users)
BAB IV.pdf (639kB, restricted to registered users)
BAB V.pdf (180kB, restricted to registered users)
LAMPIRAN.pdf (581kB, restricted to registered users)
POSTER.pdf (1MB, restricted to registered users)

Abstract

Diabetes is a chronic disease whose prevalence continues to rise, with projections indicating that the number of cases could reach millions in the future. This research analyzes the relationship between lifestyle and diabetes using the "CDC Diabetes Health Indicators" dataset, applying data preprocessing and transformation to optimize the machine learning models. Four machine learning algorithms (Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are used to classify individuals as "no diabetes", "pre-diabetes", or "diabetes". Model performance is evaluated with classification metrics: accuracy, precision, recall, F1-score, AUC, and the confusion matrix. Bias detection is conducted with the DALEX library, focusing on protected attributes such as age and gender. The results show that binary probability-based models such as Gaussian Naive Bayes are the fairest: they treat all groups almost equally, so no particular group is overly advantaged or disadvantaged. However, this model is sometimes less accurate than the other models at predicting whether someone has diabetes. Tree-based models such as Random Forest are slightly less fair than Naive Bayes, but their predictive performance is higher. In other words, Random Forest gives more accurate results at the cost of somewhat lower fairness across groups. These results emphasize the importance of applying ethical and fair machine learning in disease diagnosis, including efforts to improve equity and transparency in predictive models.
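
The kind of group fairness check described in the abstract can be illustrated with a minimal sketch. It does not call DALEX and is not the thesis's code: it recomputes by hand two per-group rates (true positive rate and the share of positive predictions) and compares each group to a privileged group by ratio, using the common four-fifths threshold of 0.8. The function names, the binary labels, the group codes "M" and "F", and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group true positive rate (TPR) and positive prediction rate (STP)."""
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "pred_pos": 0, "n": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pos"] += int(t == 1)
        c["pred_pos"] += int(p == 1)
        c["tp"] += int(t == 1 and p == 1)
    return {
        g: {
            # TPR is undefined for a group with no actual positives.
            "TPR": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "STP": c["pred_pos"] / c["n"],
        }
        for g, c in counts.items()
    }


def fairness_ratios(rates, privileged):
    """Divide each group's rates by the privileged group's rates.

    Assumes the privileged group's rates are non-zero.
    """
    base = rates[privileged]
    return {
        g: {metric: value / base[metric] for metric, value in r.items()}
        for g, r in rates.items()
        if g != privileged
    }


def passes_check(ratios, epsilon=0.8):
    """True if every ratio lies strictly between epsilon and 1/epsilon."""
    return all(
        epsilon < value < 1 / epsilon
        for r in ratios.values()
        for value in r.values()
    )


# Toy data, invented for illustration (1 = diabetes, 0 = no diabetes).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]

rates = group_rates(y_true, y_pred, groups)
ratios = fairness_ratios(rates, privileged="M")
print(rates)
print(ratios)
print("passes check:", passes_check(ratios))
```

A ratio near 1 means the group is treated much like the privileged group on that metric. In this sketch, a model described as "fair" would keep all ratios inside the accepted band, while a more accurate but less fair model could push some ratios outside it.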

Item Type:	Thesis (Undergraduate)
Student ID:	202110370311197
Keywords:	Diabetes, Machine Learning, Machine Learning Fairness, Bias Detection, DALEX.
Subjects:	Q Science > Q Science (General)
Divisions:	Faculty of Engineering > Department of Informatics (55201)
Depositing User:	202110370311197 muhamadminannurmualim
Date Deposited:	04 Aug 2025 11:09
Last Modified:	04 Aug 2025 11:09
URI:	https://eprints.umm.ac.id/id/eprint/21244

Available Versions of this Item

PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS. (deposited 04 Aug 2025 11:09) [Currently Displayed]
- PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS. (deposited 13 Aug 2025 05:46)
