PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS (Application of Machine Learning Fairness to Tabular Data: A Case Study on Medical Data)

MU'ALIM, MUHAMAD MINAN NUR (2025) PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS. Undergraduate thesis, Universitas Muhammadiyah Malang.

Warning
There is a more recent version of this item available.
Files
PENDAHULUAN.pdf (1MB)
BAB I.pdf (240kB)
BAB II.pdf (442kB)
BAB III.pdf (467kB, restricted to registered users only)
BAB IV.pdf (639kB, restricted to registered users only)
BAB V.pdf (180kB, restricted to registered users only)
LAMPIRAN.pdf (581kB, restricted to registered users only)
POSTER.pdf (1MB, restricted to registered users only)

Abstract

Diabetes is a chronic disease whose prevalence continues to rise, with projections indicating that the number of cases could reach millions in the future. This research analyzes the relationship between lifestyle and diabetes using the "CDC Diabetes Health Indicators" dataset, applying data preprocessing and transformation to prepare the data for machine learning models. The Decision Tree, Random Forest, Logistic Regression, and Naive Bayes algorithms are used to classify individuals as "no diabetes", "pre-diabetes", or "diabetes". Model performance is evaluated with accuracy, precision, recall, F1-score, AUC, and the confusion matrix. Bias detection is conducted with the DALEX library, focusing on protected attributes such as age and gender. The results show that binary probability-based models such as Gaussian Naive Bayes are the fairest: they treat all groups almost equally, so no particular group is overly advantaged or disadvantaged. However, Gaussian Naive Bayes is sometimes less accurate than the other models at predicting whether someone has diabetes. Tree-based models such as Random Forest are slightly less fair than Naive Bayes, but they identify both positive and negative cases more accurately. Random Forest therefore gives more accurate results at the cost of somewhat lower fairness across groups. These results underscore the importance of applying ethical and fair machine learning in disease diagnosis, including efforts to improve equity and transparency in predictive models.
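
To make the idea of group-level bias detection concrete, the following is a minimal sketch in plain Python, not the thesis's code and not the DALEX API. It assumes a simplified binary task (the thesis uses three classes), a single protected attribute, and the common convention of comparing each group's metric to a privileged group's metric and flagging ratios outside the range [epsilon, 1/epsilon] with epsilon = 0.8. The function names, the choice of "male" as the privileged group, and the toy labels are all hypothetical and made up purely for illustration.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group true positive rate (tpr) and positive prediction rate (ppr)."""
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "pred_pos": 0, "n": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pos"] += t
        c["pred_pos"] += p
        c["tp"] += 1 if (t == 1 and p == 1) else 0
    return {
        g: {
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "ppr": c["pred_pos"] / c["n"],
        }
        for g, c in counts.items()
    }


def fairness_ratios(rates, privileged):
    """Ratio of each unprivileged group's metric to the privileged group's.

    Assumes the privileged group's metrics are non-zero.
    """
    base = rates[privileged]
    return {
        g: {metric: r[metric] / base[metric] for metric in r}
        for g, r in rates.items()
        if g != privileged
    }


def passes_check(ratios, epsilon=0.8):
    """True if every ratio lies within [epsilon, 1 / epsilon]."""
    return all(
        epsilon <= value <= 1 / epsilon
        for group_ratios in ratios.values()
        for value in group_ratios.values()
    )


# Toy, made-up data: 1 = diabetes, 0 = no diabetes (illustration only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["male"] * 4 + ["female"] * 4

rates = group_rates(y_true, y_pred, groups)
ratios = fairness_ratios(rates, privileged="male")
is_fair = passes_check(ratios)
```

A model whose ratios stay close to 1 for every group treats the groups almost equally in the sense described above, while a model with higher overall accuracy can still fail this check if its errors concentrate in one group.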

Item Type: Thesis (Undergraduate)
Student ID: 202110370311197
Keywords: Diabetes, Machine Learning, Machine Learning Fairness, Bias Detection, DALEX.
Subjects: Q Science > Q Science (General)
Divisions: Faculty of Engineering > Department of Informatics (55201)
Depositing User: 202110370311197 muhamadminannurmualim
Date Deposited: 04 Aug 2025 11:09
Last Modified: 04 Aug 2025 11:09
URI: https://eprints.umm.ac.id/id/eprint/21244
