MU'ALIM, MUHAMAD MINAN NUR (2025) PENERAPAN MACHINE LEARNING FAIRNESS PADA DATA TABULAR : STUDI KASUS TERHADAP DATA MEDIS [Applying Machine Learning Fairness to Tabular Data: A Case Study on Medical Data]. Undergraduate thesis, Universitas Muhammadiyah Malang.

Abstract
Diabetes is a chronic disease whose prevalence continues to rise, with projections indicating that the number of cases could reach millions in the future. This research analyzes the relationship between lifestyle and diabetes using the "CDC Diabetes Health Indicators" dataset, applying data preprocessing and transformation to prepare the data for machine learning models. Four machine learning algorithms (Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are used to classify individuals as "no diabetes", "pre-diabetes", or "diabetes". Model performance is evaluated with accuracy, precision, recall, F1-score, AUC, and the confusion matrix. Bias detection is conducted with the DALEX library, focusing on the protected attributes age and gender. The results show that probabilistic models such as Gaussian Naive Bayes are the fairest: they treat all groups almost equally, so no group is notably advantaged or disadvantaged. However, this model is sometimes less accurate than the others at predicting whether someone has diabetes. Tree-based models such as Random Forest are slightly less fair than Naive Bayes but achieve higher predictive performance. These results underline the importance of ethical and fair machine learning in disease diagnosis, including efforts to improve equity and transparency in predictive models.
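The kind of group-level comparison behind the bias-detection step can be illustrated with a minimal sketch. It is not the thesis's code and does not call DALEX. It assumes binary labels (1 = diabetes), a hypothetical `gender` attribute, made-up toy predictions, and the four-fifths threshold (0.8) that is commonly applied to such ratios. The function names `group_rates`, `parity_ratios`, and `passes_check` are illustrative only.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group true positive rate and selection rate for 0/1 labels.

    Assumes every group contains at least one true positive case.
    """
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "pred_pos": 0, "n": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        c["n"] += 1
        c["pos"] += truth
        c["pred_pos"] += pred
        c["tp"] += truth * pred  # counts only cases where both are 1
    return {
        g: {"tpr": c["tp"] / c["pos"], "selection_rate": c["pred_pos"] / c["n"]}
        for g, c in counts.items()
    }


def parity_ratios(rates, privileged):
    """Ratio of each other group's rates to the privileged group's rates.

    Assumes the privileged group's rates are non-zero.
    """
    base = rates[privileged]
    return {
        g: {metric: value / base[metric] for metric, value in r.items()}
        for g, r in rates.items()
        if g != privileged
    }


def passes_check(ratios, epsilon=0.8):
    """True if every ratio lies within [epsilon, 1 / epsilon]."""
    return all(
        epsilon <= value <= 1 / epsilon
        for r in ratios.values()
        for value in r.values()
    )


# Made-up toy data: 8 "male" records followed by 8 "female" records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
gender = ["male"] * 8 + ["female"] * 8

rates = group_rates(y_true, y_pred, gender)
ratios = parity_ratios(rates, privileged="male")  # choice of baseline is an assumption
is_fair = passes_check(ratios)
print(rates)
print(ratios)
print("passes four-fifths check:", is_fair)
```

In this toy case the model detects fewer true cases and flags fewer people overall in the `female` group, so both ratios fall below 0.8 and the check fails. A model with ratios close to 1 across groups would be described as treating the groups almost equally, in the sense the abstract uses.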
| Item Type: | Thesis (Undergraduate) |
|---|---|
| Student ID: | 202110370311197 |
| Keywords: | Diabetes, Machine Learning, Machine Learning Fairness, Bias Detection, DALEX. |
| Subjects: | Q Science > Q Science (General) |
| Divisions: | Faculty of Engineering > Department of Informatics (55201) |
| Depositing User: | 202110370311197 muhamadminannurmualim |
| Date Deposited: | 04 Aug 2025 11:09 |
| Last Modified: | 04 Aug 2025 11:09 |
| URI: | https://eprints.umm.ac.id/id/eprint/21244 |
