JAIT 2024 Vol.15(5): 591-601
doi: 10.12720/jait.15.5.591-601

Relevant Features Independence of Heuristic Selection and Important Features of Decision Tree in the Medical Data Classification

Yusi Tyroni Mursityo 1, Irfany Rupiwardani 2, Widhy H. N. Putra 1, Dewi Sri Susanti 3, Titis Handayani 4, and Samingun Handoyo 5,6,*
1. Information System Department, Brawijaya University, Malang, Indonesia
2. Environmental Health Department, Widyagama Husada School of Health Science, Malang, Indonesia
3. Statistics Study Program, Lambung Mangkurat University, Banjarbaru, Indonesia
4. Information System Study Program, Semarang University, Semarang, Indonesia
5. Statistics Department, Brawijaya University, Malang, Indonesia
6. EECS–IGP Department, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Email: yusi_tyro@ub.ac.id (Y.T.M.); irfany@widyagamahusada.ac.id (I.R.); widhy@ub.ac.id (W.H.N.P.); ds_susanti@ulm.ac.id (D.S.S.); titis@usm.ac.id (T.H.); samistat@ub.ac.id (S.H.)
*Corresponding author

Manuscript received October 27, 2023; revised December 7, 2023; accepted January 22, 2024; published May 16, 2024.

Abstract—The input of a predictive model plays an important role in enabling the classification model to achieve satisfactory performance when predicting the unknown class label of an instance. The predictor features should be not only relevant to the target feature but also independent of each other. The research objective is to obtain predictor features that are relevant and independent through feature selection using the Chi-square test, one-way Analysis of Variance (ANOVA), and Pearson’s correlation test, and to show that the important features of a decision tree differ from the relevant and independent features. The evaluation of irrelevant features yields 44 relevant features out of 67; among the 23 dropped features, 18 are discrete with two labels. The dataset with the 44 relevant features is used to train the first decision tree. A feature being relevant means that the target feature depends on it. The best predictor features, however, should be not only relevant but also independent of each other. The evaluation of independence among features yields 11 independent features out of 44, where the numeric features and the discrete features with two labels are represented by one and three features, respectively. The dataset with the 11 relevant and independent features is used to train the second decision tree. The important features of the two models are very different, and the second model outperforms the first on the accuracy, recall, and F1-score metrics.
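The two-stage selection the abstract describes (relevance tests against the target, then an independence screen among the survivors, then decision-tree training) can be sketched as follows. This is a minimal illustration on a synthetic stand-in dataset, not the authors' code; the column names, the discreteness threshold, and the 0.05 significance level are assumptions for the example.

```python
# Sketch of the pipeline: Chi-square (discrete features) and one-way ANOVA
# (numeric features) for relevance to the target, Pearson's correlation for
# independence screening, then a decision tree with feature importances.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "num1": rng.normal(size=n),
    "disc1": rng.integers(0, 2, size=n),
})
df["num2"] = df["num1"] * 0.9 + rng.normal(scale=0.1, size=n)  # redundant copy
y = (df["num1"] + df["disc1"] > 0.5).astype(int)  # synthetic target

alpha = 0.05  # assumed significance level
relevant = []
for col in df.columns:
    if df[col].nunique() <= 5:  # treat as discrete: Chi-square test vs. target
        p = stats.chi2_contingency(pd.crosstab(df[col], y))[1]
    else:  # numeric: one-way ANOVA of the feature across target classes
        groups = [df.loc[y == c, col] for c in np.unique(y)]
        p = stats.f_oneway(*groups)[1]
    if p < alpha:
        relevant.append(col)

# Independence screen: for each significantly correlated numeric pair,
# keep only the first feature of the pair.
numeric = [c for c in relevant if df[c].nunique() > 5]
kept = list(relevant)
for i, a in enumerate(numeric):
    for b in numeric[i + 1:]:
        r, p = stats.pearsonr(df[a], df[b])
        if p < alpha and b in kept:  # dependent pair -> drop the second
            kept.remove(b)

tree = DecisionTreeClassifier(random_state=0).fit(df[kept], y)
print(dict(zip(kept, tree.feature_importances_)))
```

Training the tree once on all relevant features and once on the independent subset, then comparing `feature_importances_`, mirrors the two-model comparison reported in the paper.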
 
Keywords—data mining, decision tree, feature selection, important feature, relevant feature, Pearson’s correlation

Cite: Yusi Tyroni Mursityo, Irfany Rupiwardani, Widhy H. N. Putra, Dewi Sri Susanti, Titis Handayani, and Samingun Handoyo, "Relevant Features Independence of Heuristic Selection and Important Features of Decision Tree in the Medical Data Classification," Journal of Advances in Information Technology, Vol. 15, No. 5, pp. 591-601, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.