Utilizing the Genetic Algorithm to Pruning the C4.5 Decision Tree Algorithm
Keywords: Genetic algorithm, C4.5 decision tree, Optimizing, Pruning, Machine learning
The decision tree (DT) is one of the most popular machine learning algorithms; it splits data repeatedly to form groups or classes. It is a supervised learning algorithm that can be applied to discrete or continuous data for classification or regression. A classic classifier in this family is the C4.5 decision tree, which is the focus of this research. This classifier can build a tree from a large data set and keeps growing it until it reaches the desired goal. Its drawback is that the resulting tree contains unnecessary nodes and branches, which leads to overfitting and can degrade classification performance. To address this, the authors propose using a genetic algorithm to prune the tree and reduce the effect of overfitting. The study uses four datasets, IRIS, Car Evaluation, GLASS, and WINE, collected from the UC Irvine (UCI) machine learning repository. The experimental results confirm the effectiveness of the genetic algorithm in reducing overfitting on the four datasets and in optimizing the confidence factor (CF) of the C4.5 decision tree. The proposed method reaches about 92% accuracy in this work.
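To make the role of the confidence factor concrete, the following is a minimal sketch in Python and not the authors' implementation. `pessimistic_error` uses the normal-approximation form of the upper confidence bound that is often used to describe C4.5's error-based pruning (Quinlan's implementation may compute the bound differently in detail). `evolve_cf` is a hypothetical, simple real-valued genetic algorithm that searches for a CF value. The GA settings (population size, tournament selection, arithmetic crossover, Gaussian mutation) and the `toy_fitness` function are assumptions made for illustration only; in a real experiment the fitness would be something like the validation accuracy of a C4.5 tree pruned with the candidate CF.

```python
import random
from statistics import NormalDist


def pessimistic_error(errors, n, cf):
    """Upper confidence bound on the error rate of a node that misclassifies
    `errors` of its `n` training cases (normal approximation).
    A smaller cf gives a more pessimistic estimate, hence more pruning."""
    z = NormalDist().inv_cdf(1.0 - cf)  # normal deviate for confidence cf
    f = errors / n                      # observed error rate
    spread = (f / n - f * f / n + z * z / (4 * n * n)) ** 0.5
    return (f + z * z / (2 * n) + z * spread) / (1 + z * z / n)


def evolve_cf(fitness, pop_size=20, generations=30, low=0.01, high=0.5,
              mutation_rate=0.2, seed=0):
    """Search for the CF in [low, high] that maximizes fitness(cf).
    All hyperparameters here are illustrative assumptions."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the two best individuals over unchanged.
        elite = sorted(pop, key=fitness, reverse=True)[:2]
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # Arithmetic crossover: random blend of the parents.
            w = rng.random()
            child = w * a + (1 - w) * b
            # Gaussian mutation, then clamp back into the valid range.
            if rng.random() < mutation_rate:
                child += rng.gauss(0, 0.05)
            children.append(min(high, max(low, child)))
        pop = elite + children
    return max(pop, key=fitness)


def toy_fitness(cf):
    """Placeholder fitness with a peak at cf = 0.2 (NOT real data).
    Replace with the validation accuracy of a tree pruned at this cf."""
    return -(cf - 0.2) ** 2


best_cf = evolve_cf(toy_fitness)
print("best CF found on the toy fitness:", round(best_cf, 3))
print("pessimistic error for 2 errors in 6 cases at that CF:",
      round(pessimistic_error(2, 6, best_cf), 3))
```

Because the elite individuals survive each generation, the best fitness in the population never decreases. The fixed seed keeps the run reproducible.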
Copyright (c) 2021 Maad M. Mijwil, Rana A. Abttan
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.