KOMPARASI METODE KOMBINASI SELEKSI FITUR DAN MACHINE LEARNING K-NEAREST NEIGHBOR PADA DATASET LABEL HOURS SOFTWARE EFFORT ESTIMATION

Indra Kurniawan, Ahmad Faiq Abror

Abstract


Methods for Software Effort Estimation fall into two groups: Non Machine Learning (non-ML) and Machine Learning (ML) methods [1]. The k-NN method has the disadvantage that it cannot tolerate irrelevant features, which strongly affects its accuracy. k-NN also struggles with missing data and with feature problems such as irrelevant features, non-optimal feature weights, and duplicate features [2]. Meanwhile, Software Effort Estimation datasets still pose serious challenges of their own, among them irrelevant features and the varying degree of influence each feature has on the effort estimate [3]. This study compared the individual k-NN method against combinations of a feature selection method with k-NN to determine which method performs best. The results show that Forward Selection (FS) and Median Weighted Information Gain (Median-WIG) combined with k-Nearest Neighbor can overcome the problem of irrelevant features and thereby lower the RMSE on the Software Effort Estimation datasets: 5,953 on the Albrecht dataset using the Median-WIG k-NN method, and 55,421 on the Miyazaki dataset and 123,081 on the Kemerer dataset using the FS k-NN method. Combining k-NN with feature selection is thus shown to improve the estimation results over individual k-NN, with FS k-NN the best method overall, winning on the two datasets Miyazaki and Kemerer.
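As a rough illustration of the comparison the abstract describes (plain k-NN versus Forward Selection combined with k-NN, evaluated by RMSE), the following minimal Python sketch shows the idea. The helper functions and synthetic data are illustrative assumptions, not the paper's implementation; the paper itself works on the Albrecht, Miyazaki, and Kemerer datasets.

```python
# Sketch: plain k-NN regression vs. greedy Forward Selection + k-NN,
# scored by RMSE (the paper's evaluation metric). Purely illustrative.
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-NN regression: average the targets of the k nearest rows."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance
        idx = np.argsort(dist)[:k]                   # k nearest neighbors
        preds.append(np.mean(y_train[idx]))
    return np.array(preds)

def forward_select(X_train, y_train, X_val, y_val, k=3):
    """Greedy Forward Selection: repeatedly add the single feature that
    most reduces validation RMSE; stop when no feature helps."""
    remaining = list(range(X_train.shape[1]))
    chosen, best_err = [], np.inf
    while remaining:
        scores = []
        for f in remaining:
            cols = chosen + [f]
            err = rmse(y_val,
                       knn_predict(X_train[:, cols], y_train,
                                   X_val[:, cols], k))
            scores.append((err, f))
        err, f = min(scores)
        if err >= best_err:
            break                    # adding any feature makes it no better
        best_err, chosen = err, chosen + [f]
        remaining.remove(f)
    return chosen, best_err
```

On real effort datasets one would compare `rmse` of plain k-NN on all features against k-NN restricted to the features returned by `forward_select`; the paper reports that the selected subset yields the smaller RMSE on Miyazaki and Kemerer.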

 


Keywords


Software Effort Estimation, irrelevant feature, Forward Selection (FS), Median Weighted Information Gain, k-Nearest Neighbor


References


M. Shepperd and S. Macdonell, “Evaluating prediction systems in software project estimation,” Inf. Softw. Technol., vol. 54, no. 8, pp. 820–827, 2012.

J. Wen, S. Li, Z. Lin, Y. Hu, and C. Huang, “Systematic literature review of machine learning based software development effort estimation models,” Inf. Softw. Technol., vol. 54, no. 1, pp. 41–59, 2012.

A. Idri and A. Abran, “Analogy-based software development effort estimation: A systematic mapping and review,” Inf. Softw. Technol., 2014.

I. Sommerville, Software Engineering Ninth Edition, 9th ed. Boston: PEARSON, 2011.

A. Trendowicz and R. Jeffery, Software Project Effort Estimation. New York: Springer, 2014.

X. Huang, D. Ho, J. Ren, and L. F. Capretz, “Improving the COCOMO model using a neuro-fuzzy approach,” Appl. Soft Comput., vol. 7, pp. 29–40, 2007.

S. Grimstad and M. Jørgensen, “Inconsistency of expert judgment-based estimates of software development effort,”J. Syst. Softw., vol. 80, pp. 1770–1777, 2007.

G. R. Finnie and G. E. Wittig, “A Comparison of Software Effort Estimation Techniques: Using Function Points with Neural Networks, Case-Based Reasoning and Regression Models,” J. Syst. Softw., vol. 39, no. 3, pp. 281–289, 1997.

S. M. Satapathy, B. P. Acharya, and S. K. Rath, “Early stage software effort estimation using random forest technique based on use case points,” IET Softw., vol.10, no. 1, pp. 10–17, 2016.

N. J. Nunes and L. Constantine, “iUCP: Estimating Interactive-Software Project Size with Enhanced Use-Case Points,” IEEE Softw., pp. 64–73, 2011.

E. K. Adhitya, R. Satria, and H. Subagyo, “Komparasi Metode Machine Learning dan Non Machine Learning untuk Estimasi Usaha Perangkat Lunak,” J. Softw. Eng., vol. 1, no. 2, pp. 109–113, 2015.

A. Bakır, B. Turhan, and A. Bener, “A comparative study for estimating software development effort intervals,” Softw. Qual. J., vol. 19, pp. 537–552, 2011.

V. Khatibi, B. Dayang, and N. Abang, “A PSO-based model to increase the accuracy of software development effort estimation,” Softw. Qual. J., vol. 21, pp. 501–526, 2013.

Q. Liu, J. Xiao, and H. Zhu, “Feature selection for software effort estimation with localized neighborhood mutual information,” Cluster Comput., no. 1, 2018.

V. S. Dave, “Comparison of Regression Model, Feed-forward Neural Network and Radial Basis Neural Network for Software Development Effort Estimation,” ACM SIGSOFT Softw. Eng. Notes, vol. 36, no. 5, pp. 1–5, 2011.

C. López-martín, “Predictive accuracy comparison between neural networks and statistical regression for development effort of software projects,” Appl. Soft Comput. J., pp. 1–16, 2014.

A. L. I. Oliveira, P. L. Braga, R. M. F. Lima, and M. L. Cornélio, “GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation,” Inf. Softw. Technol., vol. 52, no. 11, pp. 1155–1166, 2010.

J. Shivhare and S. K. Rath, “Software Effort Estimation using Machine Learning Techniques,” ISEC, 2014.

F. Zare, H. Khademi Zare, and M. S. Fallahnezhad, “Software effort estimation based on the optimal Bayesian belief network,” Appl. Soft Comput., vol. 49, pp. 968–980, 2016.

R. Malhotra, A. Kaur, and Y. Singh, “Application of Machine Learning Methods for Software Effort Prediction,” ACM SIGSOFT Softw. Eng. Notes, vol. 35, no. 3, pp. 1–6, 2010.

S. G. Macdonell and M. J. Shepperd, “Combining techniques to optimize effort predictions in software project management,” J. Syst. Softw., vol. 66, pp. 91–98, 2003.

G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, 2014.

M. Kabir and K. Murase, “A new hybrid ant colony optimization algorithm for feature selection,” Expert Syst. Appl., vol. 39, no. 3, pp. 3747–3763, 2012.

H. Liu and L. Yu, “Toward Integrating Feature Selection Algorithms for Classification and Clustering,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, 2005.

A. Idri and S. Cherradi, “Improving Effort Estimation of Fuzzy Analogy using Feature Subset Selection,” 2016.

R. S. Pressman, Software Engineering, 7th ed. Boston: McGraw-Hill, 2010.

J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Waltham: Elsevier Inc., 2012.

I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Burlington: Elsevier, 2011.

M. Kaya, “Classification of Pancreas Tumor Dataset Using Adaptive Weighted k Nearest Neighbor Algorithm,” IEEE, pp. 1–4, 2014.




DOI: http://dx.doi.org/10.36448/jsit.v10i2.1314




Explore: Jurnal Sistem Informasi dan Telematika (Telekomunikasi, Multimedia dan Informatika)
e-ISSN: 2686-181X
Website: http://jurnal.ubl.ac.id/index.php/explore
Email: explore@ubl.ac.id
Published by: Pusat Studi Teknologi Informasi, Fakultas Ilmu Komputer, Universitas Bandar Lampung
Office: Jalan Zainal Abidin Pagar Alam No 89, Gedong Meneng, Bandar Lampung, Indonesia

This work is licensed under a Creative Commons Attribution 4.0 International License