Document Type: Full Research Paper

Authors

1 M.Sc. Student, Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran

2 Assistant Professor, Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran

3 Associate Professor, Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

DNA micro-array datasets play a crucial role in machine learning and in the recognition of various kinds of cancer structures. Micro-array datasets are typically characterized by a large number of features and a small number of samples. These characteristics may lead to overfitting and low prediction accuracy of classifiers because of irrelevant features, which makes such datasets a challenging problem in machine learning. A direct way to deal with this challenge is to reduce the dimensionality of the data. In this regard, feature selection is an effective approach to dimensionality reduction and to increasing the efficiency of learning algorithms. In this paper, a new feature selection method is introduced using the concept of “the basis for the DNA micro-array datasets”. More specifically, rather than using the entire micro-array dataset to tackle the feature selection problem, a basis, which is a much smaller subset of the micro-array dataset, is used. The method is based on subspace learning and matrix factorization. Finally, the effectiveness of the proposed method is evaluated on DNA micro-array datasets, and the results are compared with those of several state-of-the-art supervised feature selection methods.
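
The abstract does not state how the basis is computed, so the following is only an illustrative sketch under one assumption: that a "basis" means a maximal set of linearly independent feature columns. The function name `basis_feature_indices` and the toy matrix are hypothetical and are not taken from the paper. The sketch uses exact Gaussian elimination to find the pivot columns; real micro-array data would need floating-point arithmetic with a tolerance instead.

```python
from fractions import Fraction


def basis_feature_indices(rows):
    """Return indices of feature columns that form a basis of the column space.

    `rows` is a list of samples, each a list of feature values. Gaussian
    elimination in exact rational arithmetic finds the pivot columns; every
    non-pivot column is a linear combination of the pivot columns.
    This is an assumed reading of "basis", not the paper's stated procedure.
    """
    work = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(work)
    n_cols = len(work[0]) if work else 0
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if work[i][c] != 0), None)
        if p is None:
            continue  # column c depends on the pivot columns found so far
        work[r], work[p] = work[p], work[r]
        pivot_val = work[r][c]
        work[r] = [v / pivot_val for v in work[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and work[i][c] != 0:
                factor = work[i][c]
                work[i] = [a - factor * b for a, b in zip(work[i], work[r])]
        pivots.append(c)
        r += 1
    return pivots


# Toy data: 3 samples, 5 features; the third sample is the sum of the first two.
X = [
    [1, 2, 3, 0, 1],
    [0, 1, 1, 1, 2],
    [1, 3, 4, 1, 3],
]
basis = basis_feature_indices(X)                   # basis == [0, 1]
X_basis = [[row[j] for j in basis] for row in X]   # samples restricted to the basis
```

In this toy case the five features span a two-dimensional space, so two columns suffice to represent the rest. A feature selection step could then operate on `X_basis` rather than on the full matrix, which is the general idea the abstract describes.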

Keywords
