Abolfazl Tabatabaei; Vali Derhami; Razieh Sheikhpour; Mohammad-Reza Pajoohan
Volume 13, Issue 4, December 2019, Pages 337-348
Abstract
Feature selection is a well-known preprocessing technique in machine learning and data mining, and especially in bioinformatics microarray analysis, where the data are high-dimension, low-sample-size (HDLSS). Identifying the genes responsible for a disease from microarray data is an important step toward understanding the mechanism of the disease and improving the way it is dealt with. In feature selection methods based on information theory, which cover a wide range of feature selection methods, the concept of entropy is used to define criteria for relevance, redundancy and complementarity. In this paper, we propose a new relevancy criterion based on the concept of pure continuity rather than the concept of entropy. To control and reduce redundancy, the proposed method examines the relevancy between a feature and each class separately, whereas most filter methods measure the value of a feature by its relation to the class variable as a whole. This allows us to identify the most efficient features (genes) of each class separately, while identifying common features (genes) remains possible. Discretization is another challenge in some available techniques. The proposed method uses a homomorphism transformation to avoid the complexities of discretization while retaining its advantages. Seven types of cancer microarrays and three classification models (NB, KNN and SVM) are used to compare the proposed method with other relevant methods. The results confirm the efficiency of the proposed method in terms of accuracy and number of selected genes, the two classification measures considered.
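The idea of scoring a feature against each class separately, rather than against the class variable as a whole, can be illustrated with a minimal sketch. The abstract does not define the pure-continuity criterion, so the score below is a stand-in chosen only for illustration: a one-vs-rest mean gap divided by the within-group spread. The function names `per_class_relevance` and `top_features` and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def per_class_relevance(X, y):
    """Score every feature against each class separately (one-vs-rest).

    The score is a stand-in, NOT the paper's pure-continuity criterion:
    |mean inside the class - mean outside| / (spread inside + spread outside).
    """
    scores = {}
    for c in sorted(set(y)):
        row = []
        for j in range(len(X[0])):
            inside = [x[j] for x, label in zip(X, y) if label == c]
            outside = [x[j] for x, label in zip(X, y) if label != c]
            gap = abs(mean(inside) - mean(outside))
            spread = pstdev(inside) + pstdev(outside)
            if spread > 0:
                row.append(gap / spread)
            else:
                # Zero spread: either perfectly separating or uninformative.
                row.append(float("inf") if gap > 0 else 0.0)
        scores[c] = row
    return scores

def top_features(scores, k):
    """Return the indices of the k best-scoring features for each class."""
    return {
        c: sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:k]
        for c, s in scores.items()
    }

# Toy data: feature 0 marks class "a", feature 1 marks class "b",
# feature 2 carries no class information.
X = [
    [5.0, 0.1, 1.0],
    [5.2, 0.0, 2.0],
    [0.1, 4.9, 1.5],
    [0.0, 5.1, 2.5],
    [0.2, 0.1, 1.2],
    [0.1, 0.0, 2.2],
]
y = ["a", "a", "b", "b", "c", "c"]

scores = per_class_relevance(X, y)
top = top_features(scores, k=1)
```

Features that rank highly for several classes at once would correspond to the common features (genes) mentioned in the abstract; any other per-class score could be substituted without changing the structure.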
Biomedical Image Processing / Medical Image Processing
Malihe Miri; Mohammad Taghi Sadeghi; Vahid Abootalebi
Volume 8, Issue 1, March 2014, Pages 45-56
Abstract
The success of the Sparse Representation-based Classifier (SRC) and Sparse Subspace Clustering (SSC) in many applications motivated us to combine these methods and propose a hierarchical classifier. The main idea behind the SRC and SSC algorithms is to represent a data sample as a sparse linear combination of elementary signals, so that the elementary signals most similar to the sample contribute most to the representation. In this paper, the performance of a sparse representation-based classifier is improved by pre-clustering the training samples using the SSC algorithm. A two-stage SRC is then designed using the resulting clusters. A test sample is classified by first determining the most similar cluster. Its label is then found using the second-stage classifier. The performance of the proposed method is evaluated on a cancer classification problem using the 14-Tumors microarray dataset. The low number of samples per class and the high dimensionality of the data make this a challenging problem: the curse of dimensionality, overfitting of the classifier to the training data and computational complexity are all possible related difficulties. Our experimental results show that the proposed method outperforms some other state-of-the-art classifiers.
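The two-stage scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: cluster assignments are taken as given (in the paper they come from SSC, which is not implemented here), and a simple greedy matching pursuit stands in for the sparse solver, where SRC is usually formulated with l1 minimization. All function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v] if n > 0 else list(v)

def matching_pursuit(atoms, signal, n_nonzero=3):
    """Greedy sparse code: signal ~ sum_i coef[i] * atoms[i] (unit-norm atoms)."""
    residual = list(signal)
    coef = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corr = [dot(a, residual) for a in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[best]) < 1e-12:
            break  # residual is already (numerically) zero
        coef[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coef

def src_predict(samples, groups, signal, n_nonzero=3):
    """Return the group whose own atoms reconstruct the signal with least error."""
    atoms = [normalize(s) for s in samples]
    coef = matching_pursuit(atoms, signal, n_nonzero)
    best_group, best_err = None, float("inf")
    for g in sorted(set(groups)):
        # Rebuild the signal using only the coefficients that belong to group g.
        recon = [0.0] * len(signal)
        for c, a, gi in zip(coef, atoms, groups):
            if gi == g:
                recon = [r + c * x for r, x in zip(recon, a)]
        err = math.sqrt(sum((s - r) ** 2 for s, r in zip(signal, recon)))
        if err < best_err:
            best_group, best_err = g, err
    return best_group

def two_stage_predict(samples, labels, clusters, signal, n_nonzero=3):
    # Stage 1: find the most similar cluster of training samples.
    cluster = src_predict(samples, clusters, signal, n_nonzero)
    # Stage 2: classify using only the training samples in that cluster.
    idx = [i for i, c in enumerate(clusters) if c == cluster]
    return src_predict([samples[i] for i in idx],
                       [labels[i] for i in idx], signal, n_nonzero)

# Toy training set: four classes, pre-assigned to two clusters (assumed given).
samples = [
    [1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],   # class A
    [0.0, 1.0, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0],   # class B
    [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1],   # class C
    [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.1, 0.9],   # class D
]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]

pred_b = two_stage_predict(samples, labels, clusters, [0.05, 0.95, 0.0, 0.0])
pred_d = two_stage_predict(samples, labels, clusters, [0.0, 0.0, 0.1, 1.0])
```

Restricting the second stage to one cluster shrinks the dictionary the sparse solver has to search, which is one plausible reading of how pre-clustering helps when there are many classes and few samples per class.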