Feature Selection Method based on Subspace Learning and Factorization of Basis Matrix for DNA Micro-Array Datasets

Dehtaghi Zadeh, Mahla; Saberi-Movahed, Farid; Eftekhari, Mahdi

doi:10.22041/ijbme.2019.104143.1454

Article type: Full research paper

Authors

¹ M.Sc. Student, Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran

² Assistant Professor, Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran

³ Associate Professor, Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

https://doi.org/10.22041/ijbme.2019.104143.1454

Abstract

DNA micro-array data play an important role in machine learning and in the recognition of various types of cancer structures. Micro-array data typically contain a large number of features and a small number of samples. Moreover, because such data include some irrelevant features, they can cause overfitting and reduce the prediction accuracy of classifiers. The analysis of micro-array data is therefore an important and challenging task in machine learning and molecular genetics technology. A direct way to address this challenge is to reduce the dimensionality of the data. Feature selection serves as an important approach for reducing dimensionality and improving the efficiency of learning algorithms. In this paper, a new feature selection method is introduced using the concept of a basis for micro-array datasets. In other words, a basis consisting of a very small subset of genes is used, instead of the entire micro-array dataset, in defining the feature selection problem. In this method, the feature selection problem is formulated from the viewpoint of subspace learning and factorization of the basis matrix. Finally, the effectiveness of the proposed method is evaluated on DNA micro-array datasets, and the results obtained are compared with those of several well-established feature selection methods.

Keywords

Article title [English]

Feature Selection Method based on Subspace Learning and Factorization of Basis Matrix for DNA Micro-Array Datasets

Authors [English]

Mahla Dehtaghi Zadeh ¹
Farid Saberi-Movahed ²
Mahdi Eftekhari ³

¹ M.Sc. Student, Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran

² Assistant Professor, Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran

³ Associate Professor, Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract [English]

DNA micro-array datasets play a crucial role in machine learning and in the recognition of various kinds of cancer structures. Micro-array datasets are typically characterized by a large number of features and a small number of samples. Because of irrelevant features, such datasets may lead to overfitting and low prediction accuracy of classifiers, and their analysis is therefore considered a challenging task in machine learning. A direct way to deal with this challenge is dimensionality reduction. In this regard, feature selection acts as an effective solution for reducing dimensionality and increasing the efficiency of learning algorithms. In this paper, a new feature selection method is introduced by using the concept of “the basis for the DNA micro-array datasets”. To be more specific, rather than utilizing the entire micro-array dataset to tackle the feature selection problem, a basis that is a much smaller subset of the micro-array dataset is used. This method is based on subspace learning and matrix factorization. Finally, the effectiveness of the proposed method is evaluated on DNA micro-array datasets, and the obtained results are compared with some state-of-the-art supervised feature selection methods.
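The abstract does not spell out how a basis is extracted, so the following is only a minimal sketch of the general idea of replacing a wide data matrix by a small set of linearly independent feature columns. It assumes a samples-by-features matrix `X` and uses a greedy pivoted Gram-Schmidt pass; the function name `select_basis_columns`, the tolerance `tol`, and the synthetic data are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def select_basis_columns(X, tol=1e-10):
    """Greedily pick linearly independent columns of X (hypothetical helper).

    X has shape (n_samples, n_features). The returned indices identify
    columns that together span the column space of X, up to tolerance.
    """
    R = np.asarray(X, dtype=float).copy()  # residual after removing chosen directions
    scale = np.linalg.norm(R, axis=0).max()
    chosen = []
    for _ in range(min(R.shape)):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= tol * scale:  # nothing significant left to span
            break
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)  # project the q-direction out of every column
        chosen.append(j)
    return chosen


# Synthetic stand-in for micro-array data: 6 samples, 50 features, rank 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 50))

idx = select_basis_columns(X)
B = X[:, idx]  # basis matrix built from the selected features

# Every original feature is a linear combination of the basis columns.
C, *_ = np.linalg.lstsq(B, X, rcond=None)
print(len(idx), np.allclose(B @ C, X))
```

In this sketch the basis matrix `B` stands in for the full dataset, which is the sense in which a later factorization step could operate on far fewer columns than `X` has.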

Keywords [English]

Feature Selection
Subspace Learning
Matrix Factorization
DNA Micro-Array Datasets


Iranian Journal of Biomedical Engineering (نشریه‌ی علمی مهندسی پزشکی زیستی)

Volume 13, Issue 3
Mehr 1398 (September-October 2019)
Pages 223-234
