Implementation of Clustering Based Feature Subset Selection Algorithm for High Dimensional Data

S Natarajan, Parimala Anand, D S Shanmukh, Mohammed Saneen, Darshan W M

Abstract: Feature selection is an essential step in successful data mining applications: it can effectively reduce data dimensionality by removing irrelevant and redundant features. It is often an essential data preprocessing step before a learning algorithm is applied. Machine learning algorithms are known to degrade in performance when faced with many features that are not necessary for predicting the desired output, so removing irrelevant and redundant information often improves their performance. Feature selection techniques aim to reduce the number of unnecessary features in classification rules. The proposed clustering-based feature subset selection method for high dimensional data works in two steps. In the first step, features are divided into clusters using partitioning clustering methods. In the second step, a representative feature that is strongly related to the target class is selected from each cluster to form the feature subset. The selected features are then used to train a machine learning algorithm, and the results are compared with those obtained using the original feature set. Finally, the efficiency and accuracy of the proposed method are compared with those of other feature selection algorithms across various combinations of machine learning algorithms.

Keywords: Data mining, Feature selection, Relevant features, Redundant features, FAST algorithm.

International Journal of Computer Science and Information Technology Research

ISSN 2348-1196 (print), ISSN 2348-120X (online)

Research Publish Journals

Vol. 3, Issue 3, July 2015 – September 2015

Citation
Implementation of Clustering Based Feature Subset Selection Algorithm for High Dimensional Data by S Natarajan, Parimala Anand, D S Shanmukh, Mohammed Saneen, Darshan W M