TY  - JOUR
T1  - Feature Selection for Efficient Text Categorization and Knowledge Discovery Using Classification Techniques
AU  - Christy, A.
AU  - Thambidurai, P.
JO  - Asian Journal of Information Technology
VL  - 5
IS  - 8
SP  - 872
EP  - 876
PY  - 2006
DA  - 2001/08/19
SN  - 1682-3915
DO  - ajit.2006.872.876
UR  - https://makhillpublications.co/view-article.php?doi=ajit.2006.872.876
KW  - Feature set extraction
KW  - filter
KW  - C4.8
KW  - precision
KW  - recall
KW  - information gain
AB  - Text categorization, the automatic assignment of documents to a set of categories, must cope with a very large number of features. Feature selection is one of the most important and frequently used techniques in data preprocessing for data mining. It removes irrelevant, redundant or noisy data and brings immediate benefits to data mining applications. In this study, we propose a filter system for feature set extraction based on a similarity distance measure. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. The experimental comparison is carried out between the distance measure and four well-known classification techniques: C4.8, Multilayer Perceptron, Least Mean Square and Linear Regression. The results show that the proposed method performs comparably to the other classification techniques, especially on a highly overlapped collection of topics, and that C4.8 is a better classifier than the other techniques.
ER  - 