Asian Journal of Information Technology

Abstract

Text categorization, which consists of automatically assigning documents to a set of categories, involves managing a very large number of features. Feature selection is one of the most important and frequently used data preprocessing techniques in data mining. It removes irrelevant, redundant or noisy data and has an immediate effect on data mining applications. In this study, we propose a filter system for feature set extraction based on a similarity distance measure. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. An experimental comparison is carried out between the distance measure and four well-known classification techniques: C4.8, Multilayer Perceptron, Least Mean Square and Linear Regression. The results show that the proposed method performs comparably to the other classification techniques, especially on a collection of highly overlapping topics, and that C4.8 performs better as a classifier than the other techniques.
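The abstract does not specify the exact distance measure or filter criterion. The Python sketch below illustrates one generic way a relevant-features-only filter combined with a similarity-based classifier could work. It rests on our own assumptions, not on the authors' method: terms are selected per category by their frequency in that category's own documents, and a document is assigned to the category whose centroid has the highest cosine similarity. The names `select_features`, `train` and `classify`, the stop-word list and the parameter `k` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical, minimal stop-word list; a real system would use a fuller list and stemming.
STOP_WORDS = {"the", "a", "an", "and", "in", "of"}


def tokenize(text):
    """Lowercase whitespace tokenization with stop-word removal."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def select_features(docs_by_category, k):
    """Assumed filter: keep, per category, the k most frequent terms of that category only."""
    selected = {}
    for category, docs in docs_by_category.items():
        counts = Counter(t for d in docs for t in tokenize(d))
        # Rank by descending count, breaking ties alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        selected[category] = {term for term, _ in ranked[:k]}
    return selected


def centroid(docs, features):
    """Average term-frequency vector of a category, restricted to its selected features."""
    counts = Counter(t for d in docs for t in tokenize(d) if t in features)
    return {t: c / len(docs) for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train(docs_by_category, k=5):
    """Build (features, centroid) pairs for every category."""
    features = select_features(docs_by_category, k)
    return {c: (features[c], centroid(docs, features[c]))
            for c, docs in docs_by_category.items()}


def classify(model, text):
    """Return the category whose centroid is most similar to the document."""
    tf = Counter(tokenize(text))
    best, best_score = None, -1.0
    for category, (features, cent) in model.items():
        vec = {t: float(n) for t, n in tf.items() if t in features}
        score = cosine(vec, cent)
        if score > best_score:
            best, best_score = category, score
    return best


if __name__ == "__main__":
    training = {
        "sports": ["the team won the match", "a great goal in the football match"],
        "finance": ["the stock market fell", "investors sold stock and bonds"],
    }
    model = train(training, k=5)
    print(classify(model, "football match goal"))
    print(classify(model, "stock market"))
```

Each category's centroid is restricted to that category's own features, which mirrors the abstract's idea of using only relevant features. Replacing the raw frequency ranking with a statistic such as chi-square or information gain would be a natural variation.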

How to cite this article:

Christy, A. and P. Thambidurai. Feature Selection for Efficient Text Categorization and Knowledge Discovery Using Classification Techniques.
DOI: https://doi.org/10.36478/ajit.2006.872.876
URL: https://www.makhillpublications.co/view-article/1682-3915/ajit.2006.872.876

Feature Selection for Efficient Text Categorization and Knowledge Discovery Using Classification Techniques
