TY - JOUR
T1 - Web Page Block Identification using Machine Learning Techniques
AU - Narwal, Neetu
AU - Kumar Sharma, Sanjay
AU - Prakash Singh, Amit
JO - International Journal of System Signal Control and Engineering Application
VL - 12
IS - 4
SP - 67
EP - 73
PY - 2019
DA - 2001/08/19
SN - 1997-5422
DO - ijssceapp.2019.67.73
UR - https://makhillpublications.co/view-article.php?doi=ijssceapp.2019.67.73
KW - DOM
KW - page block
KW - radial basis network
KW - support vector machine
KW - neural network
AB - The Internet has gained wide acceptance as a reservoir of information. A web page typically contains noise (advertisements, external links) alongside its main content. This noise makes it difficult for search engines to classify the page accurately and distracts users who are gathering data on a topic. Various segmentation techniques partition a web page, but very few categorize the segmented blocks. In this study, we categorize the page blocks extracted by segmentation. We used a web page segmentation algorithm to parse the web page and extracted important features to build a dataset. Linear and nonlinear machine learning techniques were trained on this dataset. We also analyzed the importance of each feature for the learning process. We found that embedded objects from external sources have the highest significance for block identification. In our experiment, the nonlinear radial basis neural network performed best, with an accuracy of 99.89%.
ER -