Current Proceedings on Technology
Authors: Havva Esin Unal, Selma Ayse Ozel, Ilker Unal
Topics: -
Keywords: Web page classification, Web mining, HTML tags, Accuracy
Abstract: The Web is a large, heterogeneous collection of documents that grows daily. As the volume of data increases, it becomes harder to reach useful information in this environment effectively. An automatic Web page classification mechanism is therefore needed to extract documents on desired topics. Earlier studies on Web page classification concluded that using HTML tags improves classification accuracy. In this study, our aim is to show the effect of each HTML tag separately on the classification accuracy of several classifiers. To do so, HTML tags and the terms within each tag are used as separate features. We observed that different tag sets give high classification accuracy for different datasets; however, features extracted from anchor tags provide higher classification performance in the majority of the datasets.
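The idea of treating the terms inside each HTML tag as separate features can be illustrated with a minimal sketch. It is not the authors' implementation: the tracked tag set, the `tag:term` feature naming, and the helper names `TagTermExtractor` and `tag_term_features` are all assumptions made for illustration, using only Python's standard `html.parser` module.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TagTermExtractor(HTMLParser):
    """Count terms per enclosing HTML tag as 'tag:term' features."""

    # Hypothetical tag set; the abstract does not list the tags studied.
    TRACKED = {"title", "h1", "h2", "a", "b", "p"}

    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tracked tags
        self.features = Counter()  # 'tag:term' -> occurrence count

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag and anything left open inside it,
        # which tolerates slightly malformed HTML.
        if tag in self.stack:
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx:]

    def handle_data(self, data):
        if not self.stack:
            return  # text outside any tracked tag is ignored
        tag = self.stack[-1]  # attribute text to the innermost tracked tag
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.features[f"{tag}:{term}"] += 1


def tag_term_features(html):
    """Return a Counter of 'tag:term' features for one HTML document."""
    parser = TagTermExtractor()
    parser.feed(html)
    parser.close()
    return parser.features


if __name__ == "__main__":
    page = (
        "<html><head><title>Web Mining</title></head>"
        '<body><p>See <a href="x">mining tools</a> here</p></body></html>'
    )
    for feature, count in sorted(tag_term_features(page).items()):
        print(feature, count)
```

Because the same word yields a different feature depending on the tag it appears in (for example `a:mining` versus `title:mining`), a classifier can be trained on the features of one tag at a time to compare how much each tag contributes.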