Data Science and Applications

Feature Selection for Comment Spam Filtering on YouTube

Authors: Alper Kürşat Uysal

Volume 1, Issue 1, 2018, Pages 4-8

Abstract: Spam filtering is one of the most popular application domains for text classification. While there are many studies on classifying spam e-mails and short text messages, comment spam filtering on YouTube is a relatively new topic, as only a limited number of annotated datasets are available. As in all text classification problems, the high dimensionality of the feature space is one of the biggest obstacles to accurate spam filtering. The contribution of this study is an analysis of the performance of five state-of-the-art text feature selection methods for spam filtering on YouTube using two widely known classifiers, namely naïve Bayes (NB) and decision tree (DT). Five datasets containing spam comments on different subjects were used in the experiments: Psy, KatyPerry, LMFAO, Eminem, and Shakira. The Macro-F1 measure was used for evaluation, with 3-fold cross-validation for a fair performance assessment. The experiments indicated that the distinguishing feature selector (DFS) and Gini Index (GI) methods are superior to the other three feature selection methods for spam filtering on YouTube. In addition, the DT classifier performed better than the NB classifier in most cases.
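To make the idea of term-level feature selection concrete, the following is a minimal sketch of how DFS scores might be computed. The abstract does not give the formula; the sketch assumes the form commonly stated in the feature selection literature, DFS(t) = Σ_i P(C_i|t) / (P(t̄|C_i) + P(t|C̄_i) + 1), estimated from document frequencies. The function name `dfs_scores`, the whitespace tokenization, and the six toy comments are hypothetical and are not taken from the study's datasets.

```python
from collections import defaultdict


def dfs_scores(docs, labels):
    """Return a {term: DFS score} dict (assumed formula, see lead-in)."""
    classes = sorted(set(labels))
    n_docs_in_class = {c: labels.count(c) for c in classes}
    total_docs = len(docs)

    # doc_freq[term][cls] = number of documents of class cls containing term
    doc_freq = defaultdict(lambda: defaultdict(int))
    for text, cls in zip(docs, labels):
        for term in set(text.lower().split()):  # naive whitespace tokenizer
            doc_freq[term][cls] += 1

    scores = {}
    for term, per_class in doc_freq.items():
        docs_with_term = sum(per_class.values())
        score = 0.0
        for c in classes:
            in_class = per_class.get(c, 0)
            n_other = total_docs - n_docs_in_class[c]
            # P(C_i | t): share of documents containing t that belong to C_i
            p_class_given_term = in_class / docs_with_term
            # P(t absent | C_i): share of C_i documents that lack t
            p_absent_given_class = 1 - in_class / n_docs_in_class[c]
            # P(t | not C_i): share of other-class documents that contain t
            p_term_given_other = (
                (docs_with_term - in_class) / n_other if n_other else 0.0
            )
            score += p_class_given_term / (
                p_absent_given_class + p_term_given_other + 1
            )
        scores[term] = score
    return scores


# Hypothetical toy comments, invented for illustration only.
docs = [
    "check my channel",
    "subscribe my channel",
    "check this out",
    "love this song",
    "this song rocks",
    "love it",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

scores = dfs_scores(docs, labels)
# Keep the k highest-scoring terms as the reduced feature set.
k = 5
selected = sorted(scores, key=scores.get, reverse=True)[:k]
print(selected)
```

In this toy corpus, a term that appears only in spam comments ("channel") scores higher than one spread across both classes ("this"), which is the behavior a feature selector is expected to show. In an experiment like the one described, the selected terms would then form the feature vectors passed to the NB or DT classifier.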

