Current Proceedings on Technology

Turkish anti-spam filtering using binary and probabilistic models

Authors: Semih Ergin, Efnan Sora Gunal, Huseyin Yigit, Rifat Aydin

Volume 1, Issue -, 2012, Pages -

Subjects: -

Keywords: Spam filtering, Binary model, Probabilistic model, Bayesian classifier

Abstract: In this paper, a Turkish anti-spam filter is implemented to detect text-based Turkish spam e-mails (junk or bulk e-mail). Because e-mails are extremely easy and cheap to send, they have gained tremendous popularity not only as a means of communicating with friends but also as a medium for bombarding unsuspecting mailboxes with undesired messages, usually advertisements. Spam e-mail is the general name for these undesired messages. To classify Turkish e-mails as spam or legitimate, a Turkish e-mail database containing examples of text-based spam and normal e-mails was first constructed. Second, the content of each e-mail was analyzed and the distinct words appearing in it were identified. A stemmer subfunction was also developed to determine the root form of each distinct word. The Mutual Information (MI) score of each stem was then calculated, and two different types of feature vectors were constructed according to these MI scores. After feature vector extraction, a Bayesian classifier was used to categorize each e-mail as spam or legitimate under two distinct models, binary and probabilistic. In the learning (training) stage, 600 text-based Turkish e-mails (300 spam and 300 legitimate) were used, while 200 Turkish e-mails (100 spam and 100 legitimate) were classified in the test phase. The two models were tested individually: the probabilistic model achieved a success rate of 89%, whereas the binary model achieved 93%.
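The pipeline described in the abstract (stems, MI-based feature selection, Bayesian classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: the toy stems and labels are invented, the function names are hypothetical, MI is computed between a stem's presence and the class label, and the binary model is read as a Bernoulli naive Bayes classifier with Laplace smoothing. The abstract does not define the probabilistic model, so it is not sketched.

```python
import math
from collections import Counter

def mutual_information(docs, labels):
    """MI (in nats) between presence of each stem and the class label.
    docs: list of sets of stems; labels: parallel list of class names."""
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)
    scores = {}
    for term in set().union(*docs):
        # Joint counts over (stem present?, class).
        joint = Counter((term in d, c) for d, c in zip(docs, labels))
        present = sum(joint[(True, c)] for c in classes)
        mi = 0.0
        for flag, n_t in ((True, present), (False, n - present)):
            for c in classes:
                n_tc = joint[(flag, c)]
                if n_tc:  # zero cells contribute nothing to the sum
                    mi += (n_tc / n) * math.log(n_tc * n / (n_t * class_count[c]))
        scores[term] = mi
    return scores

def top_features(scores, k):
    """Keep the k highest-MI stems (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def train_binary_nb(docs, labels, features):
    """Bernoulli naive Bayes over binary presence features.
    Laplace (add-one) smoothing is an assumption, not from the abstract."""
    model = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior = len(class_docs) / len(docs)
        probs = {f: (sum(f in d for d in class_docs) + 1) / (len(class_docs) + 2)
                 for f in features}
        model[c] = (prior, probs)
    return model

def classify(model, doc):
    """Return the class with the highest log-posterior for a set of stems."""
    best, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for f, p in probs.items():
            score += math.log(p if f in doc else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Invented toy data: each e-mail is already reduced to a set of stems.
train_docs = [
    {"bedava", "kazan", "tikla"},
    {"bedava", "indirim", "tikla"},
    {"toplanti", "rapor", "yarin"},
    {"rapor", "proje", "yarin"},
]
train_labels = ["spam", "spam", "legit", "legit"]

scores = mutual_information(train_docs, train_labels)
features = top_features(scores, k=4)
model = train_binary_nb(train_docs, train_labels, features)
print(features)
print(classify(model, {"bedava", "tikla", "simdi"}))
```

In this sketch, stems that occur in only one class receive the highest MI scores and are kept as features, while stems outside the selected feature set are ignored at classification time.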

