Current Proceedings on Technology

A Design of Computer Recognition System of Kazakh Language Text: OCR, Morphotactics and Morphophonemics

Authors: Bakyt M. Kairakbay

Volume 3, Issue -, 2013, Pages -

Subjects: -

Keywords: Kazakh language, Nominal paradigm, Morphotactics, Morphophonemics, Two-level morphology, Optical character recognition (OCR), Tesseract, Xerox Finite State Tool (XFST)

Abstract: In this paper we present the design of a computer system for recognizing Kazakh-language text. The system consists of two main components. The first is an OCR module, an extension of Tesseract, trained to recognize the additional Kazakh-specific letters that are absent from the basic Cyrillic alphabet. The second, morphological, module performs morphotactic and morphophonemic analysis of the input text (the output of the OCR module) on the basis of a formulated nominal paradigm of the Kazakh language; it is developed with the Xerox Finite State Tool (XFST). The current version of the OCR module achieves a Kazakh character-recognition error rate no worse than that of the well-known OCR products ABBYY FineReader and CuneiForm. The morphological component, based on two-level morphology, then further corrects the input text: it identifies and corrects all affixes and stems of the input text using the formulated Kazakh morphotactics and morphophonemics, and compares them against a built-in lexicon of actual Kazakh roots and affixes.
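The abstract describes the morphological correction step only at a high level. As a rough illustration, the Python sketch below mimics it in miniature; it is not the author's XFST grammar. Everything in it is an assumption made for the example: a hypothetical seven-stem lexicon (`STEMS`), a simplified nominal order of stem, optional plural, optional locative, and hand-coded versions of two standard Kazakh alternations (plural лар/лер/дар/дер/тар/тер and locative да/де/та/те, chosen by vowel harmony and the stem-final sound). The helper names (`add_plural`, `add_locative`, `build_forms`, `correct`) are illustrative, and nearest-form matching by Levenshtein distance stands in for the paper's comparison against its lexicon. The rules are applied procedurally here, whereas a two-level system would state them as declarative lexical:surface constraints.

```python
"""Toy sketch: Kazakh nominal morphotactics, morphophonemics and
lexicon-based correction of OCR output. Illustration only; this is
NOT the paper's XFST grammar, and the lexicon and rules are simplified."""
from itertools import product

# Hypothetical mini-lexicon of noun stems (an assumption for illustration).
STEMS = ["бала", "үй", "қыз", "күн", "кітап", "мектеп", "дәптер"]

BACK_VOWELS = set("аоұы")
FRONT_VOWELS = set("әеөүі")
VOWELS = BACK_VOWELS | FRONT_VOWELS | set("иэюяё")
VOICELESS = set("кқпстшфхцчщ")


def is_back(word: str) -> bool:
    """Vowel harmony: the last back or front vowel of the word decides.
    Ambiguous letters such as и and у are skipped (a simplification)."""
    for ch in reversed(word):
        if ch in BACK_VOWELS:
            return True
        if ch in FRONT_VOWELS:
            return False
    return True  # simplification for words without such a vowel


def add_plural(word: str) -> str:
    """Realize the plural archiphoneme +LAr as лар/лер, дар/дер or тар/тер."""
    last = word[-1]
    if last in VOWELS or last in "рйу":
        onset = "л"
    elif last in VOICELESS:
        onset = "т"
    else:  # remaining voiced consonants and nasals: л, м, н, ң, з, ж, ...
        onset = "д"
    return word + onset + ("ар" if is_back(word) else "ер")


def add_locative(word: str) -> str:
    """Realize the locative archiphoneme +DA as да/де or та/те."""
    onset = "т" if word[-1] in VOICELESS else "д"
    return word + onset + ("а" if is_back(word) else "е")


def build_forms() -> dict:
    """Morphotactics: Stem (+Pl)? (+Loc)?  ->  {surface form: analysis}."""
    forms = {}
    for stem, pl, loc in product(STEMS, (False, True), (False, True)):
        surface, analysis = stem, stem + "+N"
        if pl:
            surface, analysis = add_plural(surface), analysis + "+Pl"
        if loc:
            surface, analysis = add_locative(surface), analysis + "+Loc"
        forms[surface] = analysis
    return forms


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


FORMS = build_forms()


def correct(token: str, max_dist: int = 1):
    """Return (surface, analysis) of the closest licensed form, or None
    if no generated form lies within max_dist edits of the token."""
    if token in FORMS:
        return token, FORMS[token]
    best = min(FORMS, key=lambda form: levenshtein(token, form))
    if levenshtein(token, best) <= max_dist:
        return best, FORMS[best]
    return None


if __name__ == "__main__":
    # OCR misread the Kazakh-specific letter ү as the basic Cyrillic у:
    print(correct("уйлерде"))  # ('үйлерде', 'үй+N+Pl+Loc')
```

Here `correct("уйлерде")` recovers `үйлерде` with the analysis `үй+N+Pl+Loc` after the OCR stage confused ү with у. Enumerating every form only works for a toy lexicon; a finite-state toolkit such as XFST instead compiles the lexicon and rules into a transducer, which stays compact as possessive and further case slots multiply the paradigm.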


CITATIONS
Citing Works
No citations yet.

CITE
BibTeX
@article{2013, title={A Design of Computer Recognition System of Kazakh Language Text: OCR, Morphotactics and Morphophonemics}, volume={3}, number={0}, journal={Current Proceedings on Technology}, author={Bakyt M. Kairakbay}, year={2013} }
APA
Kairakbay, B. M. (2013). A design of computer recognition system of Kazakh language text: OCR, morphotactics and morphophonemics. Current Proceedings on Technology, 3.
MLA
Kairakbay, Bakyt M. "A Design of Computer Recognition System of Kazakh Language Text: OCR, Morphotactics and Morphophonemics." Current Proceedings on Technology, vol. 3, 2013.