Veri Bilimi

Mikrobiyota Verileri İçin Boyut İndirgemede Yeni Bir Yaklaşım (A New Approach to Dimension Reduction for Microbiota Data)

Authors: Handan ANKARALI, Süleyman YILDIRIM, Nurgül BULUT

Volume 4, Issue 1, 2021, Pages 23-30

Subjects: Science

Keywords: Zero-inflated models, Frequency data, Classification and regression trees, Variable screening algorithm, Microbiota, Parkinson's disease

Abstract: Microorganisms associated with the human skin, the nasopharyngeal and oral cavities, the vaginal tract and the gastrointestinal system make up the human microbiota. The microbiota strongly influences physiological, metabolic and immune functions and has been shown to be associated with many diseases. Recent advances in DNA sequencing technology have made it easier to profile these microbial communities through high-throughput sequencing of amplicons of marker genes such as 16S rRNA for bacteria, and 18S rRNA or ITS for eukaryotes such as fungi. The data generated by such sequencing efforts are preprocessed into compositional or relative-abundance data, which are often presented as species abundance (OTU/ASV) tables. These tables contain counts for a very large number of microbial taxa and a large proportion of zero values. Consequently, the high-dimensional data in such tables must be treated with dimension reduction techniques before sensible conclusions can be drawn. In the statistical literature, this process is called dimension reduction or variable selection. The aim of this study is to propose a novel approach to reducing the dimensionality of high-dimensional microbiota count data, which are inherently zero-inflated. For this purpose, we used univariate tests, a zero-inflated negative binomial model, classification and regression trees, and a feature selection and variable screening algorithm. Combining these four methods enabled us to select the most important features of the microbiota dataset for subsequent downstream analyses. We tested this approach on a microbiota dataset we recently generated from stool samples of a cohort of Parkinson's disease patients. Of 199 bacterial genera, our approach selected 19 candidate biomarker genera, which are often implicated in critical metabolic activities in the human body, such as the production of short-chain fatty acids.
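For reference, a zero-inflated negative binomial (ZINB) model treats each count as a mixture: with probability pi it is a structural zero, otherwise it follows a negative binomial distribution, so P(Y=0) = pi + (1 - pi) * NB(0) and P(Y=y) = (1 - pi) * NB(y) for y > 0. The minimal sketch below evaluates that probability mass function with the standard library only, assuming the common mean/dispersion parameterisation (mean mu, dispersion theta, Var = mu + mu^2/theta). The names pi, mu and theta and the example values are illustrative assumptions, not estimates from the study and not the authors' code.

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial pmf with mean mu and dispersion theta (Var = mu + mu**2 / theta)."""
    if mu <= 0 or theta <= 0:
        raise ValueError("mu and theta must be positive")
    # Work on the log scale via lgamma to avoid overflow for large counts.
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, pi: float, mu: float, theta: float) -> float:
    """Zero-inflated NB: structural zero with probability pi, otherwise an NB draw."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    nb = nb_pmf(y, mu, theta)
    if y == 0:
        # Zeros come from both the point mass and the NB component.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


if __name__ == "__main__":
    # Hypothetical parameters for a single genus (illustrative only).
    pi, mu, theta = 0.4, 5.0, 1.5
    print(f"P(Y=0) under NB:   {nb_pmf(0, mu, theta):.3f}")
    print(f"P(Y=0) under ZINB: {zinb_pmf(0, pi, mu, theta):.3f}")
```

Comparing the two printed probabilities shows how the extra point mass at zero lets the model accommodate the many zeros typical of OTU/ASV tables.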
To assess the potential of these candidate biomarkers to differentiate between disease and healthy states, we developed a multiple logistic regression model and further assessed their biomarker potential through stepwise variable screening. Big data analysis increasingly requires sophisticated, combinatorial approaches. Here we demonstrated that a previously untested combination of feature selection methods enables more useful predictive models. Similar approaches could be tried with other methods and applied to other data types.
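How the outputs of the four selection methods are merged is not detailed in this abstract. One simple, hypothetical rule is vote counting: keep a genus if at least `min_votes` of the methods select it. The sketch below illustrates only that assumed rule; the function name `combine_selections`, the method keys and the genus labels are placeholders, not the authors' procedure or data.

```python
from collections import Counter


def combine_selections(selections: dict[str, set[str]], min_votes: int) -> list[str]:
    """Return features chosen by at least `min_votes` methods (hypothetical voting rule)."""
    votes: Counter[str] = Counter()
    for chosen in selections.values():
        # Each method contributes at most one vote per feature.
        votes.update(chosen)
    return sorted(feature for feature, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Placeholder outputs for the four method families named in the abstract.
    selections = {
        "univariate_tests": {"genus_A", "genus_B", "genus_C"},
        "zinb_model": {"genus_A", "genus_C", "genus_D"},
        "cart": {"genus_A", "genus_E"},
        "variable_screening": {"genus_A", "genus_C"},
    }
    print(combine_selections(selections, min_votes=2))
```

Raising `min_votes` trades recall for agreement across methods; a downstream model such as the multiple logistic regression mentioned above would then be fitted only on the retained genera.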


How to Cite
BibTeX
@article{ankarali2021, title={Mikrobiyota Verileri İçin Boyut İndirgemede Yeni Bir Yaklaşım}, author={ANKARALI, Handan and YILDIRIM, Süleyman and BULUT, Nurgül}, journal={Veri Bilimi}, volume={4}, number={1}, pages={23--30}, year={2021} }
APA
ANKARALI, H., YILDIRIM, S., & BULUT, N. (2021). Mikrobiyota verileri için boyut indirgemede yeni bir yaklaşım. Veri Bilimi, 4(1), 23-30.
MLA
ANKARALI, Handan, et al. "Mikrobiyota Verileri İçin Boyut İndirgemede Yeni Bir Yaklaşım." Veri Bilimi, vol. 4, no. 1, 2021, pp. 23-30.