Predicting Yeast Chromatin Accessibility Based on DNA Sequence Features
Keywords:
machine learning, chromatin accessibility, k-mer, ATAC-seq
Abstract
The relationship between chromatin-accessible regions and the underlying DNA sequence is significant yet underexplored, and supervised machine learning has emerged as an effective approach to elucidating it. Most existing predictions, however, have focused on organisms other than yeast. In synthetic biology, chromatin accessibility directly influences chromatin structure and the binding potential of regulatory proteins, both of which are crucial for enhancing production efficiency. In this study, we used yeast ATAC-seq data from public databases. By combining k-mer features of sequences from accessible regions with ensemble classifiers, we developed a predictive model of chromatin accessibility. The model achieved an AUC of 0.99, suggesting that this approach holds promise for uncovering deeper insights into the mechanisms linking chromatin structure and DNA sequence.
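As an illustration of the k-mer features mentioned above, the following minimal sketch converts a DNA sequence into a fixed-length vector of normalized k-mer frequencies. The function name `kmer_frequencies`, the choice of k, the normalization by total valid windows, and the skipping of windows that contain ambiguous bases are all assumptions made for this example; they are not details taken from the study.

```python
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    The output has 4**k entries in lexicographic k-mer order
    (hypothetical helper; k and normalization are illustrative choices).
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k across the sequence.
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        # Assumption: windows with N or other ambiguous bases are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Example: 2-mer features for a short sequence containing ambiguous bases.
features = kmer_frequencies("ACGTACGTNN", k=2)
```

Vectors of this kind, computed for accessible regions and for a matched set of background regions, could serve as the input matrix for an ensemble classifier.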