Polyadenylation Atlas Database (06-10-2014 Release)





The Polyadenylation Atlas Database is by far the most comprehensive poly(A) site map for the model plant Arabidopsis thaliana.  Supported by an NIH-NIGMS grant, the database is under active development in the Liang lab.


As of January 31, 2013, we have collected all RNA-Seq and other transcriptomics data available in NCBI SRA and dbEST for Arabidopsis thaliana: 1,094 data sets totaling 1.2 terabytes (TB) and approximately 1.5 x 10^12 (1.5 trillion) nucleotide bases.   All raw sequence reads have been scanned for poly(A)/(T) tails and mapped to the reference genome (TAIR10) to identify poly(A) sites comprehensively in genomic coordinates.
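The tail-scanning step described above can be illustrated with a minimal sketch. It is not the pipeline used to build this database: the function name find_tail and the minimum tail length of 8 bases are assumptions made for illustration, since the text does not state the actual criteria. A trailing run of A's is treated as a poly(A) tail, and a leading run of T's as a poly(T) tail (the reverse-complement case).

```python
import re

# Hypothetical threshold: the source does not state a minimum tail length.
MIN_TAIL = 8

def find_tail(read, min_tail=MIN_TAIL):
    """Return (kind, trimmed_read); kind is 'polyA', 'polyT', or None."""
    read = read.upper()
    # A run of at least min_tail A's anchored at the 3' end of the read.
    m = re.search(r"A{%d,}$" % min_tail, read)
    if m:
        return "polyA", read[:m.start()]
    # A run of at least min_tail T's anchored at the 5' end of the read.
    m = re.match(r"T{%d,}" % min_tail, read)
    if m:
        return "polyT", read[m.end():]
    # No tail found: return the read unchanged.
    return None, read
```

The trimmed read is what would then be aligned to the reference genome; a real pipeline would also need to tolerate sequencing errors inside the tail, which this sketch does not attempt.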


In this release, our database contains a total of 63,458,144 raw poly(A) sites, each of which is supported by an unambiguous, high-quality mRNA-to-genome alignment.   These raw poly(A) sites are grouped by their closest gene locus and are categorized as lying in genic regions (i.e., introns, UTRs, and CDS regions) or intergenic regions, and as lying on the sense or antisense strand of a particular gene.   Among these raw poly(A) sites, about 2% (1,283,796 out of 63,458,144) are supported by at least three independent sequence reads (that is, three unambiguous mRNA-to-genome alignments) and are treated as high-quality poly(A) sites.   Using an advanced clustering algorithm, these high-quality poly(A) sites have been grouped into distinct poly(A) site clusters (PACs), each of which is free of data redundancy (i.e., multiple high-quality poly(A) sites sharing the same genomic coordinates) and micro-heterogeneity (i.e., high-quality poly(A) sites whose genomic coordinates differ by one to a few bases).   In total, we have obtained 405,738 PACs in Arabidopsis thaliana.   We are in the process of categorizing poly(A) sites by their metadata, including mutant genotypes, tissues or cell types, developmental stages, growth conditions, and environmental treatments, among others.
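The filtering and clustering steps above can be sketched as follows. This is an illustration under stated assumptions, not the clustering algorithm used for the database, which is not described here. The three-read threshold comes from the text; the function name cluster_sites, the input layout of (chromosome, strand, position) tuples, and the 5-base merging window standing in for "one to a few bases" are all hypothetical.

```python
from collections import Counter

MIN_READS = 3  # from the text: at least three independent reads
MAX_GAP = 5    # hypothetical stand-in for "one to a few bases"

def cluster_sites(alignments, min_reads=MIN_READS, max_gap=MAX_GAP):
    """Group poly(A) sites into clusters.

    alignments: iterable of (chrom, strand, position), one per alignment.
    Returns a list of (chrom, strand, start, end, read_count) tuples.
    """
    # Counting identical coordinates removes data redundancy.
    support = Counter(alignments)
    # Keep only sites with enough independent read support.
    high_quality = sorted(s for s, n in support.items() if n >= min_reads)
    clusters = []
    for chrom, strand, pos in high_quality:
        n = support[(chrom, strand, pos)]
        last = clusters[-1] if clusters else None
        # Merge nearby sites on the same chromosome and strand
        # to absorb micro-heterogeneity.
        if (last and last[0] == chrom and last[1] == strand
                and pos - last[3] <= max_gap):
            clusters[-1] = (chrom, strand, last[2], pos, last[4] + n)
        else:
            clusters.append((chrom, strand, pos, pos, n))
    return clusters
```

Because each site is compared with the end of the previous cluster, this is a simple single-linkage grouping: a chain of closely spaced sites can grow into one long cluster, which a production method may want to cap.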


We are still building a comprehensive poly(A) database with highly interactive web interfaces for data retrieval, search, visualization, and mining, but some of these interfaces are already available for testing and exploration.  Click any item in the left-side menu to explore the database.  If you have any questions or comments, please contact Dr. Chun Liang (liangc@miamioh.edu).