Overview
With a genome of just over 500 million letters of genetic code, Populus trichocarpa was sequenced eight times over to attain the highest quality standards. Poplar was chosen as the first tree to have its DNA sequence decoded because of its relatively compact genetic complement, some 50 times smaller than the genome of pine, making the poplar an ideal model system for trees.
The poplar genome, divided into 19 chromosomes, is four times larger than the genome of the first plant sequenced four years ago, Arabidopsis thaliana. Analysis of the assembled genome reveals a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the poplar genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages (from JGI, the Joint Genome Institute, and Tuskan et al.).
Assembly
We constructed the v2 Populus genome assembly with Arachne version 20071016HA, attempting to merge the outbred haplotypes and making an extensive effort to remove contaminating sequence. We also integrated the latest genetic mapping information to construct the 19 chromosome-size scaffolds, which contain 370 Mb of sequence, a majority of the assembled poplar sequence. The first 19 scaffolds from the assembly correspond to the poplar chromosomes. The full release covers 403 Mb of sequence, assembled at an average read depth of 7.45x.
Annotation
Transcript assemblies were constructed using PASA from Populus trichocarpa ESTs/mRNAs and ESTs/mRNAs of other poplar species, including >2.6M 454-sequenced Populus deltoides EST reads generated at JGI. Loci were determined by BLAT alignments of the above transcript assemblies and/or BLASTX alignments of peptides from the Arabidopsis thaliana, rice, soybean, or grape genomes to the repeat-soft-masked P. trichocarpa genome. Gene models were predicted by homology-based predictors, mainly FGENESH+, with GenomeScan used if FGENESH+ produced no model at the locus. Predicted genes were UTR-extended and/or improved by PASA. The final gene set was selected based on EST/mRNA support or peptide homology support, then filtered to remove repeats/transposable elements.
As much as possible, manual annotations (assignment of symbols, deflines, etc.) of the v1.1 annotations were mapped forward onto the v2.0 gene set as follows. Version 1.1 annotated genes were aligned to the predicted Version 2.0 genes by BLAT with default parameters. Version 2.0 sequences with CDS-to-CDS BLAT results of >90% identity and >80% coverage, and gene locus-to-locus BLAT results of >90% identity and >90% coverage, were selected for further analysis. Reciprocal BLAT of this dataset against the set of all Version 1.1 annotated genes was performed to ensure that only mutual best hits were considered for annotation mapping. Version 1.1 symbol names and defline annotations, if any, that met these criteria were mapped onto the corresponding Version 2.0 sequences. Out of 20,362 v1.1 models that have a defline, a little more than half (11,725) had their deflines and symbols successfully assigned to a v2 model, according to our quite stringent criteria (CDS-to-CDS BLAT identity >90%, coverage >80%, and locus-to-locus BLAT >90% identity, >90% coverage).
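The forward-mapping criteria above can be sketched in a few lines of Python. This is only an illustration, not the pipeline JGI actually ran: the `Hit` record, its field names, and the use of a single alignment `score` to rank hits are all assumptions made for the example, and parsing of real BLAT output is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment between a query gene and a target gene (hypothetical record)."""
    query: str
    target: str
    cds_identity: float    # percent, CDS-to-CDS
    cds_coverage: float    # percent, CDS-to-CDS
    locus_identity: float  # percent, locus-to-locus
    locus_coverage: float  # percent, locus-to-locus
    score: float           # assumed ranking score; higher is better


def passes_thresholds(hit):
    """Apply the stated cutoffs: CDS >90% id / >80% cov, locus >90% id / >90% cov."""
    return (hit.cds_identity > 90 and hit.cds_coverage > 80
            and hit.locus_identity > 90 and hit.locus_coverage > 90)


def best_hits(hits):
    """Keep the highest-scoring hit for each query."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return best


def mutual_best_pairs(forward, reverse):
    """Return {v1.1 gene: v2.0 gene} for pairs that are each other's best hit.

    forward: v1.1 -> v2.0 hits; reverse: v2.0 -> v1.1 hits.
    """
    fwd_best = best_hits(h for h in forward if passes_thresholds(h))
    candidates = {h.target for h in fwd_best.values()}
    rev_best = best_hits(h for h in reverse if h.query in candidates)
    pairs = {}
    for v1_id, hit in fwd_best.items():
        back = rev_best.get(hit.target)
        if back is not None and back.target == v1_id:
            pairs[v1_id] = hit.target
    return pairs
```

Only the v1.1 genes returned by `mutual_best_pairs` would have their symbols and deflines carried forward. A v1.1 gene whose best v2.0 match prefers a different v1.1 gene is dropped, which is one reason a stringent scheme like this transfers only part of the annotations.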
In annotation v2.2, genes that were split in v2.0 were merged after the intron length cutoff was adjusted in our automatic gene annotation pipeline.
Statistics
This release of Phytozome includes the JGI v2.2 gene annotation of assembly v2.
- Genome
  - Approximately 403 Mb arranged in 19 chromosomes, assembled into 2518 scaffolds
- Loci
  - 40668 loci containing protein-coding transcripts
- Transcripts
  - 45033 protein-coding transcripts
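
A few ratios follow directly from the figures quoted in this document. The short Python sketch below only recomputes them; the variable names are chosen for illustration, and the inputs are the rounded values given in the text, so the results are approximate.

```python
# Figures quoted in the text above (rounded as published).
assembled_mb = 403        # total assembled sequence, Mb
chromosome_mb = 370       # sequence in the 19 chromosome-size scaffolds, Mb
loci = 40668              # loci containing protein-coding transcripts
transcripts = 45033       # protein-coding transcripts
deflines_v1 = 20362       # v1.1 models that have a defline
deflines_mapped = 11725   # deflines carried forward to a v2 model

chromosome_fraction = chromosome_mb / assembled_mb
transcripts_per_locus = transcripts / loci
mapped_fraction = deflines_mapped / deflines_v1

print(f"Sequence in chromosome scaffolds: {chromosome_fraction:.1%}")
print(f"Transcripts per locus: {transcripts_per_locus:.2f}")
print(f"v1.1 deflines mapped forward: {mapped_fraction:.1%}")
```

On these numbers, about 92% of the assembled sequence sits in the chromosome-size scaffolds, each locus carries about 1.11 protein-coding transcripts on average, and about 58% of v1.1 deflines were mapped forward.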


