Prunus persica (Peach)



About the genome:


Overview

Early Online Access to the Assembled Peach Genome for Browsing and BLASTing

The International Peach Genome Initiative (IPGI) would like to welcome you to the early online access to the draft assembled and annotated peach genome (peach v1.0). Rest assured, despite the fact that the genome is being made available on April 1, 2010, this is no joke! Before we talk more about the genome itself, let's do some housekeeping regarding data access.

As a public service, and in accordance with the Fort Lauderdale agreement, the peach genome is being made available by IPGI prior to peer-reviewed publication of the data. IPGI and its partners are making these data available with the expectation and desire to publish them in a reasonable time without preemption by other groups. By accessing these data, you agree not to publish any articles containing analyses of genes or genomic data on a whole-genome or chromosome scale prior to publication by IPGI and/or its collaborators of a comprehensive genome analysis ("Reserved Analyses"). "Reserved Analyses" include the identification of complete (whole-genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, or any other genome feature, and whole-genome- or chromosome-scale comparisons with other species. If you are interested in collaboration on one of these topics involving the peach genome, please contact one of the project coordinators. Work towards the publication of the peach genome is underway, and we plan to submit a manuscript in the coming months. If you will be using the data for non-reserved analyses, such as cloning a gene of interest or analyzing a gene family, please feel free to do so; we only ask that you cite the International Peach Genome Initiative.

One more disclaimer - peach v1.0 represents an initial draft of the assembled genome. While we believe peach v1.0 is a very high quality plant genome, we are aware that it contains both known and unknown errors and discrepancies that will be addressed in upcoming releases of the genome. For instance, we are aware of a few minor cases where sequences have been correctly assigned to a location, but their orientation is in question. We hope and believe that any problems that arise from these discrepancies are outweighed by the benefit of the rapid release of the data. If you believe that you have identified a discrepancy in the data, please contact IPGI, and we will be sure to address your concerns in an upcoming release.

History
Now, back to the interesting part - questions and answers regarding peach v1.0 and how we got to this point. At the Plant and Animal Genome XV Meeting on January 16, 2007, Jerry Tuskan from the Joint Genome Institute (JGI) announced plans to sequence the peach genome. Since then, an international consortium (IPGI) has coalesced to do the work cooperatively. This consortium, under the direction of Drs Bryon Sosinski, Ignazio Verde and Daniel Rokhsar, includes numerous researchers from countries around the globe, including the US, Italy (Drupomics), Spain and Chile. The specific roles of the participants will be outlined in the publication of the peach genome.
Background
Peach (Prunus persica) is considered one of the best genetically characterized species in the Rosaceae, and it has distinct advantages that make it suitable as a model genome species for Prunus as well as for other species in the Rosaceae. While some Prunus species, such as cultivated plums and sour cherries, are polyploid, peach is a diploid with n = 8 and has a comparatively small genome, currently estimated to be ~220-230 Mbp based upon the peach v1.0 assembly. Peach has a relatively short juvenility period of 2-3 years, compared to the 6-10 years required by most other fruit tree species. In addition, a number of genes for fundamentally important traits have been genetically described in peach, including genes controlling flower and fruit development, tree growth habit, dormancy, cold hardiness, and disease and pest resistance.
Genome facts and statistics
Peach v1.0 was generated from DNA from the doubled haploid cultivar 'Lovell', which means that the genes and intervening DNA are "fixed", or identical, for all alleles and both chromosomal copies of the genome. This doubled haploid nature was confirmed by the evaluation of >200 SSRs, and has facilitated a highly accurate and consistent assembly of the peach genome.

Peach v1.0 currently consists of 8 pseudomolecules (scaffolds) representing the 8 chromosomes of peach, numbered according to their corresponding linkage groups. The genome was sequenced to approximately 7.7-fold coverage by whole-genome shotgun sequencing using the accurate Sanger methodology, and was assembled using Arachne. The assembled peach scaffolds cover nearly 99% of the peach genome, with over 92% having confirmed orientation. To further validate the quality of the assembly, 74,757 Prunus ESTs were queried against the genome at 90% identity and 85% coverage, and we found that only ~2% were missing. This is truly a high quality genome! Gene prediction and annotation is an ongoing process that may take years to complete, but current estimates indicate that peach has a typical plant gene repertoire of approximately 35,000 protein-coding sequences.
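
For readers who want to run a similar check on their own data, the EST validation described above can be sketched roughly as follows. This is only an illustration, not the procedure IPGI actually used: the `EstAlignment` record, its field names, and the exact definitions of identity (matching bases over aligned bases) and coverage (aligned bases over EST length) are assumptions made for the example. Only the 90% and 85% thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class EstAlignment:
    """Hypothetical record for the best alignment of one EST to the genome."""
    est_id: str
    est_length: int      # full length of the EST, in bases
    aligned_bases: int   # EST bases included in the alignment
    matching_bases: int  # aligned bases identical to the genome


def passes(aln, min_identity=0.90, min_coverage=0.85):
    """True if the alignment meets both thresholds (assumed definitions)."""
    if aln.est_length == 0 or aln.aligned_bases == 0:
        return False
    identity = aln.matching_bases / aln.aligned_bases
    coverage = aln.aligned_bases / aln.est_length
    return identity >= min_identity and coverage >= min_coverage


def missing_fraction(all_est_ids, alignments):
    """Fraction of ESTs with no alignment that passes the thresholds."""
    all_ids = set(all_est_ids)
    if not all_ids:
        return 0.0
    found = {a.est_id for a in alignments if passes(a)}
    return len(all_ids - found) / len(all_ids)
```

An EST with no alignment at all, or with only alignments below either threshold, counts as missing in this sketch.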

Peach genome browsers are available at JGI and the Genome Database for Rosaceae, while the Italian version is hosted at the Istituto di Genomica Applicata (IGA). Access to the raw sequence data is provided via the Download Data button at the top of this page.

Once again, welcome to peach v1.0!

On behalf of IPGI and its collaborators,

Bryon Sosinski, NC State University (sosinski AT ncsu.edu)
Ignazio Verde, Consiglio per la Ricerca e la Sperimentazione in Agricoltura (ignazio.verde AT entecra.it)
Daniel Rokhsar, DOE Joint Genome Institute (dsrokhsar AT gmail.com)

Annotation

Transcript assemblies were constructed using PASA from Prunus persica ESTs (~88K) and ESTs of related species (~424K). Loci were determined by BLAT alignments of these transcript assemblies and/or BLASTX alignments of peptides from Arabidopsis thaliana, rice, soybean, grape and poplar to the repeat-soft-masked P. persica genome. Gene models were predicted by homology-based predictors, mainly FGENESH+, with GenomeScan used where FGENESH+ produced no model at the locus. Predicted genes were UTR-extended and/or improved by PASA. The final gene set was selected based on EST support or peptide homology support, and was filtered to remove repeats/transposable elements.

Statistics

JGI release v1.0


Genome Size
Approximately 227.3 Mb arranged in 202 scaffolds
Approximately 224.6 Mb arranged in 2730 contigs (~ 1.2% gap)
Scaffold N50 (L50) = 4 (26.8 Mbp)
Contig N50 (L50) = 294 (214.2 Kbp)
21 scaffolds larger than 50 Kbp, with 99.4% of the genome in scaffolds larger than 50 Kbp
Loci
27852 loci containing protein-coding genes
Transcripts
28689 protein-coding transcripts
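
As a rough guide to how the scaffold and contig summary figures above are typically derived, the sketch below computes an N50-style pair (a sequence count and a length, in the same order the figures are listed) and the gap percentage. The function names are illustrative, and the exact method used for this release is not stated on this page, so treat this as the commonly used definition rather than the release's own calculation.

```python
def n50(lengths):
    """Return (count, length) for a list of sequence lengths.

    Sequences are taken longest first until they cover at least half of
    the total length; `count` is how many were needed and `length` is the
    length of the last one taken.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no sequence lengths given")


def gap_percent(scaffold_total, contig_total):
    """Percentage of the scaffold length not covered by contig sequence."""
    return 100 * (scaffold_total - contig_total) / scaffold_total


# Totals quoted above, in Mb: 227.3 in scaffolds, 224.6 in contigs.
print(round(gap_percent(227.3, 224.6), 1))
```

With the quoted totals, the gap percentage rounds to 1.2, which agrees with the "~ 1.2% gap" figure listed above.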
©2010 University of California Regents. All rights reserved