Whole Genome Resequencing Management Plan


Overview

All data generated through CCGP will be archived in the NCBI SRA as part of the CCGP Umbrella Project (PRJNA720569), which comprises genus- and species-level BioProjects and their associated BioSamples. Data submission to the SRA will be managed by the CCGP Bioinformatics Team.

The Sequence Read Archive (SRA) stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. NCBI defines two levels of organization that describe sequence data on the SRA: BioProject and BioSample. A BioProject describes the goal of the sequencing study, whereas a BioSample records the nature of the biological material that was sequenced. Each BioSample will be associated with a Sequence Read Run (SRR), which identifies the actual sequence data stored on the SRA. Here is an example BioProject and BioSample to get a better understanding of what they describe.
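
As a rough illustration of how these records relate to one another, the hierarchy can be sketched as a small data structure. This is only a sketch under stated assumptions: the class names, field names, and every accession other than PRJNA720569 are hypothetical placeholders, not real NCBI records or an NCBI schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """One Sequence Read Run: the actual sequence data on the SRA."""
    accession: str  # placeholder, not a real SRR accession


@dataclass
class BioSample:
    """Describes the biological material that was sequenced."""
    accession: str  # placeholder, not a real BioSample accession
    organism: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioProject:
    """Describes the goal of a sequencing study; may nest child projects."""
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)
    children: List["BioProject"] = field(default_factory=list)


def run_accessions(project: BioProject) -> List[str]:
    """Collect run accessions from a project and all nested child projects."""
    found = [run.accession for s in project.samples for run in s.runs]
    for child in project.children:
        found.extend(run_accessions(child))
    return found


# Hypothetical example: the umbrella project with one species-level child.
species_project = BioProject(
    accession="PRJNA_PLACEHOLDER",          # hypothetical
    title="Example species-level project",  # hypothetical
    samples=[
        BioSample("SAMN_PLACEHOLDER_1", "Example species", [Run("SRR_PLACEHOLDER_1")]),
        BioSample("SAMN_PLACEHOLDER_2", "Example species", [Run("SRR_PLACEHOLDER_2")]),
    ],
)
umbrella = BioProject(
    accession="PRJNA720569",
    title="CCGP Umbrella Project",
    children=[species_project],
)

print(run_accessions(umbrella))
```

The nesting mirrors the description above: the umbrella project holds genus- and species-level projects, each project holds samples, and each sample points to the run that identifies its sequence data.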

Projects that generate data outside of the CCGP Mini-Core must share raw sequence data and submit associated metadata. To share WGS data, we recommend that CCGP projects simply share the data download links provided by the UC core sequencing lab. For alternative options for data sharing, please contact Erik Enbody (eenbody@ucsc.edu).

Short-read sequence data will automatically be given an embargo date of June 30, 2025. See the CCGP Data release policies for more information. Upon request, NCBI links can be provided to the project PI, but otherwise data access prior to the embargo date will be restricted to the CCGP Bioinformatics Team.

Archiving these data requires that we collect metadata from each project describing its resequencing experiments, so that the data are maximally impactful and analyses are reproducible. These metadata will also be used to create a BioSample record for each sample.