Draft Reference Assembly Plan


Summary

Non-curated, contig-level assemblies generated by our automated pipelines from HiFi and Omni-C data will be made available to CCGP researchers as soon as they are complete. These draft assemblies are not the final reference genome assemblies, but researchers can use them for preliminary analyses, protocol and pipeline development, and collaborative work.

Draft Generation Method

Once enough HiFi data has been generated to exceed 30x coverage, our reference genome team will generate a draft assembly following the CCGP standard pipeline (https://github.com/ccgproject/ccgp_assembly). This process also produces standard quality metrics, such as the number of contigs, contig N50, BUSCO scores, and per-base quality.
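To illustrate two of the quantities above, here is a minimal Python sketch of the coverage check (total HiFi bases divided by estimated genome size) and the contig N50 calculation. It is not code from the CCGP pipeline. The function names `hifi_coverage`, `ready_for_draft`, and `contig_n50` are hypothetical, and the sketch assumes an estimated genome size is already available (for example, from a k-mer analysis or a related species).

```python
"""Illustrative sketch only: HiFi coverage check and contig N50.

Not taken from the CCGP assembly pipeline; all names are hypothetical.
"""

from typing import Iterable


def hifi_coverage(total_hifi_bases: int, genome_size_bp: int) -> float:
    """Fold coverage = total sequenced HiFi bases / estimated genome size (bp)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_hifi_bases / genome_size_bp


def ready_for_draft(total_hifi_bases: int, genome_size_bp: int,
                    min_coverage: float = 30.0) -> bool:
    """True once coverage strictly exceeds the threshold (> 30x in this plan)."""
    return hifi_coverage(total_hifi_bases, genome_size_bp) > min_coverage


def contig_n50(contig_lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the total length is covered.
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # fallback; not reached for non-negative lengths


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(hifi_coverage(45_000_000_000, 1_200_000_000))
    print(ready_for_draft(45_000_000_000, 1_200_000_000))
    print(contig_n50([100, 80, 50, 30, 20]))
```

For example, 45 Gb of HiFi data for an estimated 1.2 Gb genome gives 37.5x coverage, which clears the 30x threshold. For contigs of lengths 100, 80, 50, 30, and 20 (total 280), half the assembly is 140; the two longest contigs reach 180, so the N50 is 80.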

Release to PI and Notification

Upon assembly completion, CCGP will make the assembly files and quality metrics, hosted on CCGP servers, available for download via password-protected FTP. Project PIs will be notified when the files are available and will receive download instructions. Because these draft assemblies are an intermediate resource for CCGP projects and are incomplete, they are not intended for submission to any public genome repository (e.g., NCBI’s Genome Database or GenomeArk). Once each reference genome is finalized with NCBI, the temporary draft files will be removed from our servers and PIs will be provided with permanent links to the NCBI database.


Disclaimer on Data Quality

These draft genomes include both the primary and alternate assemblies, and possibly organelle genomes, but they do not include functional annotations. Because they have not gone through CCGP’s careful curation process, including the Omni-C pipeline, they may contain mis-assemblies and/or non-target sequences.