CCGP
California Conservation Genomics Project: Building the most comprehensive genomic dataset ever assembled to help manage regional biodiversity.

CCGP Genome Assemblies


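CCGP reference genomes are assembled following a protocol adapted from Rhie et al. (2021). Assemblies are built from PacBio HiFi long-read data and scaffolded using Omni-C (Dovetail Genomics) chromatin conformation capture data. Our minimum target reference genome quality is 6.7.Q40, and in most cases we expect to reach 7.C.Q50 or better (see Table 1 in Rhie et al. 2021). In this notation, the first number is the log10 of the contig NG50, the second is the log10 of the scaffold NG50 (C denotes chromosome-scale scaffolds), and Qz is the consensus quality value (QV). A 6.7.Q40 assembly therefore has a contig NG50 of at least 1 Mb, a scaffold NG50 of at least 10 Mb, and a QV of at least 40, or roughly one base error per 10,000 bases.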
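The quality targets above combine three measurements. The short Python sketch below, written for illustration only, computes an NG50 from a list of sequence lengths and an estimated genome size, converts a QV into an expected per-base error rate, and checks an assembly against a target string such as "6.7.Q40". The function names (ng50, qv_to_error_rate, meets_target) are hypothetical, and the check assumes a simple reading of the notation: x means contig NG50 of at least 10^x bp, y means scaffold NG50 of at least 10^y bp (or "C" for chromosome-scale scaffolds), and Qz means a QV of at least z. It covers only the three fields used on this page.

```python
def ng50(lengths, genome_size):
    """Return the NG50: the largest length L such that sequences of length >= L
    together cover at least half of the estimated genome size."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the sequences cover less than half of the genome estimate


def qv_to_error_rate(qv):
    """Phred scale: QV = -10 * log10(error rate), so Q40 is 1 error in 10,000 bp."""
    return 10 ** (-qv / 10)


def meets_target(target, contig_ng50, scaffold_ng50, qv, chromosome_scale):
    """Check an assembly against a shorthand target such as "6.7.Q40".

    Hypothetical helper: x = minimum log10(contig NG50), y = minimum
    log10(scaffold NG50) or "C" for chromosome-scale scaffolds, Qz = minimum QV.
    """
    x, y, q = target.split(".")
    if contig_ng50 < 10 ** int(x):
        return False
    if y == "C":
        if not chromosome_scale:
            return False
    elif scaffold_ng50 < 10 ** int(y):
        return False
    return qv >= int(q.lstrip("Q"))


# Hypothetical example: five contigs measured against a 10 Mb genome size estimate.
contigs = [5_000_000, 3_000_000, 2_000_000, 1_000_000, 500_000]
contig_ng50 = ng50(contigs, genome_size=10_000_000)  # 5_000_000
print(meets_target("6.7.Q40", contig_ng50, 20_000_000, qv=45, chromosome_scale=False))  # True
print(meets_target("7.C.Q50", contig_ng50, 20_000_000, qv=45, chromosome_scale=False))  # False
```

Comparing measured values against powers of ten, rather than rounding them into the notation, keeps the check unambiguous when a value falls between thresholds.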


Evolution of the CCGP assembly pipeline

 

Standard CCGP assembly pipeline

This schematic is a general overview; the pipeline may vary slightly from assembly to assembly.

 

Genome Assembly Software

Each entry lists the task, followed by the software and version used.

Assembly
Filtering PacBio HiFi adapters: HiFiAdapterFilt, commit 64d1c7b
K-mer counting: Meryl 1
Estimation of genome size and heterozygosity: GenomeScope 2
De novo assembly (contigging): hifiasm 0.13-r308
Long-read and genome-to-genome alignment: Minimap2 2.16
Removal of low-coverage and duplicated contigs: purge_dups 1.0.1

Scaffolding
Omni-C read mapping for SALSA: Arima Genomics mapping pipeline, commit 2e74ea4
Omni-C scaffolding: SALSA 2
Gap closing: YAGCloser, commit 20e2769

Hi-C contact map generation
Short-read alignment: BWA 0.7.17-r1188
SAM/BAM processing: Samtools 1.11
SAM/BAM filtering: pairtools 0.3.0
Pairs indexing: pairix 0.3.7
Matrix generation: Cooler 0.8.10
Matrix balancing: HiCExplorer 3.6
Contact map visualization: HiGlass 2.1.11; PretextMap 0.1.4; PretextView 0.1.5; PretextSnapshot 0.0.3

Organelle assembly
Sequence similarity search: BLAST+ 2.1
Long-read alignment: pbmm2 1.4.0
Variant calling and consensus: bcftools 1.11-5-g9c15769
Sequence extraction: seqtk 1.3-r115-dirty
Circular-aware long-read alignment: raptor 0.20.3-171e0f1
Sequence polishing: racon 1.4.19
Sequence alignment: lastz 1.04.08
Gene annotation: MitoFinder 1.4
Organelle annotation: GeSeq

Genome quality assessment
Basic assembly metrics: QUAST 5.0.2
Assembly completeness: BUSCO 5.0.0; Merqury 1

Contamination screening
General contamination screening: BlobToolKit 2.3.3

The software and versions listed above may vary slightly from assembly to assembly.
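Merqury, listed above, estimates consensus quality by comparing k-mers in the assembly with k-mers in the raw reads: assembly k-mers never seen in the reads are treated as evidence of base errors. The Python sketch below illustrates that idea on toy sequences. It is not Merqury's implementation, which works on Meryl k-mer databases built from the reads. The helper names (canonical_kmers, read_kmer_set, kmer_qv) are hypothetical, the sketch keeps every read k-mer (including any produced by sequencing errors), and it assumes the published Merqury estimate, in which the per-base error rate is 1 - (1 - K_asm/K_total)^(1/k), where K_asm is the number of assembly-only k-mers and K_total is the total number of assembly k-mers.

```python
import math

# Translation table for complementing DNA strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield canonical k-mers (the lexicographic minimum of a k-mer and its reverse
    complement), skipping any k-mer that contains a non-ACGT character."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, kmer.translate(COMPLEMENT)[::-1])


def read_kmer_set(reads, k):
    """Collect every canonical k-mer seen in the reads (no frequency filtering)."""
    return {kmer for read in reads for kmer in canonical_kmers(read, k)}


def kmer_qv(assembly_seqs, read_kmers, k):
    """Merqury-style consensus QV from assembly k-mers that are absent from the reads."""
    total = 0     # K_total: all valid k-mers in the assembly
    asm_only = 0  # K_asm: assembly k-mers never observed in the reads
    for seq in assembly_seqs:
        for kmer in canonical_kmers(seq, k):
            total += 1
            if kmer not in read_kmers:
                asm_only += 1
    if total == 0:
        raise ValueError("assembly has no valid k-mers")
    if asm_only == 0:
        return math.inf  # no detectable errors at this k
    # The fraction of read-supported k-mers approximates P(base correct) ** k;
    # solve for the per-base error rate, then convert it to the Phred scale.
    error_rate = 1 - (1 - asm_only / total) ** (1 / k)
    return -10 * math.log10(error_rate)


# Toy example: the assembly carries one substitution (C -> A) relative to the reads.
reads = ["AAAACCCCGGGGTTTT"]
assembly = ["AAAACCCAGGGGTTTT"]
k = 5
print(round(kmer_qv(assembly, read_kmer_set(reads, k), k), 1))  # 9.9
```

In the toy example, the single substitution invalidates the 5 of 12 assembly k-mers that overlap it, which is why the (1/k) exponent is needed to turn the fraction of unsupported k-mers back into a per-base error rate.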