CCGP reference genomes are assembled following a protocol adapted from Rhie et al. (2021). Assemblies are built from PacBio HiFi long-read data and scaffolded using Omni-C (Dovetail Genomics) chromatin conformation data. Our minimum target reference genome quality is 6.7.Q40, and in most cases we expect to reach 7.C.Q50 or better (see Table 1 in Rhie et al. 2021).
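The quality targets above can be checked programmatically. The sketch below assumes the usual reading of the Rhie et al. (2021) notation `x.y.Q`: contig NG50 of at least 10^x bp, scaffold NG50 of at least 10^y bp (or `C` for chromosome-level scaffolds), and a Phred-scaled consensus quality of at least Q. The class `AssemblyStats`, the function names, and the example numbers are hypothetical and are not part of the CCGP pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Summary metrics for one assembly (hypothetical container)."""
    contig_ng50: int                # contig NG50 in bp
    scaffold_ng50: int              # scaffold NG50 in bp
    qv: float                       # Phred-scaled consensus quality value
    chromosome_level: bool = False  # True if scaffolds are chromosome-scale


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to a per-base error rate."""
    return 10 ** (-qv / 10)


def meets_target(stats: AssemblyStats, target: str) -> bool:
    """Check stats against a target such as '6.7.Q40' or '7.C.Q50'.

    Assumed decoding: x.y.Q means contig NG50 >= 10**x bp,
    scaffold NG50 >= 10**y bp (or chromosome-level when y is 'C'),
    and QV >= Q.
    """
    x, y, q = target.split(".")
    contig_ok = stats.contig_ng50 >= 10 ** int(x)
    if y == "C":
        scaffold_ok = stats.chromosome_level
    else:
        scaffold_ok = stats.scaffold_ng50 >= 10 ** int(y)
    qv_ok = stats.qv >= float(q.lstrip("Q"))
    return contig_ok and scaffold_ok and qv_ok


# Hypothetical assembly: 5 Mb contig NG50, 40 Mb scaffold NG50, QV 45.
example = AssemblyStats(contig_ng50=5_000_000, scaffold_ng50=40_000_000, qv=45.0)
print(meets_target(example, "6.7.Q40"))
print(meets_target(example, "7.C.Q50"))
print(qv_to_error_rate(40))
```

Under this reading, Q40 corresponds to roughly one consensus error per 10,000 bases and Q50 to one per 100,000.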
Evolution of the CCGP assembly pipeline
Standard CCGP assembly pipeline
This pipeline is a general schematic and may vary slightly from assembly to assembly.
Genome Assembly Software
Purpose | Software | Version |
---|---|---|
Assembly | ||
Filtering PacBio HiFi adapters | HiFiAdapterFilt | Commit 64d1c7b |
Kmer counting | Meryl | 1 |
Estimation of genome size and heterozygosity | GenomeScope | 2 |
De novo assembly (contigging) | HiFiasm | 0.13-r308 |
Long read, genome-genome alignment | Minimap2 | 2.16 |
Remove low-coverage, duplicated contigs | Purge_dups | 1.0.1 |
Scaffolding | ||
Omni-C mapping for SALSA | Arima Genomics mapping pipeline | Commit 2e74ea4 |
Omni-C scaffolding | SALSA | 2 |
Gap closing | YAGCloser | Commit 20e2769 |
Hi-C contact map generation | ||
Short-read alignment | Bwa | 0.7.17-r1188 |
SAM/BAM processing | Samtools | 1.11 |
SAM/BAM filtering | pairtools | 0.3.0 |
Pairs indexing | pairix | 0.3.7 |
Matrix generation | Cooler | 0.8.10 |
Matrix balancing | hicExplorer | 3.6 |
Contact map visualization | HiGlass | 2.1.11 |
Contact map visualization | PretextMap | 0.1.4 |
Contact map visualization | PretextView | 0.1.5 |
Contact map visualization | PretextSnapshot | 0.0.3 |
Organelle assembly | ||
Sequence similarity search | BLAST+ | 2.1 |
Long read alignment | Pbmm2 | 1.4.0 |
Variant calling and consensus | bcftools | 1.11-5-g9c15769 |
Extraction of sequences | seqtk | 1.3-r115-dirty |
Sequence polishing | racon | 1.4.19 |
Circular-aware long-read alignment | raptor | 0.20.3-171e0f1 |
Sequence alignment | lastz | 1.04.08 |
Gene annotation | MitoFinder | 1.4 |
Organelle annotation | GeSeq | |
Genome quality assessment | ||
Basic assembly metrics | QUAST | 5.0.2 |
Assembly completeness | BUSCO | 5.0.0 |
Assembly completeness | Merqury | 1 |
Contamination screening | ||
General contamination screening | BlobToolKit | 2.3.3 |
The software and software versions listed above may vary slightly from assembly to assembly.
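Because versions can differ between assemblies, it may help to record what was actually run and compare it with the reference table. The following is a minimal sketch, not part of the CCGP pipeline: `REFERENCE_VERSIONS` copies a few entries from the table above, while the function name `version_differences` and the "used" version `0.15` are hypothetical, chosen only for illustration.

```python
# A few entries copied from the software table above.
REFERENCE_VERSIONS = {
    "HiFiasm": "0.13-r308",
    "Minimap2": "2.16",
    "Purge_dups": "1.0.1",
    "BUSCO": "5.0.0",
}


def version_differences(reference: dict, used: dict) -> dict:
    """Return {tool: (reference_version, used_version)} for every mismatch.

    A tool missing from either side is reported with None on that side.
    """
    diffs = {}
    for tool in sorted(set(reference) | set(used)):
        ref_version = reference.get(tool)
        used_version = used.get(tool)
        if ref_version != used_version:
            diffs[tool] = (ref_version, used_version)
    return diffs


# Hypothetical per-assembly record: same as the table except for HiFiasm.
used_versions = dict(REFERENCE_VERSIONS, HiFiasm="0.15")
for tool, (ref, act) in version_differences(REFERENCE_VERSIONS, used_versions).items():
    print(f"{tool}: table lists {ref}, assembly used {act}")
```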