Whole Genome Resequencing Data Ingest
If you have CCGP sequencing data that you generated (i.e., that was not generated by the CCGP Mini-Core at UCLA), please follow the instructions below to submit the sequence data and metadata associated with your files.
Sequence Data
To share WGS data, you must share the data download links provided by the UC core sequencing lab with data wrangler Erik Enbody. These links expire 1 month after data delivery, so please send them promptly. Failure to do so will make it difficult, and potentially more costly, to share the data with the bioinformatics team. Please contact Erik Enbody if you have any questions.
Metadata
Please fill in this spreadsheet and upload it to the form below. Take the time to complete it accurately: accurate data submission reduces the burden on CCGP team members and ensures that the data will be usable in the future. All data provided here will be uploaded to NCBI along with the sequence data.
We will only accept metadata submissions in xlsx format.
For specifics on what each column should contain, refer to these instructions:
ALL COLUMNS ARE REQUIRED
sample_name - A name that you choose for the sample. NOTE THIS WILL BE THE NAME USED IN THE FINAL GENOTYPE (VCF) FILE. A valid sample name may ONLY contain letters, numbers, and underscores. We suggest that you make it concise, unique and consistent within your lab, and as informative as possible. Every sample name from a single submitter must be unique. (example: UCD_YWAR_19234)
sample_title - Title of the sample. This can have any format, but it must be unique (no other sample in this column of the submission sheet may share it). (example: WGS_YWAR_19234) This will only be used by NCBI and not during data analysis.
organism - Both genus and species are required. (example: Ursus americanus)
isolate - For the CCGP, use this column to describe the population (e.g. “white sands”), subspecies, or other grouping below the species level that categorizes this sample. If not applicable, enter “NA”. If missing, enter “missing”.
sex - The sex of the sampled organism. (example: female, male, NA)
tissue - Type of tissue the sample was taken from. (example: frozen tissue, blood) If not applicable, enter “NA”. If missing, enter “missing”.
geo_loc_name - Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location. (example: "Canada: Vancouver" or "Germany: halfway down Zugspitze, Alps")
latitude - Positive latitudes are north of the equator, negative latitudes are south of the equator. (example: 40.97261). Maximize precision and double check for accuracy.
longitude - Positive longitudes are east of the Prime Meridian; negative longitudes are west of the Prime Meridian. (example: -121.7604). Maximize precision and double check for accuracy.
collection_date - Date the sample was collected. (example: 2022-11-15) A year alone is acceptable. Acceptable formats: “yyyy-mm-dd”, “yyyy-mm”, or “yyyy”.
specimen_voucher - Identifier for the physical specimen. Use format: "[<institution-code>:[<collection-code>:]]<specimen_id>", e.g., "UAM:Mamm:52179". Please use NA if not applicable to your project.
library_prep_method - Free-form description of the methods used to create the sequencing library; a brief materials and methods section.
read1 - filename of read 1, exactly as it appears. (example: sample_S133_L003_R1_001.fastq.gz)
read2 - filename of read 2, exactly as it appears. (example: sample_S133_L003_R2_001.fastq.gz)
run2_read1 - Use this column if you have more than 2 files. Enter the filename of read 1 from run 2, exactly as it appears. (example: MVZCCGP-Psa1_I-F04_S18_L001_R1_001.fastq.gz)
run2_read2 - Use this column if you have more than 2 files. Enter the filename of read 2 from run 2, exactly as it appears. (example: MVZCCGP-Psa1_I-F04_S18_L001_R2_001.fastq.gz)
You may add additional filename columns (e.g. run3_read1) if needed.
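If you would like to sanity-check your rows before uploading, the column rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not an official CCGP validator: the function names (is_valid_sample_name, is_valid_collection_date, is_valid_coordinate, duplicated, check_row) are hypothetical, and it assumes each spreadsheet row has already been read into a dictionary keyed by the column names listed above. It checks only sample_name, collection_date, latitude, and longitude.

```python
import re
from collections import Counter
from datetime import datetime

# sample_name: letters, numbers, and underscores only.
SAMPLE_NAME_RE = re.compile(r"[A-Za-z0-9_]+")
# collection_date: yyyy, yyyy-mm, or yyyy-mm-dd (zero-padded).
DATE_SHAPE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
DATE_FORMATS = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}


def is_valid_sample_name(name):
    return SAMPLE_NAME_RE.fullmatch(name) is not None


def is_valid_collection_date(text):
    if DATE_SHAPE_RE.fullmatch(text) is None:
        return False
    try:
        # Rejects impossible dates such as month 13 or February 30.
        datetime.strptime(text, DATE_FORMATS[len(text)])
    except ValueError:
        return False
    return True


def is_valid_coordinate(latitude, longitude):
    # Decimal degrees: north and east are positive, south and west negative.
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def duplicated(values):
    # Values that appear more than once (e.g. repeated sample names).
    return sorted(v for v, n in Counter(values).items() if n > 1)


def check_row(row):
    # Return the names of the columns in this row that fail a check.
    problems = []
    if not is_valid_sample_name(row["sample_name"]):
        problems.append("sample_name")
    if not is_valid_collection_date(row["collection_date"]):
        problems.append("collection_date")
    if not is_valid_coordinate(row["latitude"], row["longitude"]):
        problems.extend(["latitude", "longitude"])
    return problems
```

For example, check_row could be called on every row and duplicated on the full list of sample_name values. An empty result from both suggests that these particular columns follow the rules above, but it does not replace a careful manual review of the sheet.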
Questions? Contact Erik Enbody, eenbody@ucsc.edu