Service overview

The bioinformatics workflow combines a number of open-source analysis tools with a Perl codebase that performs custom filtering, reporting and job process control.

The bioinformatics pipeline for single nucleotide variant (SNV) and indel identification utilises the Ensembl exome sequences for human and mouse to map the data.

The exome pipeline

1. DNA extraction
2. Exome enrichment
   Mouse: Agilent SureSelect XT2 All Exon Kit
   Human: Agilent SureSelect XT2 All Exon V5 Kit
3. Sequencing on an Illumina HiSeq 2500 as paired-end (PE) 75 or 100 bp reads
4. Align reads to the reference genome* (with BWA)
5. Call raw SNVs against the reference genome (with SAMtools)
6. Exclude known variation (dbSNP, common exome variants)
7. Filter for coding or splicing variants** by aligning with the Ensembl exome sequence
8. Filter for non-synonymous variants (with ANNOVAR)
9. Remove multi-SNV genes
10. Output: single nucleotide variants

* reference mouse genome (mm10/GRCm38)

** Splicing variants are changes that lie in potential splice donor and acceptor sites immediately adjacent to exon boundaries (up to 10 intronic bases from the boundary).
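The splicing definition in the footnote above amounts to a simple position check against exon coordinates. The sketch below is illustrative only and is not the APF's Perl implementation: it assumes exons are given as 1-based, inclusive (start, end) pairs on a single chromosome, treats every exonic base as eligible for the coding filter, and uses a hypothetical function name, `classify_position`. The 10-base window comes from the footnote.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation: 1-based, inclusive (start, end) pairs on a
# single chromosome. This is not the APF's actual data model.
Exon = Tuple[int, int]

SPLICE_WINDOW = 10  # intronic bases either side of an exon (from the footnote)


def classify_position(pos: int, exons: Iterable[Exon], window: int = SPLICE_WINDOW) -> str:
    """Classify a 1-based position as 'exonic', 'splicing' or 'other'."""
    near_boundary = False
    for start, end in exons:
        if start <= pos <= end:
            return "exonic"  # inside an exon always takes priority
        # Flanking intronic bases: [start - window, start - 1] before the exon
        # and [end + 1, end + window] after it.
        if start - window <= pos < start or end < pos <= end + window:
            near_boundary = True
    return "splicing" if near_boundary else "other"


exons = [(1000, 1100), (2000, 2150)]
print(classify_position(1050, exons))  # exonic
print(classify_position(995, exons))   # splicing (5 bases before an exon)
print(classify_position(1111, exons))  # other (11 bases after an exon)
```

Under this sketch, a variant passes the coding-or-splicing filter when its position classifies as either `exonic` or `splicing`.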

 

Sequencing analysis output

The analysis produces six output files for each sequenced exome:

readReport_summary: The number of reads obtained and the percentage of reads that mapped to the target Ensembl exome.
snpList_summary: The filtered SNV list, annotated with SNV allele frequency, CCDS gene ID, Ensembl gene ID, Online Mendelian Inheritance in Man (OMIM) loci, mouse mutant phenotype, expression patterns and PolyPhen-2 scores for all novel SNVs.
snpReport_summary: The filters used in the analysis, the number and percentage of SNVs that passed each filter, the final number of SNVs that passed, and any genes with more than one variant (SNVs in these genes could result from mis-mapping due to high homology with other genes).
exonReport_summary: A list of exons or regions covered by fewer than three reads.
indelList_summary: The filtered indel list.
indelReport_summary: The filters used in the analysis, the number and percentage of indels that passed each filter, the final number of indels that passed, and any genes with more than one variant (indels in these genes could result from mis-mapping due to high homology with other genes).
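The bookkeeping behind the snpReport_summary (filters applied in order, the number and percentage of SNVs passing each, and the genes carrying more than one variant) can be sketched as follows. This is a minimal illustration under stated assumptions, not the APF's code: the `Variant` record and its boolean fields are hypothetical pre-computed annotations, the filter order follows steps 6 to 9 of the pipeline, and each percentage is assumed to be relative to the variants entering that step.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    # Hypothetical pre-annotated SNV record; field names are illustrative.
    gene: str
    pos: int
    known: bool             # found in dbSNP or a common exome variant set
    coding_or_splice: bool  # exonic or inside the splice window
    nonsynonymous: bool     # e.g. as annotated by ANNOVAR


Step = Tuple[str, Callable[[Variant], bool]]

# Order follows steps 6 to 8 of the pipeline; step 9 is handled separately.
FILTERS: List[Step] = [
    ("exclude known variation", lambda v: not v.known),
    ("coding or splicing", lambda v: v.coding_or_splice),
    ("non-synonymous", lambda v: v.nonsynonymous),
]


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def run_filters(variants: Sequence[Variant], filters: Sequence[Step] = FILTERS):
    """Return (passed variants, per-filter report, multi-variant genes)."""
    report = []  # (filter name, number passed, % of variants entering the step)
    current = list(variants)
    for name, keep in filters:
        before = len(current)
        current = [v for v in current if keep(v)]
        report.append((name, len(current), percent(len(current), before)))

    # Step 9: drop genes carrying more than one surviving SNV, which may
    # reflect mis-mapping between highly homologous genes.
    per_gene = Counter(v.gene for v in current)
    multi_genes = sorted(g for g, n in per_gene.items() if n > 1)
    excluded = set(multi_genes)
    final = [v for v in current if v.gene not in excluded]
    report.append(("remove multi-SNV genes", len(final), percent(len(final), len(current))))
    return final, report, multi_genes


variants = [
    Variant("GENEA", 100, False, True, True),
    Variant("GENEA", 200, False, True, True),
    Variant("GENEB", 300, False, True, True),
    Variant("GENEC", 400, True, True, True),
    Variant("GENED", 500, False, False, True),
    Variant("GENEE", 600, False, True, False),
]
final, report, multi = run_filters(variants)
for name, n, pct in report:
    print(f"{name}: {n} passed ({pct:.1f}%)")
print("genes with more than one variant:", multi)
```

Keeping each filter as a named predicate makes the report fall out of the same loop that does the filtering. The same structure would apply to the indelReport_summary with indel records.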

For a more detailed description of the reports provided on completion of each project, please download the Exome analysis reports document (PDF, 196 KB).
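The exonReport_summary lists exons or regions covered by fewer than three reads. One way to derive such a list is sketched below, assuming per-base read depth is already available (for example, parsed from `samtools depth` output) for a single chromosome. The input layout and the function name `low_coverage_regions` are hypothetical, and reporting contiguous low-coverage runs within each exon is one possible reading of "exons/regions", not necessarily how the APF computes the report.

```python
from typing import Dict, List, Sequence, Tuple

MIN_READS = 3  # threshold taken from the exonReport_summary description

# Hypothetical inputs: exons as (exon_id, start, end), 1-based inclusive, and
# per-base read depth as {position: depth}; absent positions count as 0.
ExonRecord = Tuple[str, int, int]


def low_coverage_regions(
    exons: Sequence[ExonRecord], depth: Dict[int, int], min_reads: int = MIN_READS
) -> List[ExonRecord]:
    """Return (exon_id, start, end) runs of bases covered by fewer than min_reads reads."""
    regions: List[ExonRecord] = []
    for exon_id, start, end in exons:
        run_start = None
        for pos in range(start, end + 1):
            if depth.get(pos, 0) < min_reads:
                if run_start is None:
                    run_start = pos  # a low-coverage run begins here
            elif run_start is not None:
                regions.append((exon_id, run_start, pos - 1))
                run_start = None
        if run_start is not None:  # run extends to the end of the exon
            regions.append((exon_id, run_start, end))
    return regions


exons = [("E1", 10, 15), ("E2", 20, 22), ("E3", 30, 31)]
depth = {10: 5, 11: 2, 12: 1, 13: 4, 14: 3, 15: 0, 20: 3, 21: 3, 22: 3}
print(low_coverage_regions(exons, depth))
# [('E1', 11, 12), ('E1', 15, 15), ('E3', 30, 31)]
```

An exon with no recorded depth at all (E3 here) is reported as a single run spanning the whole exon.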

 

Variant validation

We also offer an SNV validation service to customers. SNVs are validated using Amplifluor assays (Chemicon, Temecula, CA). Primers are designed using a semi-automated pipeline, and results are provided in Excel format. If an Amplifluor assay fails, the researcher can request that a Sanger sequencing assay be designed for the SNV.

 

Data Policy

Sequencing data: Due to the large size and high volume of sequencing data, the APF has very limited capacity for data retention.
Intermediate analysis files: Intermediate files will not be maintained at the APF. Users who would like a copy of the intermediate analysis files must request this when submitting samples AND provide a portable hard drive for the file transfer. There will be a cost associated with the data transfer.
Analyzed sequence files (FASTQ and aligned data): These data will be stored at the APF for 3 months. For operational purposes, the APF may retain data beyond six months, but cannot guarantee retrievability after six months.
Reports: These will be provided to the researcher following final QC and will also be stored at the APF for 5 years.
QC data: All Agilent Bioanalyzer files will be stored at the APF for 5 years and can be provided upon request.

 


Updated: 13 December 2017 / Responsible Officer: Director / Page Contact: Site manager