This page contains simple examples of how to run Scalpel under its different modes of operation. Please make sure to use the same reference file (genome.fa) for variant calling that was used to align the reads in the BAM file. For exome studies, regions.bed typically contains the set of target regions (in BED format) of the exome capture protocol used; however, the user can specify any regions of interest. Full command line options and instructions are in the "MANUAL" available with the Scalpel distribution and online.
Running Scalpel
Scalpel consists of two modules: discovery and export.
The discovery step processes the input data using the micro-assembly procedure to detect variants.
The export step genotypes and filters the mutations according to the user-defined parameters.
The minimal command line requirements are as follows:
single: variant calling on a single sample.
scalpel-export --single --db DBfile --bed regions.bed --ref genome.fa [OPTIONS]
somatic: detects somatic indels in tumor/normal matched samples.
scalpel-export --somatic --db DBfile --bed regions.bed --ref genome.fa [OPTIONS]
denovo: calls de novo variants in one family of four individuals (mom, dad, aff, sib).
scalpel-export --denovo --db DBfile --bed regions.bed --ref genome.fa [OPTIONS]
Only quads (families composed of 4 individuals: mom, dad, aff, sib) are currently supported. Until trios (or other pedigree types) are supported, the user can simply provide the same child twice for the analysis.
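As a rough sketch of the trio workaround, the discovery step could be invoked with the same child BAM file passed as both the affected child and the sibling. The flag names used here (--dad, --mom, --aff, --sib, --dir) are assumptions that should be checked against the MANUAL for your Scalpel version, and all file names are placeholders. The script prints the command and runs it only if scalpel-discovery is found on the PATH.

```shell
#!/bin/sh
# Placeholder inputs (hypothetical file names): two parents and one child.
DAD=dad.bam
MOM=mom.bam
CHILD=child.bam
BED=regions.bed
REF=genome.fa
OUTDIR=outdir

# Trio workaround: the same child is supplied as both --aff and --sib.
# Flag names are assumptions; verify them in the Scalpel MANUAL.
CMD="scalpel-discovery --denovo --dad $DAD --mom $MOM --aff $CHILD --sib $CHILD --bed $BED --ref $REF --dir $OUTDIR"

# Always show the command that would be executed.
echo "$CMD"

# Run only when Scalpel is installed; otherwise this acts as a dry run.
# $CMD is left unquoted on purpose so it splits into separate arguments
# (the placeholder paths contain no spaces).
if command -v scalpel-discovery >/dev/null 2>&1; then
    $CMD
fi
```

The resulting database would then be passed to scalpel-export --denovo through the --db option, as in the minimal command shown earlier.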
Two-pass option
For somatic and denovo variant calling in 30x-100x whole-genomes and whole-exomes, it is recommended to use the --two-pass option, which removes additional false-positive calls and generates a higher quality list of mutations. For example, in the case of a tumor/normal pair, a more sensitive analysis is performed on the normal sample to identify any signature of a candidate tumor mutation that was missed in the main analysis.
NOTE: for very high coverage data (e.g., 1000x or more), as can be obtained in panel studies, the --two-pass option may fail to analyze variants that have been discovered in the main analysis. It is therefore recommended not to use the --two-pass option for this type of data.
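A minimal sketch of a somatic discovery run with the two-pass option might look like the following. The --normal and --tumor flag names are assumptions to be verified in the MANUAL, the BAM file names are placeholders, and USE_TWO_PASS is a hypothetical switch introduced here only to make the very-high-coverage exception explicit.

```shell
#!/bin/sh
# Placeholder inputs (hypothetical file names) for a matched tumor/normal pair.
NORMAL=normal.bam
TUMOR=tumor.bam
BED=regions.bed
REF=genome.fa

# Hypothetical switch: set to 0 for very high coverage data
# (e.g., 1000x panel studies), where --two-pass is not recommended.
USE_TWO_PASS=1

# Flag names --normal and --tumor are assumptions; check the Scalpel MANUAL.
CMD="scalpel-discovery --somatic --normal $NORMAL --tumor $TUMOR --bed $BED --ref $REF"
if [ "$USE_TWO_PASS" -eq 1 ]; then
    CMD="$CMD --two-pass"
fi

# Always show the command that would be executed.
echo "$CMD"

# Run only when Scalpel is installed; otherwise this acts as a dry run.
if command -v scalpel-discovery >/dev/null 2>&1; then
    $CMD
fi
```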
Variant calling on single region
Scalpel supports single-region analysis when the region of interest is given to the --bed parameter in the format "chr:start-end".
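For illustration, a single-region call could be sketched as follows. The coordinates are arbitrary placeholders, the --bam flag name is an assumption to be checked against the MANUAL, and the chromosome name is assumed to follow the naming used in genome.fa (for example "1" rather than "chr1").

```shell
#!/bin/sh
# Placeholder inputs (hypothetical file name and coordinates).
BAM=file.bam
REF=genome.fa
REGION="1:31656347-31656528"   # format: chr:start-end

# The region string takes the place of the BED file in --bed.
# The --bam flag name is an assumption; check the Scalpel MANUAL.
CMD="scalpel-discovery --single --bam $BAM --bed $REGION --ref $REF"

# Always show the command that would be executed.
echo "$CMD"

# Run only when Scalpel is installed; otherwise this acts as a dry run.
if command -v scalpel-discovery >/dev/null 2>&1; then
    $CMD
fi
```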
Exporting variants
By default, Scalpel saves the list of detected indels in a file named variants.indel.* in the selected output directory, filtered according to the default parameters. However, it is recommended to explore different filtering criteria using the export tool:
scalpel-export --denovo --db database.db --bed regions.bed --ref genome.fa [OPTIONS]
scalpel-export --somatic --db database.db --bed regions.bed --ref genome.fa [OPTIONS]
Whole-Genome vs. Whole-Exome studies
Scalpel has been extensively tested on exome capture data, but it can also be used to detect mutations in whole-genome data. To reduce its memory requirements, it is recommended to run it on each chromosome separately. Given the more uniform coverage distribution of whole-genome data and the increasing read length of Illumina technology, it is recommended to increase the window size (default 400) to 600bp or larger. For example, chromosome 22 can be analyzed on its own with a 600bp window using 10 CPUs.
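A command along those lines might be sketched as shown here. The --window and --numprocs flag names are assumptions to be confirmed in the MANUAL, the BAM file name is a placeholder, and the end coordinate assumes the hg19/GRCh37 length of chromosome 22; adjust it to the reference actually used.

```shell
#!/bin/sh
# Placeholder inputs (hypothetical file name).
BAM=file.bam
REF=genome.fa

# Whole chromosome 22 as a single region; the end coordinate assumes
# the hg19/GRCh37 assembly and must match your reference.
REGION="22:1-51304566"

WINDOW=600     # window size in bp (default 400)
NUMPROCS=10    # number of CPUs to use

# Flag names --window and --numprocs are assumptions; check the Scalpel MANUAL.
CMD="scalpel-discovery --single --bam $BAM --bed $REGION --window $WINDOW --numprocs $NUMPROCS --ref $REF"

# Always show the command that would be executed.
echo "$CMD"

# Run only when Scalpel is installed; otherwise this acts as a dry run.
if command -v scalpel-discovery >/dev/null 2>&1; then
    $CMD
fi
```

Repeating the same call with a different REGION for each chromosome keeps each run's memory footprint limited to one chromosome at a time.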