This page contains simple examples of how to run Scalpel in its different modes of operation. Make sure to use the same reference file (genome.fa) for variant calling that was used to align the reads in the BAM file. For exome studies, regions.bed typically contains the target regions (in BED format) of the exome capture protocol used; however, the user can specify any regions of interest. See the full command-line options and instructions in the "MANUAL" available with the Scalpel distribution and online.


Running Scalpel

Scalpel consists of two modules: discovery and export.
The discovery step processes the input data using the micro-assembly procedure to detect variants.
The export step genotypes and filters the mutations according to user-defined parameters. The minimal command-line requirements are as follows:

single: variant calling on a single sample.

scalpel-discovery --single --bam file.bam --bed regions.bed --ref genome.fa [OPTIONS]
scalpel-export --single --db DBfile --bed regions.bed --ref genome.fa [OPTIONS]
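The two steps run in sequence: export reads the database that discovery produced. The sketch below, a hypothetical Bash helper named scalpel_single_cmds, prints both --single commands for one sample instead of running them, so they can be inspected first. It assumes that the output-directory option is called --dir and that the database in that directory is named database.db (the placeholder name used on this page); check both against the MANUAL before relying on it.

```shell
# Requires bash (uses printf %q). Prints the commands; it does not run them.
# Hypothetical helper. ASSUMPTIONS: the output-directory option is --dir and the
# database written there is named database.db. Verify both in the MANUAL.
scalpel_single_cmds() {
  local bam=$1 bed=$2 ref=$3 outdir=$4
  # Step 1: discovery detects variants by micro-assembly and writes to $outdir.
  printf 'scalpel-discovery --single --bam %q --bed %q --ref %q --dir %q\n' \
    "$bam" "$bed" "$ref" "$outdir"
  # Step 2: export genotypes and filters the variants stored in that database.
  printf 'scalpel-export --single --db %q --bed %q --ref %q\n' \
    "$outdir/database.db" "$bed" "$ref"
}
```

Running `scalpel_single_cmds file.bam regions.bed genome.fa outdir | bash -e` executes the two commands in order and stops if discovery fails. Printing first keeps the wrapper easy to audit.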

somatic: detects somatic indels in matched tumor/normal samples.

scalpel-discovery --somatic --normal normal.bam --tumor tumor.bam --bed regions.bed --ref genome.fa [OPTIONS]
scalpel-export --somatic --db DBfile --bed regions.bed --ref genome.fa [OPTIONS]

denovo: calls de novo variants in a family of four individuals (mom, dad, aff, sib).

scalpel-discovery --denovo --dad dad.bam --mom mom.bam --aff aff.bam --sib sib.bam --bed regions.bed --ref genome.fa [OPTIONS]
scalpel-export --denovo --db DBfile --bed regions.bed --ref genome.fa [OPTIONS]

Only quads (families of four individuals: mom, dad, aff, sib) are currently supported. Until trios (or other pedigree types) are supported, the user can simply provide the same child twice, as both --aff and --sib.
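For a trio, the workaround amounts to passing the one child's BAM in both child slots. The hypothetical Bash helper below (scalpel_trio_cmd) prints such a --denovo discovery command; it uses only the options shown above and does not run anything itself.

```shell
# Requires bash (uses printf %q). Hypothetical helper for the trio workaround:
# the single child's BAM is given as both --aff and --sib.
scalpel_trio_cmd() {
  local dad=$1 mom=$2 child=$3 bed=$4 ref=$5
  printf 'scalpel-discovery --denovo --dad %q --mom %q --aff %q --sib %q --bed %q --ref %q\n' \
    "$dad" "$mom" "$child" "$child" "$bed" "$ref"
}
```

For example, `scalpel_trio_cmd dad.bam mom.bam child.bam regions.bed genome.fa | bash` runs the discovery step for the trio.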


Two-pass option

For somatic and de novo variant calling on whole-genome and whole-exome data at 30x-100x coverage, it is recommended to use the --two-pass option, which removes additional false-positive calls and produces a higher-quality list of mutations. For example, for a tumor/normal pair, a more sensitive analysis is performed on the normal sample to identify any evidence of a candidate tumor mutation that was missed in the main analysis.

scalpel-discovery --somatic --normal normal.bam --tumor tumor.bam --bed regions.bed --ref genome.fa --two-pass [OPTIONS]

Similarly, for de novo mutations, the parents are analyzed more carefully to identify any evidence of the candidate mutation.
NOTE: for very high coverage data (e.g., 1000x or more), as can be obtained in panel studies, the --two-pass option may fail to analyze variants that were discovered in the main analysis. It is therefore recommended not to use the --two-pass option for this type of data.
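The guidance above can be written down as a small rule. This hypothetical Bash function prints --two-pass only for the somatic and denovo modes, and only when the mean coverage (an integer, e.g. 60 for 60x) is at or below a cutoff. The default cutoff of 100 is an assumption taken from the top of the 30x-100x range recommended above; the page gives no guidance between 100x and 1000x, so adjust it for your data.

```shell
# Hypothetical helper: print "--two-pass" when the rule of thumb suggests it.
# Usage: two_pass_flag MODE MEAN_COVERAGE [MAX_COVERAGE]
# MEAN_COVERAGE must be an integer. MAX_COVERAGE defaults to 100 (an assumption).
two_pass_flag() {
  local mode=$1 cov=$2 max=${3:-100}
  case $mode in
    somatic|denovo)
      # Two-pass is only suggested up to the cutoff; very deep data skips it.
      if (( cov <= max )); then
        printf '%s\n' '--two-pass'
      fi
      ;;
  esac
}
```

Used unquoted, as in `scalpel-discovery --somatic --normal normal.bam --tumor tumor.bam --bed regions.bed --ref genome.fa $(two_pass_flag somatic 60)`, the substitution expands to nothing when the flag is not wanted.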

Variant calling on single region


Scalpel supports single-region analysis: provide the region of interest in the format "chr:start-end" to the --bed parameter, as shown below:

scalpel-discovery --single --bam file.bam --ref genome.fa --bed chr:start-end [OPTIONS]

For regions smaller than 1000 bp, the bwa index of the FASTA file is used to quickly retrieve the region of interest. This allows real-time analysis of candidate locations without loading the whole reference into memory. The same syntax applies to the other operation modes (somatic and denovo).
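Before a quick single-region run, it can help to check that the region is below the 1000 bp threshold and that the bwa index files exist next to the FASTA file. The two hypothetical Bash helpers below do that. They assume plain integer coordinates (no thousands separators) and the standard files written by `bwa index genome.fa` (genome.fa.amb, .ann, .bwt, .pac, .sa).

```shell
# Hypothetical helper: print the length in bp of a "chr:start-end" region,
# counting both ends (inclusive coordinates).
region_size() {
  local region=$1
  local range=${region##*:}              # "start-end": text after the last colon
  local start=${range%-*} end=${range#*-}
  echo $(( end - start + 1 ))
}

# Hypothetical helper: succeed only if every file written by `bwa index REF` exists.
has_bwa_index() {
  local ref=$1 ext
  for ext in amb ann bwt pac sa; do
    [[ -f "$ref.$ext" ]] || return 1
  done
}
```

A guarded run could then look like `if (( $(region_size 22:16050000-16050500) < 1000 )) && has_bwa_index genome.fa; then scalpel-discovery --single --bam file.bam --ref genome.fa --bed 22:16050000-16050500; fi`.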

Exporting variants


By default, Scalpel saves the detected indels, filtered with the default parameters, to files named variants.indel.* in the selected output directory. However, it is recommended to explore different filtering criteria using the export tool:

scalpel-export --single --db database.db --bed regions.bed --ref genome.fa [OPTIONS]
scalpel-export --denovo --db database.db --bed regions.bed --ref genome.fa [OPTIONS]
scalpel-export --somatic --db database.db --bed regions.bed --ref genome.fa [OPTIONS]

The database.db file can be found in the output directory for the single operation mode, or in the corresponding subdirectories (main and twopass) for the denovo and somatic modes. See the full command-line options in the "MANUAL" available with the Scalpel distribution.
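Because the database lives in a different place depending on the mode, a small lookup avoids pointing export at the wrong file. The hypothetical Bash helper below prints one scalpel-export command for each database file that exists where this page says it should be. "database.db" is the placeholder name used on this page; check the actual file name in your output directory.

```shell
# Requires bash (arrays, printf %q). Hypothetical helper.
# Usage: export_cmds MODE OUTDIR BED REF   (MODE: single, somatic or denovo)
export_cmds() {
  local mode=$1 outdir=$2 bed=$3 ref=$4 db
  local -a dbs
  if [[ $mode == single ]]; then
    dbs=("$outdir/database.db")                 # single: directly in OUTDIR
  else
    dbs=("$outdir/main/database.db" "$outdir/twopass/database.db")
  fi
  for db in "${dbs[@]}"; do
    # Skip locations that were not created in this run.
    [[ -e $db ]] || continue
    printf 'scalpel-export --%s --db %q --bed %q --ref %q\n' "$mode" "$db" "$bed" "$ref"
  done
}
```

Filtering options from the MANUAL can be appended to the printed lines before running them.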

Whole-Genome vs. Whole-Exome studies


Scalpel has been extensively tested on exome capture data, but it can also be used to detect mutations in whole-genome data. To reduce memory requirements, it is recommended to run it on each chromosome separately. Given the more uniform coverage of whole-genome data and the increasing read length of Illumina technology, it is also recommended to increase the window size (--window, default 400 bp) to 600 bp or larger. For example, the following command calls variants on chromosome 22 using 10 CPUs:

scalpel-discovery --single --bam file.bam --ref genome.fa --bed 22:1-51304566 --window 600 --numprocs 10 [OPTIONS]

The same syntax applies to the other operation modes (somatic and denovo).
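Running every chromosome by hand is tedious, so the per-chromosome advice above can be scripted. The hypothetical Bash helper below reads chromosome names and lengths from a FASTA index (genome.fa.fai, as written by `samtools faidx genome.fa`; columns are name, length, offset, line bases, line width) and prints one single-mode discovery command per chromosome with --window 600. Each chromosome gets its own output directory so runs do not overwrite each other; the --dir option name is an assumption, so confirm it in the MANUAL.

```shell
# Requires bash (printf %q). Hypothetical helper; prints commands, runs nothing.
# Usage: wgs_cmds FAI BAM REF [NUMPROCS]   (NUMPROCS defaults to 10)
# ASSUMPTION: the output-directory option is named --dir.
wgs_cmds() {
  local fai=$1 bam=$2 ref=$3 procs=${4:-10} chrom len rest
  # The first two tab-separated .fai columns are the sequence name and length.
  while IFS=$'\t' read -r chrom len rest; do
    printf 'scalpel-discovery --single --bam %q --ref %q --bed %q --window 600 --numprocs %d --dir %q\n' \
      "$bam" "$ref" "$chrom:1-$len" "$procs" "out_$chrom"
  done < "$fai"
}
```

Piping the output to `bash` runs the chromosomes one after another, keeping peak memory to one chromosome at a time. To skip unplaced contigs or decoys, filter the .fai file first.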