Input Files and Options

The Exomiser can be run simply via a YAML analysis file. The extended CLI capability was removed in version 10.0.0, as it was less capable than the YAML scripts and only supported hg19 exome analysis. Version 13.0.0 introduced a more flexible input using a GA4GH Phenopacket (v1.0) for the sample data, with the ability to specify the input pedigree, VCF and genome assembly independently; user-specified, preset or default analysis options; and a new batch mode.

Sample, vcf, assembly, ped

It is recommended to provide Exomiser with the input sample as a Phenopacket. Exomiser will accept this in either JSON or YAML format. The sample is provided using the sample switch and the full path to the phenopacket file:

java -jar exomiser-cli-14.0.0.jar --sample path/to/phenopacket.json

Should the phenopacket either not specify a VCF file or specify a file on another filesystem, the VCF file path can be provided or overridden using the --vcf option. This option requires that the genome assembly the VCF file was called against is also specified using the --assembly option:

java -jar exomiser-cli-14.0.0.jar --sample path/to/phenopacket.json --vcf path/to/genome.vcf --assembly hg19

or for hg38/GRCh38:

java -jar exomiser-cli-14.0.0.jar --sample path/to/phenopacket.json --vcf path/to/genome.vcf --assembly hg38

Lastly, when analysing a multi-sample VCF a pedigree is required. This can be provided using a dedicated PED file. This uses the ped switch and a full path to the PED file:

java -jar exomiser-cli-14.0.0.jar --sample path/to/phenopacket.json --vcf path/to/genome.vcf --assembly hg38 --ped path/to/pedigree.ped

Alternatively, the pedigree, proband and family members can be provided as a phenopacket Family message, which can encode the pedigree:

java -jar exomiser-cli-14.0.0.jar --sample path/to/family.json --vcf path/to/genome.vcf --assembly hg38

Whatever the input used, it is essential that the sample names used for the proband and other family members are consistent between the pedigree and the sample identifiers used in the VCF file. If they are not, Exomiser will exit with an error explaining that they do not match. Examples of these inputs can be found in the examples directory of the installation.
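A quick pre-flight check can catch mismatched identifiers before Exomiser is started. The following is a minimal sketch and not part of Exomiser: the function name missing_ped_samples is hypothetical, and it assumes a whitespace-delimited PED file (family, individual, father, mother, sex, phenotype, so the individual ID is in the second column) and an uncompressed VCF. It prints each PED individual that has no matching sample column in the VCF header:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PED individual IDs that are absent from the VCF header.
# Assumes an uncompressed VCF; decompress a .vcf.gz first (e.g. with `gzip -dc`).
missing_ped_samples() {
  local ped="$1" vcf="$2"
  local vcf_samples id
  # Sample names are columns 10 onwards of the tab-delimited '#CHROM' header line.
  vcf_samples=$(grep -m1 '^#CHROM' "$vcf" | cut -f10- | tr '\t' '\n')
  # Skip comment lines and lines too short to hold an individual ID.
  awk '!/^#/ && NF >= 2 { print $2 }' "$ped" | while read -r id; do
    if ! printf '%s\n' "$vcf_samples" | grep -Fxq -- "$id"; then
      printf '%s\n' "$id"
    fi
  done
}

# Example use (paths are placeholders):
# missing_ped_samples path/to/pedigree.ped path/to/genome.vcf
```

No output means every individual in the PED file was found in the VCF header. Any ID that is printed is worth checking against the VCF before running the analysis.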

Preset

If no analysis is provided and no preset is specified, Exomiser will default to running the exome preset analysis. If you want to run Genomiser, which will analyse non-coding regions of a WGS sample, use --preset genome:

java -jar exomiser-cli-14.0.0.jar --sample path/to/phenopacket.json --preset genome

In order to run the genome preset you first need to ensure that the REMM score data has been downloaded for the relevant genome assembly and is enabled in the application.properties file; see the Genomiser / REMM data section.

Analysis

Important

The exome and genome analyses found in the test-analysis-exome.yml and test-analysis-genome.yml files are recommended for use in most situations, and removing steps from the analysis is likely to negatively impact performance. It is strongly recommended to test any changes against the standard setup on the example samples and your own solved cases to check the impact of any changes you might want to make. If you want to score all variants and write failed ones to the output, it is recommended to use analysisMode: FULL.

Analysis files contain all possible options for running an analysis including the ability to specify variant frequency and pathogenicity data sources and the ability to tweak the order that analysis steps are performed.

See the test-analysis-exome.yml and test-analysis-genome.yml files located in the examples directory of the installation for examples. Details can be found in the Analysis section.

java -jar exomiser-cli-14.0.0.jar --analysis examples/test-analysis-exome.yml

These files can also be used to analyse whole genomes; however, they will require substantially more RAM to do so. For example, a 4.4 million variant analysis requires approximately 12GB RAM. RAM requirements can be greatly reduced by setting the analysisMode option to PASS_ONLY, which will also make the results easier to evaluate.
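To judge how a VCF compares with the 4.4 million variant figure, it can help to count its records before choosing a heap size. This is a rough sketch only: the function name count_variants is hypothetical, it simply counts non-header lines, and it reads gzip-compressed files through gzip -dc:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the variant records (non-header lines) in a VCF.
count_variants() {
  local vcf="$1"
  case "$vcf" in
    *.gz) gzip -dc "$vcf" | grep -vc '^#' ;;   # compressed input
    *)    grep -vc '^#' "$vcf" ;;              # plain-text input
  esac
}

# Example use (paths are placeholders):
# count_variants path/to/genome.vcf.gz
# The JVM heap is then set with the standard -Xmx option, e.g. 12GB as quoted above:
# java -Xmx12g -jar exomiser-cli-14.0.0.jar --analysis path/to/analysis.yml
```

The count is only a guide; the 12GB value is the approximate figure quoted above for one analysis, not a general rule for sizing the heap.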

Analyses can be run in batch mode. Simply put the path to each analysis file in the batch file - one file path per line.

java -jar exomiser-cli-14.0.0.jar --analysis-batch examples/test-analysis-batch.txt
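When there are many analysis files, the batch file can be generated rather than written by hand. A minimal sketch, assuming all analysis files live in one directory and end in .yml; the function name write_analysis_batch and the directory layout are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the path of every *.yml analysis file in a
# directory to a batch file, one file path per line.
write_analysis_batch() {
  local analysis_dir="$1" batch_file="$2"
  local f
  : > "$batch_file"                    # create or truncate the batch file
  for f in "$analysis_dir"/*.yml; do
    [ -e "$f" ] || continue            # the glob matched nothing
    printf '%s\n' "$f" >> "$batch_file"
  done
}

# Example use (paths are placeholders):
# write_analysis_batch path/to/analyses analysis-batch.txt
# java -jar exomiser-cli-14.0.0.jar --analysis-batch analysis-batch.txt
```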

Output

By default Exomiser will write out any result files to the exomiser-cli-14.0.0/results sub-directory of the Exomiser installation directory. Unless specified otherwise in the output.yml or outputOptions section of the analysis YAML file, Exomiser will write out a .json and a .html file. These are for machine (JSON) and human (HTML) use. The filenames will match the input VCF filename. For example:

java -jar exomiser-cli-14.0.0.jar --sample examples/pfeiffer-phenopacket.yml --vcf path/to/manuel.vcf.gz --assembly hg19

This would result in two files being output with the filename ‘manuel_exomiser’ and the ‘.json’ and ‘.html’ extensions:

exomiser-cli-14.0.0/results/manuel_exomiser.html
exomiser-cli-14.0.0/results/manuel_exomiser.json
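Downstream scripts often need to know these names in advance. The sketch below reproduces the naming pattern of the example above; it is an assumption drawn from that single example rather than a statement of how Exomiser derives names internally, and the function name default_output_names is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the default result file names from a VCF path,
# following the 'manuel.vcf.gz' -> 'manuel_exomiser.{html,json}' example.
default_output_names() {
  local vcf="$1" results_dir="${2:-exomiser-cli-14.0.0/results}"
  local base
  base=$(basename "$vcf")   # drop the directory part
  base=${base%.gz}          # drop a trailing .gz, if present
  base=${base%.vcf}         # drop the .vcf extension
  printf '%s/%s_exomiser.html\n' "$results_dir" "$base"
  printf '%s/%s_exomiser.json\n' "$results_dir" "$base"
}

# Example use:
# default_output_names path/to/manuel.vcf.gz
```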

Users requiring more control over their output can use either the outputOptions section of an analysis file or a dedicated output options YAML file. An example of the latter can be found in the exomiser-cli-14.0.0/examples/output-options.yml file:

---
outputContributingVariantsOnly: false
# numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results
numGenes: 10
minExomiserGeneScore: 0.7
# outputDirectory: (optional) (default: '{exomiserDir}/results/')
outputDirectory: results/
# outputFileName: (optional) (default: 'input-vcf-name-exomiser')
outputFileName: NA12345-exomiser-results
# out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML)
outputFormats: [HTML, JSON, TSV_GENE]

This file is passed to Exomiser using the --output switch:

java -jar exomiser-cli-14.0.0.jar --sample examples/pfeiffer-phenopacket.yml --vcf path/to/manuel.vcf.gz --assembly hg19 --output path/to/output-options.yml

The output filename, directory and format can also be specified directly on the CLI (see the --help option for details).

Batch

The above commands can be added to a batch file, for example the file exomiser-cli-14.0.0/examples/test-analysis-batch-commands.txt:

#This is an example batch file to be run using the --batch command
#
#Each line should specify a single set of exomiser commands. File paths are either relative to the directory the exomiser
#is being run from or the full system path. It will run any combination of exomiser commands listed using -h or --help.
#
# Original format exomiser analysis containing all the sample and analysis information
--analysis test-analysis-exome.yml
# New preset exome analysis using a v1 phenopacket to submit the phenotype information and adding/overriding the VCF input
--preset exome --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz
# Using the default analysis (exome) with a v1 phenopacket containing the phenotype information and adding/overriding the VCF input
--sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz
# Using a user-defined analysis with a v1 phenopacket containing the phenotype information and adding/overriding the VCF input
--analysis preset-exome-analysis.yml --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz
# Using a user-defined analysis with a v1 phenopacket containing the phenotype information, adding/overriding the VCF input and specifying the output options
--analysis preset-exome-analysis.yml --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz --output output-options.yml

The batch file is then run using the --batch option:

java -jar exomiser-cli-14.0.0.jar --batch path/to/exomiser-cli-14.0.0/examples/test-analysis-batch-commands.txt

The advantage of this is that a single command can analyse many samples in far less time than starting a new JVM for each: there is no start-up penalty after the initial start, and the Java JIT compiler can take advantage of a longer-running process to optimise the runtime code. For maximum throughput on a cluster, consider splitting your batch jobs over multiple nodes.
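For a large cohort the batch file itself can be generated. The following is a sketch under stated assumptions, not an Exomiser feature: it supposes a hypothetical layout in which every phenopacket NAME.yml sits next to its VCF NAME.vcf.gz, all called against the same assembly, and the function name write_batch_commands is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one line of Exomiser options per phenopacket.
# Assumed layout: each <name>.yml phenopacket sits next to <name>.vcf.gz.
write_batch_commands() {
  local sample_dir="$1" assembly="$2" batch_file="$3"
  local sample vcf
  : > "$batch_file"                        # create or truncate the batch file
  for sample in "$sample_dir"/*.yml; do
    [ -e "$sample" ] || continue           # the glob matched nothing
    vcf="${sample%.yml}.vcf.gz"
    if [ -e "$vcf" ]; then
      printf -- '--sample %s --vcf %s --assembly %s\n' \
        "$sample" "$vcf" "$assembly" >> "$batch_file"
    else
      printf 'skipping %s: no matching VCF\n' "$sample" >&2
    fi
  done
}

# Example use (paths are placeholders):
# write_batch_commands path/to/samples hg19 batch-commands.txt
# java -jar exomiser-cli-14.0.0.jar --batch batch-commands.txt
```

To spread the work over several nodes, the generated file can be cut into chunks with the standard split utility (for example split -l 50 batch-commands.txt batch-part-) and each chunk passed to its own --batch run.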