Installation
Software and Hardware requirements
Minimum 8/16GB RAM (For an exome analysis of a 30,000 variant sample 4GB RAM should suffice. For a genome analysis of a 4,400,000 variant sample 8GB RAM should suffice.)
Any 64-bit operating system
Java 17 or above
At least 100GB free disk space (SSD preferred for best performance)
An internet connection is not required to run the Exomiser, although network access will be required if accessing a networked database (optional).
By default the Exomiser is completely self-contained and is able to run on standard consumer laptops.
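The Java and disk-space requirements above can be checked before installing. The following is a minimal sketch, not part of Exomiser: the helper names `parse_java_major` and `java_is_supported` are hypothetical, and the script assumes a POSIX-style shell with `awk` and `df` available.

```shell
#!/usr/bin/env bash
# Pre-flight check sketch for the requirements listed above.
# parse_java_major and java_is_supported are hypothetical helpers.

# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "17.0.9" scheme.
parse_java_major() {
  local version="$1"
  local first="${version%%.*}"
  if [ "$first" = "1" ]; then
    # Legacy scheme: the major version is the second field (1.8 -> 8).
    local rest="${version#*.}"
    echo "${rest%%.*}"
  else
    # Strip any suffix such as "-ea" from versions like "21-ea".
    echo "${first%%[!0-9]*}"
  fi
}

# Succeed if the given version string is Java 17 or newer.
java_is_supported() {
  local major
  major="$(parse_java_major "$1")"
  [ -n "$major" ] && [ "$major" -ge 17 ]
}

# Only query the local JVM if java is actually on the PATH.
if command -v java >/dev/null 2>&1; then
  # 'java -version' writes to stderr; the version is the first quoted token.
  installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if java_is_supported "$installed"; then
    echo "Java $installed is new enough"
  else
    echo "Java $installed is too old; Java 17 or above is required" >&2
  fi
fi

# Report free space (in GB) on the filesystem holding the current directory.
df -Pk . | awk 'NR==2 {printf "Free disk space: %d GB\n", $4 / 1048576}'
```

Run it from the directory where you intend to unpack the data, since the free-space figure refers to the filesystem of the current directory.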
Pre-built Binaries
Note
This is the recommended way of installing for normal users.
Pre-built binaries are available from The Monarch Initiative or from the Exomiser repository on GitHub.
Exomiser also requires two or three data files to be available: one for the phenotype data and one for each genome assembly required.
Windows install
Install 7-Zip for unzipping the archive files. The built-in archiving software has issues extracting the zip files.
Download the data and distribution files from https://data.monarchinitiative.org/exomiser/latest
Extract the distribution files by right-clicking exomiser-cli-14.0.0-distribution.zip and selecting 7-Zip > Extract Here
Extract the data files (e.g. 2402_phenotype.zip, 2402_hg19.zip) by right-clicking the archive and selecting 7-Zip > Extract files… into the exomiser data directory. By default Exomiser expects this to be ‘exomiser-cli-14.0.0/data’, but this can be changed in the application.properties file.
cd exomiser-cli-14.0.0
java -Xmx4g -jar exomiser-cli-14.0.0.jar --analysis examples/test-analysis-exome.yml
Linux install
The following shell script should work:
# download the distribution (won't take long)
wget https://data.monarchinitiative.org/exomiser/latest/exomiser-cli-14.0.0-distribution.zip
# download the data (this is ~80GB and will take a while). If you only require a single assembly, only download the relevant file.
wget https://data.monarchinitiative.org/exomiser/latest/2402_hg19.zip
wget https://data.monarchinitiative.org/exomiser/latest/2402_hg38.zip
wget https://data.monarchinitiative.org/exomiser/latest/2402_phenotype.zip
# unzip the distribution and data files - this will create a directory called 'exomiser-cli-14.0.0' in the current working directory
unzip exomiser-cli-14.0.0-distribution.zip
# the pattern is quoted so that unzip, not the shell, expands it; unquoted, the extra archive names would be read as members to extract from the first archive
unzip '2402_*.zip' -d exomiser-cli-14.0.0/data
# Check the application.properties are pointing to the correct versions
# exomiser.hg19.data-version=2402
# exomiser.hg38.data-version=2402
# exomiser.phenotype.data-version=2402
# run a test exome analysis
cd exomiser-cli-14.0.0
java -Xmx4g -jar exomiser-cli-14.0.0.jar --analysis examples/test-analysis-exome.yml
This script will download and extract the Exomiser files and then run the analysis contained in the file ‘test-analysis-exome.yml’ from the examples sub-directory. This analysis contains a known pathogenic missense variant in the FGFR2 gene.
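The commands above do not include an explicit integrity check of the downloaded archives. If you have an expected SHA-256 value from a source you trust, a check could be sketched as below. Whether checksum files are published alongside the downloads is not stated here, so the expected value is left as a placeholder. `verify_sha256` is a hypothetical helper, and `sha256sum` is assumed to be installed (it is part of GNU coreutils).

```shell
#!/usr/bin/env bash
# verify_sha256 is a hypothetical helper, not part of the Exomiser distribution.
# Usage: verify_sha256 <file> <expected-sha256-hex-in-lowercase>
verify_sha256() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 2
  fi
  # sha256sum prints "<hash>  <filename>"; keep only the hash.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}

# Example with a placeholder hash; replace it with the value you trust.
# verify_sha256 2402_hg19.zip "<expected sha256 from a trusted source>"
```

Verifying before unzipping is cheaper than discovering a truncated ~80GB download partway through extraction.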
Genomiser / REMM data
In order to run the Genomiser you will also need to download the REMM data file from https://zenodo.org/record/4768448. Once downloaded, you’ll need to add the path to the ReMM.v0.3.1.tsv.gz file to the application.properties file. For example, if you downloaded the file to the exomiser data directory, you could add the entry like this:
exomiser.hg19.remm-path=${exomiser.hg19.data-directory}/ReMM.v0.3.1.tsv.gz
If this step is omitted, the application will throw an error and stop any analysis which defines REMM in the pathogenicitySources section of an analysis yml file.
Having done this, run the analysis like this:
java -Xmx6g -jar exomiser-cli-14.0.0.jar --analysis examples/NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38.yml
This is an analysis for an autosomal recessive 5’UTR variant located in the POMP gene on chromosome 13. The phenotype HPO terms are taken from the clinical synopsis of OMIM #601952 (http://www.omim.org/clinicalSynopsis/601952).
CADD data
In order to use CADD you will need to download the CADD data files separately. These can be accessed from https://cadd.gs.washington.edu/download. Exomiser only requires the files containing the scores, not the full annotations. For example, in release v1.4 Exomiser requires both the "All possible SNVs of GRCh38/hg38" and "80M InDels" files for a local setup. Each genome assembly will require the relevant files. The direct links from the US site are shown below and are correct at the time of writing.
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh38/whole_genome_SNVs.tsv.gz
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh38/whole_genome_SNVs.tsv.gz.tbi
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh38/InDels.tsv.gz
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh38/InDels.tsv.gz.tbi
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh37/whole_genome_SNVs.tsv.gz
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh37/whole_genome_SNVs.tsv.gz.tbi
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh37/InDels.tsv.gz
wget https://krishna.gs.washington.edu/download/CADD/v1.4/GRCh37/InDels.tsv.gz.tbi
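Note that the GRCh37 and GRCh38 files share the same names, so running all of the commands above in one directory would leave the two assemblies' files colliding. A minimal sketch of one way to keep them apart is shown below. It only prints the commands rather than running them. `cadd_download_plan` is a hypothetical helper, and the destination layout is an assumption chosen to match the example property paths later in this section (hg19 under `cadd/<version>/hg19`, hg38 directly under `cadd/<version>`).

```shell
#!/usr/bin/env bash
# Print (rather than run) wget commands that place each assembly's CADD
# files in its own directory. cadd_download_plan is a hypothetical helper.
# Usage: cadd_download_plan <data-directory> <cadd-version>
cadd_download_plan() {
  local data_dir="$1" version="$2"
  local base="https://krishna.gs.washington.edu/download/CADD/v${version}"
  local assembly dest file
  for assembly in GRCh37 GRCh38; do
    # Assumed layout, chosen to match the example property paths:
    # hg19 files under cadd/<version>/hg19, hg38 files under cadd/<version>.
    if [ "$assembly" = "GRCh37" ]; then
      dest="${data_dir}/cadd/${version}/hg19"
    else
      dest="${data_dir}/cadd/${version}"
    fi
    for file in whole_genome_SNVs.tsv.gz InDels.tsv.gz; do
      # Each data file needs its tabix index (.tbi) next to it.
      echo "wget -P ${dest} ${base}/${assembly}/${file}"
      echo "wget -P ${dest} ${base}/${assembly}/${file}.tbi"
    done
  done
}

cadd_download_plan /data/exomiser-data 1.4
```

The `-P` option tells wget which directory to save into, creating it if needed. Review the printed commands and then run them, or pipe the output to `sh` once you are happy with the destinations.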
Enable Exomiser to use CADD by altering the application.properties file to enable these lines, and ensure the cadd.version property matches the version you downloaded.
cadd.version=1.4
exomiser.hg19.cadd-snv-path=${exomiser.data-directory}/cadd/${cadd.version}/hg19/whole_genome_SNVs.tsv.gz
exomiser.hg19.cadd-in-del-path=${exomiser.data-directory}/cadd/${cadd.version}/hg19/InDels.tsv.gz
# and/or for hg38
exomiser.hg38.cadd-snv-path=${exomiser.data-directory}/cadd/${cadd.version}/whole_genome_SNVs.tsv.gz
exomiser.hg38.cadd-in-del-path=${exomiser.data-directory}/cadd/${cadd.version}/InDels.tsv.gz
Exomiser will expect the tabix index .tbi file to be present in the same directory as the .tsv.gz files. To use CADD scores in an analysis, the pathogenicitySources should contain the CADD property:
#Possible pathogenicitySources: POLYPHEN, MUTATION_TASTER, SIFT, CADD, REMM
#REMM is trained on non-coding regulatory regions
#*WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
#and updated their location in the application.properties. Exomiser will not run without this.
pathogenicitySources: [POLYPHEN, MUTATION_TASTER, SIFT, CADD]
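Because Exomiser expects each .tsv.gz score file to have its .tbi index alongside it, it can be worth confirming this before enabling CADD in an analysis. The following is a minimal sketch. `check_tabix_pair` is a hypothetical helper, and the example paths assume the /data/exomiser-data layout used for illustration in this section.

```shell
#!/usr/bin/env bash
# check_tabix_pair is a hypothetical helper: it confirms that a bgzipped
# score file and its .tbi index sit side by side.
check_tabix_pair() {
  local data_file="$1"
  if [ ! -f "$data_file" ]; then
    echo "missing data file: $data_file" >&2
    return 1
  fi
  if [ ! -f "${data_file}.tbi" ]; then
    echo "missing index: ${data_file}.tbi" >&2
    return 1
  fi
  echo "found: $data_file (+ .tbi)"
}

# Example paths follow the layout assumed above; adjust to your install.
for f in \
  /data/exomiser-data/cadd/1.4/hg19/whole_genome_SNVs.tsv.gz \
  /data/exomiser-data/cadd/1.4/hg19/InDels.tsv.gz; do
  check_tabix_pair "$f" || echo "fix the path above before enabling CADD" >&2
done
```

The same check applies to any other tabix-indexed file you configure, such as the REMM file.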
Configuring the application.properties
Once you have downloaded and unzipped all the data, you will need to edit the exomiser-cli-14.0.0/application.properties file located in the main exomiser-cli directory. This file contains a lot of comments for optional data and assemblies.
If you want to run Exomiser using data from a different release directory, replace this line in application.properties:
exomiser.data-directory=
with
exomiser.data-directory=/full/path/to/alternative/data/directory
For example, assuming you unzipped the contents of the 2402_hg38.zip data file into /data/exomiser-data:
exomiser.data-directory=/data/exomiser-data
where the contents of exomiser-data look something like this:
$ tree -L 1 /data/exomiser-data/
/data/exomiser-data/
├── 2402_hg19
├── 2402_hg38
├── 2402_phenotype
├── cadd
└── remm
By default Exomiser will look for data located in the exomiser-cli-14.0.0/data directory.
After defining the exomiser.data-directory, a minimal setup for exome analysis using GRCh37/hg19 would only require the application.properties to contain this:
### hg19 assembly ###
exomiser.hg19.data-version=2402
### phenotypes ###
exomiser.phenotype.data-version=2402
For a GRCh38/hg38 only setup:
### hg38 assembly ###
exomiser.hg38.data-version=2402
### phenotypes ###
exomiser.phenotype.data-version=2402
Or an install supporting both assemblies:
### hg19 assembly ###
exomiser.hg19.data-version=2402
### hg38 assembly ###
exomiser.hg38.data-version=2402
### phenotypes ###
exomiser.phenotype.data-version=2402
Note that each assembly will require approximately 1GB RAM to load. Attempting to analyse a VCF called against an assembly whose data is unsupported or not loaded will result in an unrecoverable error being thrown.
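Since a mismatch between the configured data versions and the unpacked directories only shows up at run time, a quick consistency check can help. The sketch below assumes the directory naming shown in the tree listing above (`<version>_hg19`, `<version>_hg38`, `<version>_phenotype`). `get_property` and `check_data_dirs` are hypothetical helpers, and the data directory is passed explicitly rather than resolved from the properties file.

```shell
#!/usr/bin/env bash
# get_property and check_data_dirs are hypothetical helpers.

# Print the value of the last uncommented occurrence of a key in a
# Java-style .properties file (nothing is printed if the key is absent).
get_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /^[ \t]*#/ { next }                      # skip commented-out lines
    {
      idx = index($0, "=")
      if (idx == 0) next
      k = substr($0, 1, idx - 1)
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", k)     # trim whitespace around the key
      if (k == key) {
        v = substr($0, idx + 1)
        gsub(/^[ \t\r]+|[ \t\r]+$/, "", v)   # trim whitespace around the value
        value = v
      }
    }
    END { if (value != "") print value }
  ' "$file"
}

# Check that <data-dir>/<version>_<name> exists for each configured version.
# Usage: check_data_dirs <application.properties> <data-directory>
check_data_dirs() {
  local props="$1" data_dir="$2" name version status=0
  if [ ! -f "$props" ]; then
    echo "no such file: $props" >&2
    return 2
  fi
  for name in hg19 hg38 phenotype; do
    version="$(get_property "$props" "exomiser.${name}.data-version")"
    [ -z "$version" ] && continue   # not configured, nothing to check
    if [ -d "${data_dir}/${version}_${name}" ]; then
      echo "ok: ${version}_${name}"
    else
      echo "missing: ${data_dir}/${version}_${name}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For a default install this could be called as `check_data_dirs exomiser-cli-14.0.0/application.properties exomiser-cli-14.0.0/data`. Assemblies whose data-version line is commented out are skipped rather than reported.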
By default, Exomiser uses a whitelist created from ClinVar data. Exomiser will consider any variant on the whitelist to be maximally pathogenic, regardless of the underlying data (e.g. variant effect, allele frequency, predicted pathogenicity), and will always include these variants in the results.
Troubleshooting
Zip file reported as too big or corrupted
If, when running ‘unzip exomiser-cli-14.0.0-distribution.zip’, you see the following:
error: Zip file too big (greater than 4294959102 bytes)
Archive: exomiser-cli-14.0.0-distribution.zip
warning [exomiser-cli-14.0.0-distribution.zip]: 9940454202 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [exomiser-cli-14.0.0-distribution.zip]: start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
Check that your unzip version was compiled with LARGE_FILE_SUPPORT and ZIP64_SUPPORT. This is standard with UnZip 6.00 and can be checked by typing:
unzip -version
This shouldn’t be an issue with more recent Linux distributions.
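The check described above can be scripted. The sketch below scans UnZip's version report for the two compile-time options. `has_zip64_support` is a hypothetical helper, and it assumes that running `unzip -v` with no archive argument prints the build's compile options, as Info-ZIP UnZip does.

```shell
#!/usr/bin/env bash
# has_zip64_support is a hypothetical helper that scans UnZip's version
# report (read from stdin) for the two compile-time options named above.
has_zip64_support() {
  local report
  report="$(cat)"
  printf '%s\n' "$report" | grep -q 'LARGE_FILE_SUPPORT' &&
    printf '%s\n' "$report" | grep -q 'ZIP64_SUPPORT'
}

# Only run against the local unzip if it is installed.
if command -v unzip >/dev/null 2>&1; then
  if unzip -v | has_zip64_support; then
    echo "unzip supports archives larger than 4GB"
  else
    echo "unzip lacks ZIP64 or large-file support; try a newer build" >&2
  fi
fi
```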