Analysis/Job File Configuration

Exomiser analyses are defined using a yml format configuration file(s). Examples can be found in the unpacked exomiser-cli-14.0.0/examples/ directory. Here you will find a mixture of samples, jobs, analyses, outputOptions, phenopackets, phenopacket family, vcf and ped files.

Job

An Exomiser job contains three sections:

These fields define all the input data, analysis steps and output requirements for a complete Exomiser analysis. They can be used as a record of how a sample was analysed. Alternatively the various sections can all be defined at runtime using the CLI arguments as explained in the Input Files and Options section.

which defines run mode, filters and prioritisers, and the output options section that defines the output format, output file and number of results that should be returned. Each of these sections can be defined independently on the command line or provided as a single file as shown in the job section.

## Exomiser Job Template.
# The job is split into three sections:
#    sample: describes the proband
#    analysis: details the steps exomiser will take to analyse the sample
#    outputOptions: specifies what files to output and where with filtering options for the number of genes and the exomiser score
#
# These can be input separately on the command line e.g. --sample sample.yml --analysis analysis.yml
# In this example the 'sample' is replaced with a 'phenopacket' see https://phenopacket-schema.readthedocs.io/en/1.0.0/phenopacket.html
---
phenopacket:
  id: manuel
  subject:
    id: manuel
    sex: MALE
  phenotypicFeatures:
    - type:
        id: HP:0001159
        label: Syndactyly
    - type:
        id: HP:0000486
        label: Strabismus
    - type:
        id: HP:0000327
        label: Hypoplasia of the maxilla
    - type:
        id: HP:0000520
        label: Proptosis
    - type:
        id: HP:0000316
        label: Hypertelorism
    - type:
        id: HP:0000244
        label: Brachyturricephaly
  htsFiles:
    - uri: examples/Pfeiffer.vcf
      htsFormat: VCF
      genomeAssembly: hg19
  metaData:
    created: '2019-11-12T13:47:51.948Z'
    createdBy: julesj
    resources:
      - id: hp
        name: human phenotype ontology
        url: http://purl.obolibrary.org/obo/hp.owl
        version: hp/releases/2019-11-08
        namespacePrefix: HP
        iriPrefix: 'http://purl.obolibrary.org/obo/HP_'
    phenopacketSchemaVersion: 1.0
analysis:
  #FULL or PASS_ONLY
  analysisMode: PASS_ONLY
  # In cases where you do not want any cut-offs applied an empty map should be used e.g. inheritanceModes: {}
  # These are the default settings, with values representing the maximum minor allele frequency in percent (%) permitted for an
  # allele to be considered as a causative candidate under that mode of inheritance.
  # If you just want to analyse a sample under a single inheritance mode, delete/comment-out the others. For AUTOSOMAL_RECESSIVE
  # or X_RECESSIVE ensure *both* relevant HOM_ALT and COMP_HET modes are present.
  inheritanceModes: {
    AUTOSOMAL_DOMINANT: 0.1,
    AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
    AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
    X_DOMINANT: 0.1,
    X_RECESSIVE_COMP_HET: 2.0,
    X_RECESSIVE_HOM_ALT: 0.1,
    MITOCHONDRIAL: 0.2
  }
  # Possible frequencySources:
  # UK10K - http://www.uk10k.org/ (UK10K)
  # gnomAD - http://gnomad.broadinstitute.org/ (GNOMAD_E, GNOMAD_G)
  # note that as of gnomAD v2.1 1000 genomes, ExAC are part of gnomAD
  # as of gnomAD v4 TOPMed & ESP are also included in gnomAD
  frequencySources: [
    UK10K,

    GNOMAD_E_AFR,
    GNOMAD_E_AMR,
    #        GNOMAD_E_ASJ,
    GNOMAD_E_EAS,
    #    GNOMAD_E_FIN,
    GNOMAD_E_NFE,
    #    GNOMAD_E_OTH,
    GNOMAD_E_SAS,

    GNOMAD_G_AFR,
    GNOMAD_G_AMR,
    #        GNOMAD_G_ASJ,
    GNOMAD_G_EAS,
    #    GNOMAD_G_FIN,
    GNOMAD_G_NFE,
    #    GNOMAD_G_OTH,
    GNOMAD_G_SAS
  ]
  # Possible pathogenicitySources: (POLYPHEN, MUTATION_TASTER, SIFT), (REVEL, MVP), CADD, REMM
  # REMM is trained on non-coding regulatory regions
  # *WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
  # and updated their location in the application.properties. Exomiser will not run without this.
  pathogenicitySources: [ REVEL, MVP ]
  #this is the standard exomiser order.
  #all steps are optional
  steps: [
    #hiPhivePrioritiser: {},
    #priorityScoreFilter: {priorityType: HIPHIVE_PRIORITY, minPriorityScore: 0.500},
    #intervalFilter: {interval: 'chr10:123256200-123256300'},
    # or for multiple intervals:
    #intervalFilter: {intervals: ['chr10:123256200-123256300', 'chr10:123256290-123256350']},
    # or using a BED file - NOTE this should be 0-based, Exomiser otherwise uses 1-based coordinates in line with VCF
    #intervalFilter: {bed: /full/path/to/bed_file.bed},
    #genePanelFilter: {geneSymbols: ['FGFR1','FGFR2']},
    # geneBlacklistFilter: { },
      failedVariantFilter: { },
    #qualityFilter: {minQuality: 50.0},
      variantEffectFilter: {
        remove: [
            FIVE_PRIME_UTR_EXON_VARIANT,
            FIVE_PRIME_UTR_INTRON_VARIANT,
            THREE_PRIME_UTR_EXON_VARIANT,
            THREE_PRIME_UTR_INTRON_VARIANT,
            NON_CODING_TRANSCRIPT_EXON_VARIANT,
            UPSTREAM_GENE_VARIANT,
            INTERGENIC_VARIANT,
        REGULATORY_REGION_VARIANT,
        CODING_TRANSCRIPT_INTRON_VARIANT,
        NON_CODING_TRANSCRIPT_INTRON_VARIANT,
        DOWNSTREAM_GENE_VARIANT
      ]
    },
    # removes variants represented in the database
    #knownVariantFilter: {},
    frequencyFilter: {maxFrequency: 2.0},
    pathogenicityFilter: {keepNonPathogenic: true},
    # inheritanceFilter and omimPrioritiser should always run AFTER all other filters have completed
    # they will analyse genes according to the specified modeOfInheritance above- UNDEFINED will not be analysed.
    inheritanceFilter: {},
    # omimPrioritiser isn't mandatory.
    omimPrioritiser: {},
    #priorityScoreFilter: {minPriorityScore: 0.4},
    # Other prioritisers: Only combine omimPrioritiser with one of these.
    # Don't include any if you only want to filter the variants.
    hiPhivePrioritiser: {},
    # or run hiPhive in benchmarking mode:
    #hiPhivePrioritiser: {runParams: 'mouse'},
    #phivePrioritiser: {}
    #phenixPrioritiser: {}
    #exomeWalkerPrioritiser: {seedGeneIds: [11111, 22222, 33333]}
  ]
outputOptions:
  outputContributingVariantsOnly: false
  #numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results
  numGenes: 0
  # Path to the desired output directory. Will default to the 'results' subdirectory of the exomiser install directory
  #outputDirectory: results
  # Filename for the output files. Will default to {input-vcf-filename}-exomiser
  outputFileName: Pfeiffer-hiphive-exome
  #out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML)
  outputFormats: [HTML, JSON, TSV_GENE, TSV_VARIANT, VCF]

Sample

It is recommended to provide Exomiser the input sample as in Phenopacket format. Exomiser will accept this in either JSON or YAML format.

proband:

Identifier used for the proband in the VCF file.

hpoIds:

Input of the HPO identifiers/terms. You can select them using the HPO browser. Input must be in array format. Terms are comma separated and delimited by single quotes. For example [‘HP:0001156’, ‘HP:0001363’, ‘HP:0011304’, ‘HP:0010055’]. It is critical that these are as detailed as possible and describe the observed phenotypes as fully and precisely as possible. These are used by the phenotype matching algorithms and will heavily influence the outcome of an analysis.

vcf:

The variant file in VCF format. There can be variants of multiple samples from one family in the file.

ped:

If you have multiple samples as input you have to define the pedigree using the ped format. It is important that you correctly define affected and unaffected individuals. If you use X_RECESSIVE as mode of inheritance be sure that the sex is correct (unknown is also fine).

Preset

# one of EXOME or GENOME. GENOME will require REMM to be available. Default is EXOME.
preset: EXOME

Analysis

The analysis shown below is equivalent to the --preset exome argument or adding

preset: EXOME

to an Exomiser job file. In most cases it is recommended to use the default presets as these have been thoroughly tested on the UK 100,000 genome rare-disease cohort. In case the user requires anything different, it is possible to manually define the data sources and steps in an Exomiser analysis.

---
analysisMode: PASS_ONLY
inheritanceModes: {
  AUTOSOMAL_DOMINANT: 0.1,
  AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
  AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
  X_DOMINANT: 0.1,
  X_RECESSIVE_HOM_ALT: 0.1,
  X_RECESSIVE_COMP_HET: 2.0,
  MITOCHONDRIAL: 0.2
}
frequencySources: [
    THOUSAND_GENOMES,
    TOPMED,
    UK10K,

    ESP_AA, ESP_EA, ESP_ALL,

    GNOMAD_E_AFR,
    GNOMAD_E_AMR,
  #        GNOMAD_E_ASJ,
    GNOMAD_E_EAS,
    GNOMAD_E_FIN,
    GNOMAD_E_NFE,
    GNOMAD_E_OTH,
    GNOMAD_E_SAS,

    GNOMAD_G_AFR,
    GNOMAD_G_AMR,
  #        GNOMAD_G_ASJ,
    GNOMAD_G_EAS,
    GNOMAD_G_FIN,
    GNOMAD_G_NFE,
    GNOMAD_G_OTH,
    GNOMAD_G_SAS
]
# Possible pathogenicitySources: (POLYPHEN, MUTATION_TASTER, SIFT), (REVEL, MVP), CADD, REMM
# REMM is trained on non-coding regulatory regions
# *WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
# and updated their location in the application.properties. Exomiser will not run without this.
pathogenicitySources: [ REVEL, MVP ]
#this is the standard exomiser order.
steps: [
    failedVariantFilter: { },
    variantEffectFilter: {
      remove: [
          FIVE_PRIME_UTR_EXON_VARIANT,
          FIVE_PRIME_UTR_INTRON_VARIANT,
          THREE_PRIME_UTR_EXON_VARIANT,
          THREE_PRIME_UTR_INTRON_VARIANT,
          NON_CODING_TRANSCRIPT_EXON_VARIANT,
          NON_CODING_TRANSCRIPT_INTRON_VARIANT,
          CODING_TRANSCRIPT_INTRON_VARIANT,
          UPSTREAM_GENE_VARIANT,
          DOWNSTREAM_GENE_VARIANT,
          INTERGENIC_VARIANT,
          REGULATORY_REGION_VARIANT
      ]
    },
    frequencyFilter: { maxFrequency: 2.0 },
    pathogenicityFilter: { keepNonPathogenic: true },
    inheritanceFilter: { },
    omimPrioritiser: { },
    hiPhivePrioritiser: { }
]

analysisMode:

Can be FULL or PASS_ONLY. We highly recommend PASS_ONLY on genomes for because of memory issues. It will only keep variants that passes all filters. FULL will keep all variants, using the most memory and requiring the most time, but it can be useful for troubleshooting why a particular variant of interest might not have been returned in the results of an analysis.

analysisMode: PASS_ONLY

inheritanceModes:

Can be AUTOSOMAL_DOMINANT, AUTOSOMAL_RECESSIVE, X_RECESSIVE or UNDEFINED. This is a functionality of Jannovar. See its inheritance documentation for further information.

# These are the default settings, with values representing the maximum minor allele frequency in percent (%) permitted for an
# allele to be considered as a causative candidate under that mode of inheritance.
# If you just want to analyse a sample under a single inheritance mode, delete/comment-out the others. For AUTOSOMAL_RECESSIVE
# or X_RECESSIVE ensure *both* relevant HOM_ALT and COMP_HET modes are present.
# In cases where you do not want any cut-offs applied an empty map should be used e.g. inheritanceModes: {}
inheritanceModes: {
        AUTOSOMAL_DOMINANT: 0.1,
        AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
        AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
        X_DOMINANT: 0.1,
        X_RECESSIVE_HOM_ALT: 0.1,
        X_RECESSIVE_COMP_HET: 2.0,
        MITOCHONDRIAL: 0.2
}

frequencySources:

Here you can specify which variant frequency databases you want to use. You can add multiple databases using the same array format as the HPO IDs.

The data sources used are from 1000 genomes (via DBSNP), DBSNP, ESP, UK10K (via DBSNP), TOPMed (via DBSNP).

As of the 2402 data release ExAC, gnomAD exomes and gnomAD genomes source has been removed as this is part of the gnomAD 2.1+ data.

DBSNP:

THOUSAND_GENOMES, UK10K, TOPMED

ESP:

ESP_AA, ESP_EA, ESP_ALL

gnomAD exomes:

GNOMAD_E_AFR, GNOMAD_E_AMR, GNOMAD_E_ASJ, GNOMAD_E_EAS, GNOMAD_E_FIN, GNOMAD_E_NFE, GNOMAD_E_MID, GNOMAD_E_OTH, GNOMAD_E_SAS,

gnomAD genomes:

GNOMAD_G_AFR, GNOMAD_G_AMR, GNOMAD_G_AMI, GNOMAD_G_ASJ, GNOMAD_G_EAS, GNOMAD_G_FIN, GNOMAD_G_NFE, GNOMAD_G_MID, GNOMAD_G_OTH, GNOMAD_G_SAS

We recommend using all databases if the proband population background is unknown, although removing the ASJ, AMI, FIN, MID and OTH populations is recommended as these are small/founder populations which are likely to have artificially high allele frequencies for some relevant variants. These populations will not be included when assessing the population frequency for the ACMG assignments, even if used in the filtering.

frequencySources: [
  THOUSAND_GENOMES,
  TOPMED,
  UK10K,

  ESP_AA, ESP_EA, ESP_ALL,

  GNOMAD_E_AFR,
  GNOMAD_E_AMR,
  # GNOMAD_E_ASJ,
  GNOMAD_E_EAS,
  # GNOMAD_E_FIN,
  GNOMAD_E_NFE,
  # GNOMAD_E_OTH,
  GNOMAD_E_SAS,

  GNOMAD_G_AFR,
  GNOMAD_G_AMR,
  # GNOMAD_G_ASJ,
  GNOMAD_G_EAS,
  # GNOMAD_G_FIN,
  GNOMAD_G_NFE,
  # GNOMAD_G_OTH,
  GNOMAD_G_SAS
]

pathogenicitySources:

Possible pathogenicitySources: POLYPHEN, MUTATION_TASTER, SIFT, REVEL, MVP, ALPHA_MISSENSE, SPLICE_AI (derived from gnomAD 4.0, so only available for hg38), CADD, REMM. REMM is trained on non-coding regulatory regions. WARNING if you enable CADD, ensure that you have downloaded and installed the CADD tabix files and updated their location in the application.properties (see CADD data). Exomiser will not run without this.

We recommend using either [REVEL, MVP] OR [POLYPHEN, MUTATION_TASTER, SIFT] as REVEL and MVP are newer predictors which have been shown to have better performance and are more nuanced. Mixing them with the Polyphen2, MutationTaster or SIFT will give worse performance. Testing on GEL solved cases with AlphaMissense slightly increased performance when combined with MVP. We advise testing on local cohorts for assessing local performance.

REVEL scores are freely available for non-commercial use. For other uses, please contact Weiva Sieh.

AlphaMissense Database Copyright (2023) DeepMind Technologies Limited. All predictions are provided for non-commercial research use only under CC BY-NC-SA license. Researchers interested in predictions not yet provided, and for non-commercial use, can send an expression of interest to alphamissense@google.com.

SpliceAI source code is provided under the GPLv3 license. SpliceAI includes several third party packages provided under other open source licenses, please see NOTICE for additional details. The trained models used by SpliceAI (located in this package at spliceai/models) are provided under the CC BY NC 4.0 license for academic and non-commercial use; other use requires a commercial license from Illumina, Inc.

pathogenicitySources: [REVEL, MVP, REMM]

steps

This section instructs exomiser which analysis steps should be run and with which criteria. _n.b._ the order in which the steps are declared is important - exomiser will run them in the order declared, although certain optimisations will happen automatically. We recommend using the [standard settings](../example/test-analysis-genome) for genome wide analysis as these have been optimised for both speed and memory performance. Nonetheless all steps are optional. Being an array of steps, this section must be enclosed in square brackets. Steps are comma separated and written in hash format name: {options}. All steps are optional - comment them out or delete them if you do not want to use them.

Analysis steps are defined in terms of variant filters, gene filters or prioritisers. The inheritanceFilter and omimPrioritiser are both somewhat anomalous as they operate on genes but also require the variants to have already been filtered. The optimiser will ensure that these are run at the correct time if they have been incorrectly placed.

Using these it is possible, for example to create artificial exomes, define gene panels or only examine specific regions.

Variant filters

These operate on variants and will produce a pass or fail result for each variant run through the filter.

failedVariantFilter:

Removes variants which are not marked as PASS or . in the VCF file. This is a highly recommended filter which will remove a lot of noise.

failedVariantFilter: {}

qualityFilter:

Removes variants with VCF QUAL scores lower than the given minQuality.

qualityFilter: {minQuality: 50.0}

intervalFilter:

Defines an interval of interest. Only variants within this interval will be passed. Currently only single intervals are possible.

intervalFilter: {interval: 'chr10:123256200-123256300'}

geneIdFilter:

You can define entrez-gene-ids for genes of interest. Only variants associated with these genes will be analyzed.

geneIdFilter: {geneIds: [12345, 34567, 98765]}

variantEffectFilter:

If you are interested only in specific functional classes of variants you can define a set of classes you want to remove from the output. Variant effects are generated by Jannovar. Jannovar uses Sequence Ontology (SO) terms and are listed in their manual.

variantEffectFilter: {remove: [SYNONYMOUS_VARIANT]}

regulatoryFeatureFilter:

If included it removes all non-regulatory, non-coding variants over 20Kb from a known gene. Intergenic and upstream variants in known enhancer regions over 20kb from a known gene are associated with genes in their TAD and not effected by this filter. This is an important filter to include when analysing whole-genome data.

regulatoryFeatureFilter: {}

knownVariantFilter:

Removes variants represented in the databases set in the frequencySources section. E.g. if you define

frequencySources: [THOUSAND_GENOMES]

every variant with an RS number will be removed. Variants without an RSid will be removed/failed if they are represented in any database defined in the frequencySources section. We do not recommend this option on recessive diseases.

knownVariantFilter: {}

frequencyFilter:

Frequency cutoff of a variant in percent. Frequencies are derived from the databases defined in the frequencySources section. We recommend a value below 5.0 % depending on the disease. Variants will be removed/failed if they have a frequency higher than the stated percentage in any database defined in the frequencySources section.

frequencyFilter: {maxFrequency: 1.0}

Important

Not defining this filter will result in all variants having no frequency data, even if the frequencySources are defined. Failing to include this will result in Exomiser assuming variants are all very rare and subsequently assigning an artificially inflated score, especially for very common variants. If you want to score all variants and write failed ones to the output, it is recommended to use analysisMode: FULL.

pathogenicityFilter:

Will apply the pathogenicity scores defined in the pathogenicitySources section to variants. If the keepNonPathogenic field is set to true then all variants will be kept. Setting this to false will set the filter to fail non-missense variants with pathogenicity scores lower than a score cutoff of 0.5. This filter is meant to be quite permissive and we recommend it be set to true.

pathogenicityFilter: {keepNonPathogenic: true}

Important

Not defining this filter will result in all variants having no pathogenicity data or ClinVar annotations, even if the pathogenicitySources are defined. Failing to include this will result in Exomiser using default scores based on the assigned variant effect. If you want to score all variants and write failed ones to the output, it is recommended to use analysisMode: FULL.

Gene filters

These act at the gene-level and therefore may also refer to the variants associated with the gene. As a rule this is discouraged, although is broken by the inheritanceFiler.

priorityScoreFilter:

Running the prioritizer followed by a priorityScoreFilter will remove genes which are least likely to contribute to the phenotype defined in hpoIds, this will dramatically reduce the time and memory required to analyze a genome. 0.501 is a good compromise to select good phenotype matches and the best protein-protein interactions hits using the hiPhive prioritiser. PriorityType can be one of HIPHIVE_PRIORITY, PHIVE_PRIORITY, PHENIX_PRIORITY, OMIM_PRIORITY, EXOMEWALKER_PRIORITY.

priorityScoreFilter: {priorityType: HIPHIVE_PRIORITY, minPriorityScore: 0.501}

inheritanceFilter:

inheritanceFilter and omimPrioritiser should always run AFTER all other filters have completed. They will analyze genes according to the specified modeOfInheritance above. If set to UNDEFINED no filtering will be done. You can read more in the Jannovar inheritance documentation how exactly this filter works.

inheritanceFilter: {}

Prioritisers

These work on the gene-level and will produce the semantic-similarity scores for how well a gene matches the sample’s HPO profile. We recommend using a combination of the OMIM and hiPHIVE prioritisers only. These have been tested on the UK’s 100,000 genomes pilot project on rare disease diagnoses. Using other prioritisers may miss diagnoses as the data could be out of date or the algorithm’s effectiveness may have been superseded. They are retained for historical interest.

omimPrioritiser:

inheritanceFilter and omimPrioritiser should always run AFTER all other filters have completed. Other prioritisers: Only combine omimPrioritiser with one of the next filters. The OMIM prioritiser adds known disease information from OMIM to genes including the inheritance mode and then checks this inheritance mode with the compatible inheritance modes of the gene. Genes with incompatible modes will have their scores halved.

omimPrioritiser: {}

hiPhivePrioritiser:

Scores genes using phenotype comparisons to human, mouse and fish involving disruption of the gene or nearby genes in the interactome using a random walk.

See the hiPHIVE publication for details.

# Using the default
hiPhivePrioritiser: {}

is the same as

hiPhivePrioritiser: {runParams: 'human, mouse, fish, ppi'}

It is possible to only run comparisons agains a given organism/set of organisms human,mouse,fish and include/exclude protein-protein interaction proximities ppi. e.g. only using human and mouse data -

hiPhivePrioritiser: {runParams: 'human,mouse'}

phenixPrioritiser:

Scores genes using phenotype comparisons to known human disease genes. See the PhenIX publication for details.

phenixPrioritiser: {}

phivePrioritiser:

Scores genes using phenotype comparisons to mice with disruption of the gene. This is equivalent to running hiPhivePrioritiser: {runParams: 'mouse'}. See the PHIVE publication for details.

phivePrioritiser: {}

exomeWalkerPrioritiser:

Scores genes by proximity in interactome to the seed genes. Gene identifiers are required to be genbank identifiers. See the ExomeWalker publication for details.

exomeWalkerPrioritiser: {seedGeneIds: [11111, 22222, 33333]}

Output options

When specified as part of a command-line argument, the file should be a properly formed YAML object:

# Exomiser outputOptions
---
outputContributingVariantsOnly: true
numGenes: 20
minExomiserGeneScore: 0.7
outputDirectory: /analyses/sample-12345/
outputFileName: sample-12345
outputFormats: [HTML, JSON]

When included as part of an exomiser job the fields are inlined in the exomiser-job.yaml file like so:

# other analysis and sample options above
outputOptions:
    outputContributingVariantsOnly: true
    numGenes: 20
    minExomiserGeneScore: 0.7
    outputDirectory: /analyses/sample-12345/
    outputFileName: sample-12345
    outputFormats: [HTML, JSON]

outputContributingVariantsOnly:

Can be true or false. Setting this to true will make the TSV_VARIANT and VCF output file only contain PASS variants which contribute to the gene score. Defaults to true.

numGenes:

Maximum number of genes listed in the results. If set to 0 all are listed. In most cases a limit of 50 is more than adequate as typically a good result will be found in the top 5. Not enabled by default.

minExomiserGeneScore

The mimimum gene (combined phenotype and variant) score required to be returned. As a rule of thumb scores >= 0.7 are a good score however, depending on the proband phenotype and the phenotype of best matching condition although it is not a hard-and-fast number. In our testing 0.7 gave the best performance in terms or recall and sensitivity. Not enabled by default.

outputPrefix:

Specify the path/filename without an extension and this will be added according to the outputFormats option. If unspecified this will default to the following: {exomiserDir}/results/input-vcf-name-exomiser-results.html. Alternatively, specify a fully qualified path only. e.g. /home/jules/exomes/analysis. DEPRECATED - replaced by explicit outputDirectory and outputFileName options. Users are strongly advised to migrate any existing scripts to use the new options (below), as this will be removed in a future version.

outputDirectory:

Optional path indicating where the output files for the analysis should be written. Will attempt to create any missing directories. Using this without the outputFileName option will result in a default filename being used which will be output to the specified directory. Default value: {runDirectory}/results

outputFileName:

Optional filename prefix to be used for the output files. Can be combined with the outputDirectory option to specify a custom location and filename. Used alone will result in files with the specified filename being written to the default results directory. Default value: {input-vcf-filename}-exomiser

outputFormats:

Array to define the output formats. can be [TSV-GENE], [TSV-VARIANT], [VCF], JSON or HTML or any combination like [TSV-GENE, TSV-VARIANT, VCF]. JSON is the most informative output option and suitable for use in downstream computational analyses or manually queried using something like jq. The HTML output is most suitable for manual inspection / human use. Default value: [JSON, HTML].