.. _inputandoptions: ======================= Input Files and Options ======================= The Exomiser can be run via simply via a yaml analysis file. The extended cli capability was removed in version 10.0.0 as this was less capable than the yaml scripts and only supported hg19 exome analysis. Version 13.0.0 introduced a more flexible input using a `GA4GH Phenopacket `_ (v 1.0) for the sample data with the ability to specify the input pedigree, VCF and genome assembly independently; user-specified, preset or default analysis options and a new batch mode. .. _commands: ======== Commands ======== Exomiser has an `analyse` command for single analyses and a `batch` command for running batches of analyses. Unless you are running hundreds of samples, the `analyse` command is what you need. The `analyse` command parameters and options are also used in the input file for the `batch` command. analyse ------- The `analyse` command is the primary command for performing an analysis with the Exomiser. It has various means of providing input data and specifying the analysis requirements and output files and formats. These are detailed below. We recommend running with the default settings as generally, these will give you the best performance. See `Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases `_ for an illustration. sample, vcf, assembly, ped ========================== It is recommended to provide Exomiser with the input sample as a `Phenopacket `_. Exomiser will accept this in either JSON or YAML format. The sample is provided using the ``sample`` switch and the full path to the phenopacket file: .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample examples/pfeiffer-phenopacket.yml Should the phenopacket either not specify a VCF file or specifies a file on another filesystem, the VCF file path can be provided/overridden using the ``vcf`` option. This option requires that the genome assembly the VCF file was called against is also specified using the ``assembly`` option: .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --assembly hg19 or for hg38/ GRCh38: .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample path/to/phenopacket.json --vcf path/to/genome.vcf --assembly hg38 Lastly, when analysing a multi-sample VCF a pedigree is required. This can be provided using a dedicated PED file. This uses the ``ped`` switch and a full path to the PED file: .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer-quartet.vcf.gz --ped examples/Pfeiffer-quartet.ped or the pedigree, proband and family members can be provided as a phenopacket `family `_ message which can encode the pedigree. .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample examples/pfeiffer-family.yml Whatever the input used it is essential that the sample names used for the proband and other family members are consistent between the pedigree and the sample identifiers used in the VCF file. Exomiser will exit with an error explaining that they do not match. Examples of these can be found in the `examples` directory of the installation. preset ====== If no ``analysis`` is provided and no preset is specified, Exomiser will default to running the ``exome`` preset analysis. If you want to run Genomiser, which will analyse non-coding regions of a WGS sample use ``--preset genome``: .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample path/to/phenopacket.json --preset genome In order to run a ``genome`` preset you need to first ensure that the REMM score data has been downloaded for the relevant genome assembly and is enabled in the ``application.properties`` see the :ref:`remm` section. analysis ======== .. important:: The exome and genome analyses found in the `examples/preset-exome-analysis.yml` and `examples/preset-genome-analysis.yml` files are recommended for use in most situations, and removing steps from the analysis is likely to negatively impact performance. It is *strongly* recommended to test any changes against the standard setup on the example samples and your own solved cases to check the impact of any changes you might want to make. If you want to score all variants and write failed ones to the output, it is recommended to use `analysisMode: FULL`. Analysis files contain all possible options for running an analysis including the ability to specify variant frequency and pathogenicity data sources and the ability to tweak the order that analysis steps are performed. See the test-analysis-exome.yml and test-analysis-genome.yml files located in the base install directory for examples. Details can be found in the :ref:`analysis` section. .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --analysis examples/test-analysis-exome.yml These files an also be used to run full-genomes, however they will require substantially more RAM to do so. For example a 4.4 million variant analysis requires approximately 12GB RAM. However, RAM requirements can be reduced to 4GB by setting the analysisMode option to PASS_ONLY. This will also aid your ability to evaluate the results. output ====== By default Exomiser will write out any result files to the exomiser-cli-|version|/results sub-directory of the Exomiser installation directory. Unless specified in the `output.yml` or `outputOptions` section of the analysis YAML file, Exomiser will write out a `.jsonl`, a `.parquet` and a `.html` file. These are for machine (JSON) and human (HTML) use. The filenames will match the input VCF filename. For example .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample examples/pfeiffer-phenopacket.yml --vcf path/to/manuel.vcf.gz --assembly hg19 Would result in three files being output with the filename 'manuel_exomiser' and the '.jsonl', `.parquet` and '.html' extensions: .. parsed-literal:: exomiser-cli-|version|/results/manuel_exomiser.html exomiser-cli-|version|/results/manuel_exomiser.json exomiser-cli-|version|/results/manuel_exomiser.parquet Users requiring more control over their output can use either the ``outputOptions`` section of an analysis file or a specific :ref:`outputoptions` yaml file. An example of this can be found in the exomiser-cli-|version|/examples/output-options.yml file: .. code-block:: yaml --- outputContributingVariantsOnly: false # numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results numGenes: 10 minExomiserGeneScore: 0.7 # outputDirectory: (optional) (default: '{exomiserDir}/results/') outputDirectory: results/ # outputFileName: (optional) (default: 'input-vcf-name-exomiser') outputFileName: NA12345-exomiser-results # out-format options: HTML, JSON, PARQUET, TSV_GENE, TSV_VARIANT, VCF (default: HTML) outputFormats: [HTML, PARQUET] This file is passed to Exomiser using the ``--output`` switch: .. parsed-literal:: java -jar exomiser-cli-|version|.jar analyse --sample examples/pfeiffer-phenopacket.yml --vcf path/to/manuel.vcf.gz --output path/to/output-options.yml The output filename, directory and format can also be specified directly on the CLI (see the --help command for details). batch ----- The `analyse` commands can be added to a batch file for example in the file exomiser-cli-|version|/examples/test-analysis-batch-commands.txt .. parsed-literal:: #This is an example analysis batch file to be run using the --analysis-batch command # #Each line should specify the path of a single analysis file, either relative to the directory the exomiser #is being run from or the full system path. It will run any combination of exomiser commands listed using -h or --help. # # Original format exomiser analysis containing all the sample and analysis information --analysis test-analysis-exome.yml # New preset exome analysis using a v1 phenopacket to submit the phenotype information and adding/overriding the VCF input --preset exome --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz # Using the default analysis (exome) with a v1 phenopacket containing the phenotype information and adding/overriding the VCF input --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz # Using a user-defined analysis with a v1 phenopacket containing the phenotype information and adding/overriding the VCF input --analysis preset-exome-analysis.yml --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz # Using a user-defined analysis with a v1 phenopacket containing the phenotype information and adding/overriding the VCF input --analysis preset-exome-analysis.yml --sample pfeiffer-phenopacket.yml --vcf Pfeiffer.vcf.gz --output output-options.yml then run using the ``batch`` command: .. parsed-literal:: java -jar exomiser-cli-|version|.jar batch path/to/exomiser-cli-|version|/examples/test-analysis-batch-commands.txt The advantage of this is that a single command will be able to analyse many samples in far less time than starting a new JVM for each as there will be no start-up penalty after the initial start and the Java JIT compiler will be able to take advantage of a longer-running process to optimise the runtime code. For maximum throughput on a cluster consider splitting your batch jobs over multiple nodes.