Chemical Terms Evaluator

Version 5.2.4

Introduction

The Chemical Terms Evaluator is designed to evaluate mathematical expressions on molecules. These expressions usually have a chemical meaning formulated in ChemAxon's Chemical Terms Language using built-in chemical and general purpose functions. It is also possible to extend this built-in set of calculations by a user-defined configuration.

Apart from evaluating Chemical Terms with the evaluate command line tool, ChemAxon products use this evaluation mechanism for chemical calculations wherever computational and/or search conditions are involved, such as pharmacophore feature identification (PMapper; note that PMapper feature definitions use a specific syntax), reaction definitions (Reactor), and database filters and chemical calculations (JChem Cartridge).

The heart of the evaluator mechanism is the JEP Java Expression Parser.

You may want to look at the complete language reference including a description of the expression syntax and some simple examples showing how some well-known chemical rules can be formulated in this language.

The Evaluator uses the molecule context to set the input molecule, so calculations refer to the input molecule by default. The language reference also includes a set of Evaluator examples. A set of working examples is available.

 

Installation

Usage

The evaluate command line tool evaluates a single expression and either prints the result in human-readable text format or writes the input molecule to the output with the result stored in a specified SDF tag.
evaluate [options] [input files/strings]

Options

Options:
  -h, --help                          this help message
  -l, --list-functions                list Chemical Terms functions

Input Options:
  -c, --config <filepath>             configuration XML file
                                      (if omitted then default
                                      configuration is applied)
  -n, --no-input-mol                  expression should be evaluated
                                      without input molecule
  -e, --expr-string <str|filepath>    expression string or file

Output Options:
  -o, --output <filepath>             output file path (default: stdout)
  -g, --ignore-error                  continue with next molecule on error
  -v, --verbose                       verbose output
  -C, --clean <dim[:opts]>            clean output molecules (dim: 2 or 3)
                                      with options
                                      (default: t2000 - time limit: 2 sec)
           (see http://www.chemaxon.com/marvin/help/sci/cleanoptions.html)
  -f, --format <format>               output format if result is molecule
                                      (default: smiles or smarts)
                                      (ignores the output options below)
  -x, --extract <format>              extract mode: write exactly those
                                      molecules in the specified format that
                                      satisfy the input boolean expression
                                      (excludes other output options)
  -p, --precision <precision>         max. number of fractional digits
                                      in the output (default: 2)
  -S, --sdf-output                    SDF output (otherwise text output)
  -t, --tag                           name of the SDFile tag to store the
                                      evaluation result (default: CALC)
  -i, --include-expr                  output expression string

The input molecule file can contain more than one molecule; in this case the expression is evaluated for each input molecule in turn.

The command line parameter --config specifies the filename of the configuration file. If this parameter is not specified, then the default configuration is used.

If the command line parameter --no-input-mol is specified then the expression is evaluated without input molecule.

The command line parameter --expr-string specifies the expression string if it is given on the command line or the file path containing the expression string.

The command line parameter --format specifies the output molecule format when the output is a molecule or a molecule array. The default format is SMILES / SMARTS. If this option is used, all other output options except --output, --ignore-error and --verbose are ignored.

If the command line parameter --clean is specified, then result molecules as well as SDF output are cleaned in the given dimension.

If the command line parameter --extract is specified then the input expression is used as a molecule filter: for each input molecule it is evaluated as a boolean condition and the program filters the molecules that satisfy this condition, that is, for which the expression evaluation result is true. These molecules are written as output in the specified format. If this option is used then all other output options except for --output, --ignore-error and --verbose are ignored.

The command line parameter --precision specifies the maximum number of fractional digits to be displayed in the output.

If the command line parameter --sdf-output is specified then input molecules are written to the output in SDF format with evaluation result set as an SDF tag. The command line parameter --tag specifies this SDF tag.

If the command line parameter --include-expr is specified, then the evaluation result is preceded by the expression string itself in the output.

If the command line parameter --ignore-error is specified, then import/export errors will not stop the processing but the error is written to the console and the molecule is skipped. By default, the program exits in case of molecule import/export errors.

Input

The software may take molecules from a text file. Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).

If no input file name is given on the command line, the standard input is read.

Output

If no output file name is given, results are written to the standard output.

If the --sdf-output command line parameter is specified, the output format is SDF and the evaluation result is written to an SDF tag (default tag: CALC). Otherwise only the evaluation result is written to the output in simple text format.

 

Configuration

The configuration file is an XML file containing some/all of the following optional subsections:

  1. Evaluator parameters: this section specifies general evaluator parameters; currently only the cache mode can be set here
  2. Plugin definitions: this section describes the plugins and their parameters that can be referenced from the expression (these override the default plugin definitions)
  3. Function definitions: this section describes the predefined and user-defined functions that can be referenced from the expression (these override the default function definitions)
  4. Matching conditions: this section specifies the reference ID of the substructure matching function and its search options when they differ from the default substructure search settings (these override the default matching condition)

Evaluator Parameters

The evaluator parameter section currently sets the cache mode (the Cached attribute in the example below): if set to "true", matching condition and plugin calculation results are cached in the molecule object and reused instead of performing the same structure search or chemical calculation repeatedly. The default is "false", since a Chemical Terms evaluation typically does not contain multiple references to the same matching condition or calculation, and caching itself has some overhead.

Example:

<Params Cached="true"/>

Plugin Definitions

Plugin declarations enable different structure-based chemical calculations (e.g. pKa, logP, logD) to be referenced in the expression strings.

Declaration

The plugin definition section contains the following data for each plugin reference that is to be used in the expressions:

  1. the plugin name which the plugin is referenced by in the expression;
  2. the plugin JAR relative to the marvin/plugins directory (marvin refers to the Marvin installation directory), where the plugin class should be loaded from (optional, loaded from the usual CLASSPATH if omitted);
  3. the plugin java class which wraps the plugin calculation into a prescribed frame (see Writing a Custom Plugin for details on how to wrap a calculation into a plugin);
  4. the plugin parameters as parameter name-value pairs - this section is optional: if omitted, the default plugin parameters are used.

The set of possible plugin parameters and a short description for each plugin can be seen with the help of the cxcalc program:

cxcalc <plugin> -h

where plugin is the plugin ID in the cxcalc configuration file. The parameter names used by the Evaluator are the long command line parameter names without the leading '--'. For example, to list the pKa parameters, type:

cxcalc pka -h

which prints out the following help text:

Calculator plugin: pka.
pKa calculation.
 
Usage:
  cxcalc [general options] [input files] pka
[pka options] [input files]
 
pka options:
  -h, --help       this help message
  -p, --precision  <floating point precision as number of
                   fractional digits: 0-8 or inf> default: 2
  -t, --type       [pKa|acidic|basic] (default: pKa)
  -m, --mode       [macro|micro] (default: macro)
  -n, --ions       max number of ionizable atoms to be considered (default: 8)
  -i, --min        min basic pKa (default: -10)
  -x, --max        max acidic pKa (default: 20)
  -a, --na         number of acidic pKa values displayed (default: 2)
  -b, --nb         number of basic pKa values displayed (default: 2)

The help, precision, na and nb parameters refer to display options, therefore these are not used by the Evaluator. Thus the parameter set for the pKa calculation in our case is:

type, mode, ions, min, max.
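
As a minimal sketch of how these parameters map into the configuration, the fragment below declares a pKa plugin reference with three of them set explicitly. The ID pKaAcidicMicro is a hypothetical name chosen for illustration; the parameter names are the long cxcalc option names from the help text above, and the values are picked from the choices printed there.

```xml
<!-- Hypothetical plugin reference: the ID is illustrative, not a built-in name -->
<Plugin ID="pKaAcidicMicro" Class="chemaxon.marvin.calculations.pKaPlugin">
    <!-- Parameter names are the long cxcalc option names without the leading double dash -->
    <Param Name="type" Value="acidic"/>
    <Param Name="mode" Value="micro"/>
    <Param Name="ions" Value="8"/>
</Plugin>
```

Display-only options such as precision, na and nb are left out, since the Evaluator does not use them.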

The same plugin can be used with different parameter settings if the XML configuration contains more than one <Plugin> section with the same Java class but different plugin names, each name referencing its own parameter section. In the following example the pKa1 name references a pKa calculation with minimal basic pKa value -3 and maximal acidic pKa value 10, while the pKa2 name references a pKa calculation with minimal basic pKa value -20 and maximal acidic pKa value 30. Different functions of a calculator plugin can also be referenced by different IDs: in the example below, the "mass" result type of the ElementalAnalyser plugin is referenced by the mass name, while the "exactmass" result type of the same plugin is referenced by the exactmass name.

Example:

<Plugins>
    <Plugin ID="charge"
        Class="chemaxon.marvin.calculations.ChargePlugin"
        JAR="ChargePlugin.jar"/>
    <Plugin ID="ioncharge" Class="chemaxon.marvin.calculations.IonChargePlugin">
        <Param Name="pH" Value="3.6"/>
        <Param Name="max-ions" Value="6"/>
        <Param Name="min-percent" Value="5"/>
        <Param Name="charge-type" Value="accumulated"/>
    </Plugin>
    <Plugin ID="microspecies" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin"/>
    <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin"/>
    <Plugin ID="pKa1"
        Class="chemaxon.marvin.calculations.pKaPlugin">
        <Param Name="min" Value="-3"/>
        <Param Name="max" Value="10"/>
    </Plugin>
    <Plugin ID="pKa2"
        Class="chemaxon.marvin.calculations.pKaPlugin">
        <Param Name="min" Value="-20"/>
        <Param Name="max" Value="30"/>
    </Plugin>
    <Plugin ID="logp"
        Class="chemaxon.marvin.calculations.logPPlugin">
        <Param Name="type" Value="logPMicro"/>
    </Plugin>
    <Plugin ID="mass"
        Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
        <Param Name="type" Value="mass"/>
    </Plugin>
    <Plugin ID="exactmass"
        Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
        <Param Name="type" Value="exactmass"/>
    </Plugin>
    <!-- Note: the ID "logp" is already declared above; keep only one declaration per ID in a real configuration -->
    <Plugin ID="logp" Class="chemaxon.marvin.calculations.logPPlugin"/>
    <Plugin ID="logd" Class="chemaxon.marvin.calculations.logDPlugin"/>
    <Plugin ID="acc" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="acc"/>
    </Plugin>
    <Plugin ID="don" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="don"/>
    </Plugin>
    <Plugin ID="acceptorcount" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="acceptorcount"/>
    </Plugin>
    <Plugin ID="donorcount" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="donorcount"/>
    </Plugin>
</Plugins>

Function Definitions

The expression strings can also include references to predefined functions. These functions are implemented by java classes that have to implement the org.nfunk.jep.function.PostfixMathCommandI interface. See the JEP API Documentation for details.

Declaration

The function definition section contains the user-defined function implementation java classes accessible from the expressions. Each class is given an ID: this is the name that the function is referenced by from the expression. The Class attribute specifies the java class that implements the function. A predefined function may have preset parameters in a similar fashion as in the Plugin declaration section. Currently only the atomic property query function applies this for presetting the name of the atomic property to be queried.

Example:

    <Functions>
        <Function ID="array" Class="chemaxon.jep.function.IntArray"/>
        <Function ID="min" Class="chemaxon.jep.function.Min"/>
        <Function ID="max" Class="chemaxon.jep.function.Max"/>
        <Function ID="count" Class="chemaxon.jep.function.Count"/>
        <Function ID="sum" Class="chemaxon.jep.function.Sum"/>
        <Function ID="sortasc" Class="chemaxon.jep.function.SortAsc"/>
        <Function ID="sortdesc" Class="chemaxon.jep.function.SortDesc"/>
        <Function ID="in" Class="chemaxon.jep.function.In"/>
        <Function ID="eval" Class="chemaxon.jep.function.AtomEvaluatorFunction"/>
        <Function ID="filter" Class="chemaxon.jep.function.Filter"/>
        <Function ID="minatom" Class="chemaxon.jep.function.MinAtom"/>
        <Function ID="maxatom" Class="chemaxon.jep.function.MaxAtom"/>
        <Function ID="minvalue" Class="chemaxon.jep.function.MinValue"/>
        <Function ID="maxvalue" Class="chemaxon.jep.function.MaxValue"/>
        <Function ID="atomprop" Class="chemaxon.jep.function.AtomProperties"/>
        <Function ID="hcount" Class="chemaxon.jep.function.AtomProperties">
            <Param Name="property" Value="hcount"/>
        </Function>
        <Function ID="connections" Class="chemaxon.jep.function.AtomProperties">
            <Param Name="property" Value="connections"/>
        </Function>
        <Function ID="valence" Class="chemaxon.jep.function.AtomProperties">
            <Param Name="property" Value="valence"/>
        </Function>
        <Function ID="atno" Class="chemaxon.jep.function.AtomProperties">
            <Param Name="property" Value="atno"/>
        </Function>
        <Function ID="map" Class="chemaxon.jep.function.AtomProperties">
            <Param Name="property" Value="map"/>
        </Function>
        <Function ID="arom" Class="chemaxon.jep.function.AtomProperties">
            <Param Name="property" Value="arom"/>
        </Function>
    </Functions>

Matching Conditions

The matching condition declaration enables the Match function to be used in expression strings. This function performs substructure search and optionally checks for atom matching.

Declaration

The declaration gives a reference ID to the function, should contain a Class attribute that specifies the Java class implementing the function, and can specify search attributes when they differ from the default settings. Specifying search attributes is optional; if omitted, the default values are used. For a detailed description of the search options see the JChem Query Guide.

Search attributes that can be set in the Search section:

Attribute                     Range            Default Value
StereoSearch                  true/false       true
DoubleBondStereoMatchingMode  none/marked/all  marked
SubgraphSearch                true/false       true
ExactAtomMatching             true/false       false
ExactStereoMatching           true/false       false
OrderSensitiveSearch          true/false       false

Example:

<Matching ID="match" Class="chemaxon.jep.function.Match">
    <Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
</Matching>

A detailed description of the usage of the match function in expression strings is given below. A table of match function descriptions with examples is also available as a short reference.

Default function definitions, plugin definitions and matching conditions

Default plugin and function definitions as well as the default matching condition are read from the built-in evaluator.xml file located under the chemaxon/jep directory in marvinbeans.jar / jchem.jar provided by ChemAxon. Plugins, functions and matching conditions defined by the user are read from two locations: the marvin/config/evaluator.xml file (where marvin is the Marvin installation directory; in case of JChem it is the JChem installation directory), and the MARVIN_MAJOR_VERSION/evaluator.xml file (where MARVIN_MAJOR_VERSION is the major version of Marvin/JChem, e.g. "5.1") located under the .chemaxon (UNIX / Linux) or chemaxon (Windows) subdirectory in the user's home directory. The user-defined XML configuration elements are added to the default configuration; if an element is defined in both, the user-defined configuration overrides the built-in setting.
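
To illustrate the override rule, a user-level evaluator.xml could redeclare a plugin under an ID that the built-in configuration already uses. The fragment below is a sketch only: it assumes that the built-in file declares a plugin with the ID pka (as the example configuration in the Plugin Definitions section does), and it omits the enclosing root element, which is not shown in this document.

```xml
<!-- Sketch of a user-defined override; assumes "pka" is also a built-in plugin ID -->
<Plugins>
    <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin">
        <!-- User setting intended to replace the built-in default for this ID -->
        <Param Name="mode" Value="micro"/>
    </Plugin>
</Plugins>
```

Under that assumption, expressions that reference pka would be expected to use the user-defined settings, while the remaining built-in plugins, functions and the matching condition stay as they are.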

 

Usage Examples

  1. Calculates the molecule mass for the molecules in the target.sdf file where the mass calculator plugin is defined in the config.xml configuration file:
    evaluate -c config.xml -e "mass()" target.sdf
    
  2. Filters molecules with molecule mass at least 200; the molecule mass is computed according to the default configuration:
    evaluate -e "mass() >= 200" -x sdf -o heavy.sdf target.sdf
    
  3. Evaluates the expression in file calc.txt for molecules in target.sdf, uses the default configuration:
    evaluate -e calc.txt target.sdf
    
  4. The same with SDF output into the file result.sdf, with results written to the SDF tag RESULT, preceded by the expression string:
    evaluate -e calc.txt -S -i -t RESULT -o result.sdf target.sdf
    
  5. The same as example 3, but the expression string is read from the expr.txt file:
    evaluate -e expr.txt target.sdf
    
  6. Calculates partial charges for each atom with precision of 3 fractional digits, uses the charge calculation defined in config.xml:
    evaluate -c config.xml -e "charge()" -p 3 target.sdf
    
  7. The same with SDF file output, with charge values written to the CHARGE SDF tag:
    evaluate -c config.xml -e "charge()" -p 3 -S -t CHARGE -o result.sdf target.sdf
    
  8. Enumerates atoms 1 and 2 of the Markush structure m.mrv, writes the resulting structures in MRV format:
    evaluate -e "markushEnumerations('1,2')" m.mrv -f mrv
    
  9. Returns 3 random enumerations of the Markush structure m.mrv, writes the resulting structures in MRV format, aligns scaffold and stores scaffold/R-group coloring data:
    evaluate -e "randomMarkushEnumerationsDisplay(3)" m.mrv -f mrv
    

    Note that the display options (coordinates and attached coloring data) cannot be stored in the default SMILES output format, so the MRV output format must be specified in this case.

 
Copyright © 1999-2009 ChemAxon Ltd.    All rights reserved.