Codenames: smiles, smarts=smiles:s
Marvin imports and exports SMILES strings with the following specification rules:
Daylight's SMILES specification (3.1. SMILES Specification Rules) defines generic, unique, isomeric and absolute SMILES as:
Marvin always generates canonical SMILES with isomerism information if it
can be determined from the input file. The molecule graph is always
canonicalized using the algorithm in article [1],
but this is not guaranteed to give absolute SMILES for all isomeric
structures. With option u, an approximation is currently used
to make the SMILES string as absolute (unique for isomeric
structures) as possible. For exact structure searching, the
MolSearch and JChemSearch classes of JChem Base or the jc_equals
SQL operator of the JChem Cartridge are recommended.
The initial ranks of atoms for the canonicalization are calculated
using the following atom invariants:
Agents are molecular structures that do not take part in the chemical reaction but are added to the reaction equation for informational purposes only.
All of the above sections are optional. For example:
Marvin imports and exports SMARTS strings with the following features:
Implicit bond types: The default bond types for import and export strongly depend on the atoms connected by the bond.
f
{fFIELD1,fFIELD2,...}
Import data fields from a multi-column file. The fields should be separated by a tab character. The first column contains the SMILES/SMARTS strings, the second contains the data field called FIELD1, the third contains FIELD2, etc.
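As a rough illustration of the multi-column layout described above, the following Python sketch reads tab-separated lines into records. It is not Marvin's importer: the function name `read_smiles_table`, the `"smiles"` key and the sample rows are assumptions made for this example only.

```python
from typing import Dict, Iterable, List


def read_smiles_table(lines: Iterable[str], field_names: List[str]) -> List[Dict[str, str]]:
    """Parse tab-separated lines: column 1 is the SMILES/SMARTS string,
    the following columns are mapped to field_names in order.

    Hypothetical helper for illustration; not part of any Chemaxon API.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        columns = line.split("\t")
        record = {"smiles": columns[0]}
        # Pair each remaining column with its field name; extra columns
        # or extra names are silently ignored by zip().
        for name, value in zip(field_names, columns[1:]):
            record[name] = value
        records.append(record)
    return records


# Invented sample data: SMILES, name, ID separated by tabs.
sample = ["CCO\tethanol\t1\n", "c1ccccc1\tbenzene\t2\n"]
rows = read_smiles_table(sample, ["name", "ID"])
for row in rows:
    print(row)
```

Each record keeps the structure string under `"smiles"` and the data fields under the names given by the caller, mirroring the column order FIELD1, FIELD2, and so on.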
Example: molconvert sdf "foo.smi{fname,fID}" reads the SMILES string, the name and the ID from the foo.smi file and converts them to SDF format.
d
Import with Daylight compatibility for query H.
In Daylight SMARTS, H is only considered an H atom when the atom expression has the syntax [<mass>H<charge><map>] (mass, charge and map are optional). Otherwise it is considered a query H count.
Examples: without the d option, [!H!#6] is imported as an atom which is not H and not C. With the d option, it is imported as an atom which does not have exactly one H attached and which is not C.
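The Daylight-compatible rule above can be sketched in Python as a simple pattern check. This is only an illustration, not Marvin's parser: the function name `h_is_atom_daylight` and the simplified charge grammar (`+`, `++`, `+2`, `-`, `-1`, ...) are assumptions made for the example.

```python
import re

# [<mass>H<charge><map>] with mass, charge and map all optional.
# Assumed (simplified) grammar:
#   mass   = one or more digits
#   charge = "+n", "-n", or a run of "+" or "-" signs
#   map    = ":" followed by digits
_H_ATOM_PATTERN = re.compile(r"\[(\d+)?H(\+\d+|-\d+|\++|-+)?(:\d+)?\]")


def h_is_atom_daylight(atom_expr: str) -> bool:
    """Return True if H in a bracket atom expression denotes a hydrogen
    atom under the rule described above, and False if it would be read
    as a query H count instead."""
    return _H_ATOM_PATTERN.fullmatch(atom_expr) is not None


for expr in ["[H]", "[2H]", "[H+]", "[2H+:3]", "[!H!#6]", "[H1]"]:
    kind = "H atom" if h_is_atom_daylight(expr) else "query H count"
    print(expr, "->", kind)
```

Any expression that does not match the pattern exactly, such as [!H!#6] or [H1], falls into the query H count branch.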
Use "H1", "#1" or "#1A" instead of "H" to avoid the ambiguous meaning of H. "H1" always means query H count, "#1" always means H atom, and "#1A" means aliphatic H atom.
c
Ignore fixing of double bond stereo information in small rings, also ignore fixing of aromatic bonds to aliphatic if necessary.
Double bonds in small rings (ring size < 8) are automatically imported with CIS stereo information. If the c option is set, the double bond stereo information is not changed to CIS during import.
By default, the bond between two aromatic atoms is aromatic. This is not always true, e.g. in biphenyl, where the bond connecting the two aromatic rings is single. If biphenyl is represented by the SMILES string "c1ccc(cc1)c1ccccc1", the bond between the two rings must be set to single. If the molecule is exported by Chemaxon tools, a single bond between two aromatic atoms is always written explicitly to avoid any confusion, so fixing aromatic bonds to aliphatic can be skipped.
Z
Import compressed SMILES. The compressed format must be specified explicitly, as it is not recognized by the importer automatically.
Export options can be specified in the format string. The format descriptor and the options are separated by a colon.
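To make the layout of a format string concrete, here is a minimal Python sketch that separates the format descriptor from its options. It only illustrates the colon convention stated above; the function name `split_format_string` is hypothetical and this is not how Marvin itself parses the string.

```python
from typing import Tuple


def split_format_string(fmt: str) -> Tuple[str, str]:
    """Split 'descriptor:options' at the first colon.

    Only the first colon is used as the separator, so option text that
    itself contains colons (for example 'Tf1:f2') stays intact.
    """
    descriptor, _, options = fmt.partition(":")
    return descriptor, options


print(split_format_string("smiles:a0"))      # ('smiles', 'a0')
print(split_format_string("smiles"))         # ('smiles', '')
print(split_format_string("smiles:Tf1:f2"))  # ('smiles', 'Tf1:f2')
```

Splitting on the first colon only is a deliberate choice here, because some options listed below use colons inside their own argument lists.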
... Basic options for aromatization and H atom adding/removal.
0
Do not include chirality (parity) and double bond stereo (cis/trans) information.
Examples: "smiles:0" (not stereo), "smiles:a0" (aromatic, not stereo)
q
Obsolete option.
Atom equivalences are checked by default using graph invariants at double bonds.
Example: molconvert smiles -s "C/C=C(/C)C" results in CC=C(C)C
ri
SMILES export rigorousness (i with the following values):
- Export the most information from the molecule to SMILES or SMARTS format. Don't check anything.
- Atoms, bonds and the molecule are checked for SMILES, SMARTS compatibility (default, value 5).
- In addition to the checks in case of value 5, double bonds in chains of alternating single and double bonds are checked for correct export (value 7).
Example: Let the m.mrv file contain the molecule CC=CC=CC=CC where the two outer double bonds are in TRANS configuration but the middle one has no CIS/TRANS information (crossed double bond, or double bond with wiggly bond).
molconvert smiles:r7 m.mrv throws an exception: "Nonstereo double bond between active CIS TRANS stereo bonds. Not possible to export it correctly to SMILES"
molconvert smiles m.mrv results in C\C=C\C=C\C=C\C (which is incorrect in the sense that the middle bond became TRANS).
s
Write query SMARTS. (See query Smarts for details.)
u
Write unique SMILES (also considering chirality information [2]). Note: Use this option if you want unique SMILES export.
h
Convert explicit H atoms to query hydrogen count.
Tf1:f2:...
Export the f1, f2 ... SDF fields. The fields are separated by a tab character.
If '-' is given before the T option, as in '-Tf1:f2:...', then no header line is written.
t
Export terminal atom with single_or_aromatic bond.
Examples: instead of [#6]-c1ccccc1 export the molecule to [#6]c1ccccc1;
instead of [#6]-[#6] export the molecule to [#6][#6]
n
Export molecule name (the first line of an MDL molfile).
Z
Use compressed format, and compress the SMILES string. Note that the compressed format is not recognized by the import, so it should be specified explicitly.
[1] SMILES 2. Algorithm for Generation of Unique SMILES Notation; D. Weininger, A. Weininger, J. L. Weininger; J. Chem. Inf. Comput. Sci. 1989, 29, 97-101
[2] A New Effective Algorithm for the Unambiguous Identification of the Stereochemical Characteristics of Compounds During Their Registration in Databases; T. Cieplak and J. L. Wisniewski; Molecules 2001, 6, 915-926
™: SMILES, SMARTS, and SMIRKS are trademarks of Daylight Chemical Information Systems.