Currently JChem supports searching Markush structures that contain homology groups only with specific molecule queries (queries with no query features). Homology groups are supported only on the target side; query-side support, as well as homology group properties that constrain the possible structures, will be implemented in a future version.
The "Example" column shows complete structures representing the homology groups.
Table 1. Built-in groups
Group name (alias names) | Compulsory | Optional | Incomplete case | Example |
---|---|---|---|---|
Alkyl (chk) | minimum of one carbon atom; only carbon and hydrogen atoms; single bonds; no ring bonds | substitution at arbitrary position(s) | same requirements | |
Alkenyl (che) | at least one double bond; minimum of 2 carbon atoms; otherwise same as for Alkyl | same as above | same as at compulsory, but the matching structure does not need to have any double bond | |
Alkynyl (chy) | at least one triple bond; minimum of 2 carbon atoms; otherwise same as for Alkyl | same as above, double bond | same as at compulsory, but the matching structure does not need to have any triple bond | |
Cycloalkyl (cyc) | monocyclic or fused aliphatic rings; only carbon and hydrogen atoms | substitution by (saturated) alkyl chains; double or triple bonds in the ring but not aromatic; several connection points but all must be on a ring (can't have external connection on an alkyl chain) | any carbon structure without aromatic bonds; the substituting alkyl chain can be unsaturated | |
Aryl (ary) | monocyclic or fused rings; among these rings at least one should be aromatic; only carbon and hydrogen atoms | substitution by (saturated) alkyl chains; double bonds/triple bonds in the aliphatic rings; several connection points but all must be on an aromatic ring (can't have external connection on an alkyl chain or on an aliphatic ring) | similar to cycloalkyl but the atoms can have aromatic bonds: any carbon structure where the external connection is on an atom that has an aromatic bond or has only one bond; the matching structure doesn't need to have a ring | |
AliphaticHeterocyclyl (het, heterocycle, heterocyclyl) | similar to a cycloalkyl but the ring (system) should contain at least one hetero atom; no hetero atoms outside the rings | same as cycloalkyl | similar to cycloalkyl but here hetero atoms are accepted as well, which means any structure without aromatic bonds | |
Heteroaryl (hea) | similar to aryl but the aromatic ring (system) should contain at least one hetero atom; no hetero atoms outside the rings | same as aryl | similar to aryl but here hetero atoms are accepted as well; the condition for the externally connecting atom holds as in the case of aryl | |
FusedHetero (hef) | fused ring system having at least one hetero atom | same as aryl | any structure having hetero, carbon and hydrogen atoms, with any bonds | |
Unknown group (unk) | - | Any structure. Unknown structures are enumerated as the union of all other homology groups. | - | - |
Metal (mx) | any metal | - | - | U, K, Fe, Na, Ni, Al |
AlkaliMetal (amx) | alkali and alkaline earth metals | - | - | Na, K, Ca, Mg |
OtherMetal (a35) | Group IIIa-Va metals | - | - | Al, Ga |
TransitionMetal (trm) | transition metals excluding Lanthanum | - | - | Fe, Ni, Zn, Co, Hg, W |
Lanthanide (lan) | lanthanides (including Lanthanum) | - | - | Nd, Ce, Pr |
Actinide (act) | actinides (including Actinium) | - | - | U, Th, Pa |
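The compulsory Alkyl requirements in Table 1 can be read as a simple predicate over a molecular graph. The sketch below is only an illustration on a hypothetical toy representation (a list of element symbols plus a list of `(i, j, order)` bonds); `is_alkyl` and `_is_connected` are made-up names and do not reflect JChem's API or its actual matching code.

```python
def is_alkyl(elements, bonds):
    """Check the compulsory Alkyl rules on a toy fragment.

    elements: list of element symbols, e.g. ["C", "C", "C"]
    bonds:    list of (i, j, order) tuples between atom indices
    """
    if "C" not in elements:                          # minimum of one carbon atom
        return False
    if any(e not in ("C", "H") for e in elements):   # only carbon and hydrogen atoms
        return False
    if any(order != 1 for _, _, order in bonds):     # single bonds only
        return False
    # No ring bonds: a connected graph without rings has exactly n - 1 bonds.
    n = len(elements)
    return _is_connected(n, bonds) and len(bonds) == n - 1


def _is_connected(n, bonds):
    """Depth-first search from atom 0; True if every atom is reached."""
    neighbours = {i: [] for i in range(n)}
    for i, j, _ in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        for nb in neighbours[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n
```

With this toy model a three-carbon chain passes, while a three-membered ring, a chain with a double bond, or a fragment containing oxygen is rejected.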
User-defined homology groups are predefined and are represented by R-group definitions. During search the pseudo atoms are translated to the corresponding R-group definitions.
The group definitions are customizable: the user can override them or create new definitions. Group names are treated case-insensitively, but on case-sensitive file systems the names of the definition files should be lowercase.
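Because group names are case-insensitive and each group also has alias names, a lookup can normalise the name before resolving it. The following is a minimal sketch: the `ALIASES` dictionary and the `resolve_group` function are hypothetical illustrations built from a subset of the names in Table 1, not part of JChem.

```python
# Group and alias names copied from Table 1 (subset); keys are stored in lowercase.
ALIASES = {
    "alkyl": "Alkyl", "chk": "Alkyl",
    "alkenyl": "Alkenyl", "che": "Alkenyl",
    "alkynyl": "Alkynyl", "chy": "Alkynyl",
    "cycloalkyl": "Cycloalkyl", "cyc": "Cycloalkyl",
    "aryl": "Aryl", "ary": "Aryl",
    "heteroaryl": "Heteroaryl", "hea": "Heteroaryl",
}


def resolve_group(name):
    """Return the canonical group name for a name or alias, or None if unknown."""
    return ALIASES.get(name.strip().lower())
```

For instance, `resolve_group("CHK")` and `resolve_group("alkyl")` both resolve to the same group in this sketch.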
User defined groups readily available in the system are the following:
The definition file of the protecting groups contains several definitions, each for protecting a different functional group. The protected functional group is defined by the neighbourhood of the R-atom. When the R-atom has the same neighbourhood as the "protecting" pseudo atom, the group is replaced by the R-atom.
The conversion processes the group definitions in their order in the file. This means that more specific environments should be placed earlier. For example, a carboxyl protecting group definition should precede an alcohol definition, otherwise the alcohol definitions will be applied instead. Currently they are located in the following order:
Currently the system can't handle protecting groups having more than one
attachment point, or groups where the heavy atoms of the functional group
should be changed by the substitution.
The readily available definitions contain amine, carboxyl and hydroxyl
protecting groups.
JChem's group name: protecting
alias names: prt
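The order dependence of the protecting group definitions can be illustrated with a first-match-wins lookup. This is only a sketch: the environment labels and the definition lists are hypothetical stand-ins for the neighbourhood patterns in the definition file, and the matching is reduced to a set comparison.

```python
def first_matching_definition(neighbourhood, definitions):
    """Return the name of the first definition whose environment
    is fully contained in the given neighbourhood, or None."""
    for name, environment in definitions:
        if environment <= neighbourhood:
            return name
    return None


# Hypothetical environments: an oxygen neighbour, optionally next to a carbonyl.
CARBOXYL = ("carboxyl", {"O", "C=O"})
HYDROXYL = ("hydroxyl", {"O"})

# Neighbourhood of a pseudo atom that protects a carboxylic acid.
protected_acid = {"O", "C=O"}

specific_first = [CARBOXYL, HYDROXYL]   # more specific environment placed earlier
general_first = [HYDROXYL, CARBOXYL]    # wrong order: the general rule shadows the specific one
```

In this toy model `first_matching_definition(protected_acid, specific_first)` selects the carboxyl definition, whereas the `general_first` ordering selects the hydroxyl definition for the same neighbourhood, which is why the more specific environment has to come first.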
Some examples with different protected functional groups can be found in Table 2.
Table 2. Protecting group examples
Protecting group | Represented examples |
Acyl is the residue left after removal of one or more OH groups from an acid. Currently it
behaves as a simple pseudo atom: it can only be matched by itself and is not enumerated. This
behaviour complies with the Thomson-Reuters/Questel acyl group handling.
JChem's group name: acyl
alias names: acy
The "any" group is the union of all other homology groups except acyl, unknown and protecting.
JChem's group name: any
alias names: xx
Currently there is one regulating option, 'completeHG'. It specifies whether the part of the
query structure that matches the given group must represent an entire homology group,
or whether substructures are also accepted. In the incomplete case an entire structure
can, of course, also match the given homology group.
For example, if completeHG is set to true (the default), an alkyl chain cannot match a cycloalkyl
group; only a ring (system) can. The detailed behaviour is given with the definition of each group.
An example is shown in Table 3.
Table 3. Complete and incomplete structures of homology groups
target | query | hit (completeHG:y) | hit (completeHG:n) |
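As a rough illustration of the completeHG option, the sketch below contrasts the complete and incomplete cases for a cycloalkyl target, following the Cycloalkyl row of Table 1. The `Fragment` record and its fields are hypothetical simplifications (a real search works on the molecular graph and also checks substituents and connection points), and `matches_cycloalkyl` is not JChem's implementation.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    elements: frozenset       # element symbols present, e.g. frozenset({"C", "H"})
    has_ring: bool            # contains at least one ring
    has_aromatic_bond: bool   # contains any aromatic bond


def matches_cycloalkyl(fragment, complete_hg=True):
    """Simplified cycloalkyl check for the complete and incomplete cases."""
    carbon_only = "C" in fragment.elements and fragment.elements <= {"C", "H"}
    if not carbon_only or fragment.has_aromatic_bond:
        return False
    if complete_hg:
        # Complete case: the matching part must be an entire ring (system).
        return fragment.has_ring
    # Incomplete case: any carbon structure without aromatic bonds is accepted.
    return True


propyl = Fragment(frozenset({"C", "H"}), has_ring=False, has_aromatic_bond=False)
cyclohexyl = Fragment(frozenset({"C", "H"}), has_ring=True, has_aromatic_bond=False)
phenyl = Fragment(frozenset({"C", "H"}), has_ring=True, has_aromatic_bond=True)
```

Under these assumptions the propyl chain matches only when `complete_hg` is false, cyclohexyl matches in both cases, and phenyl matches in neither.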
For the built-in types, example R-group definitions specify the enumerable library, using the same technology as user-defined groups. These structures are characteristic of the homology group and include both simple and large structures. The group definitions are customizable.
We have to emphasize that these definitions are used only for enumeration and do not affect searching. As noted earlier, arbitrary structures fulfilling the requirements for the homology group will match such a target.
By default the enumeration definitions contain two attachment points. After enumeration these are the atoms that connect to the first two neighbours of the group. If the enumerated homology pseudo atom has more than two connections, further attachment points are added. These are put on atoms that have free valence and comply with the requirements for externally connecting atoms of the given group. E.g. for aryl only aromatic ring atoms can be connection points. The atoms of the definition are examined in the order of their numbering. If a definition does not have a sufficient number of such atoms, it is rejected. If all the definitions of a homology atom are rejected, an exception is thrown indicating that the given homology group does not have any suitable enumeration definition.
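The attachment-point procedure can be sketched as follows. The data model is hypothetical: each definition is a list of atoms carrying an index (its numbering), a free-valence count, a flag saying whether the atom satisfies the group's rule for externally connecting atoms (for aryl, being an aromatic ring atom), and a flag marking the two default attachment points. The sketch also assumes that each atom carries at most one attachment point, and `NoSuitableDefinitionError` is a placeholder name, not the exception JChem throws.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    index: int                        # position in the definition's atom numbering
    free_valence: int                 # remaining free valence on the atom
    allowed_external: bool            # meets the group's rule for external connections
    default_attachment: bool = False  # one of the two default attachment points


class NoSuitableDefinitionError(Exception):
    """Placeholder: no enumeration definition fits the homology atom."""


def attachment_points(definition, connections):
    """Return the atom indices used as attachment points, or None if rejected."""
    atoms = sorted(definition, key=lambda a: a.index)
    points = [a.index for a in atoms if a.default_attachment]
    # Add further points in numbering order until every connection is served.
    for atom in atoms:
        if len(points) >= connections:
            break
        if atom.index not in points and atom.free_valence > 0 and atom.allowed_external:
            points.append(atom.index)
    if len(points) < connections:
        return None                   # not enough suitable atoms: reject the definition
    return points[:connections]


def usable_definitions(definitions, connections):
    """Collect attachment points of all accepted definitions; fail if none is left."""
    usable = []
    for definition in definitions:
        points = attachment_points(definition, connections)
        if points is not None:
            usable.append(points)
    if not usable:
        raise NoSuitableDefinitionError("no suitable enumeration definition")
    return usable
```

For a six-membered aromatic ring whose atoms 1 and 2 are the default attachment points, a homology atom with three connections would receive atom 3 as the extra attachment point in this model.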
The enumeration of user-defined homology groups uses the same (customizable) R-group definitions as searching. The user-defined homology atom should have the same number of connections as shown in the definitions.
Table 4. Overriding amino protecting group definitions.
overwriting the definition | sample markush file | enumerations |