Homology groups in Markush structures

Currently, JChem can search Markush structures containing homology groups only with specific molecule queries (queries without query features). Homology groups are supported only on the target side. Query-side support, as well as properties of the homology groups that constrain the possible structures, will be implemented in a future version.

Search options and properties

Search options regulating the search behaviour are also available:

Search Options

Currently there is one regulating option, 'completeHG', which specifies whether the part of the query structure that matches a given group must represent an entire homology group or whether substructures of it are also accepted.
For example, if completeHG is set to true (the default), an alkyl chain cannot match a cycloalkyl group; only a ring (system) can. The detailed behaviour is given in the definition of the groups. CCCC1CCCC2=CC(O)CCC12 matches cycloalkyl-O, but CCCC(CC)C(C)CCCO does not. If completeHG is set to false, both match.
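
The effect of completeHG on the two example queries can be illustrated with a toy model. The sketch below is not JChem code: Fragment, has_ring and matches_cycloalkyl are hypothetical names, the carbon skeletons are typed in by hand from the two SMILES strings above (the oxygen is left out because it corresponds to the explicit O of the target), and only a simplified subset of the cycloalkyl rules is checked: carbon atoms only, no aromatic bonds, and a ring when a complete group is required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Heavy-atom graph of the query part mapped onto one homology group."""
    elements: tuple   # element symbol per atom, e.g. ("C", "C")
    bonds: tuple      # (atom_i, atom_j, order); order "a" means aromatic


def has_ring(fragment):
    """Return True if the bond graph contains a cycle (union-find)."""
    parent = list(range(len(fragment.elements)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, _order in fragment.bonds:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:      # both ends already connected: ring closure
            return True
        parent[root_i] = root_j
    return False


def matches_cycloalkyl(fragment, complete_hg=True):
    """Simplified cycloalkyl test; a ring is demanded only if complete_hg."""
    only_carbon = all(symbol == "C" for symbol in fragment.elements)
    no_aromatic = all(order != "a" for _i, _j, order in fragment.bonds)
    if not (only_carbon and no_aromatic):
        return False
    return has_ring(fragment) if complete_hg else True


# Carbon skeleton of CCCC1CCCC2=CC(O)CCC12 (13 carbons, two fused rings).
BICYCLIC = Fragment(
    elements=("C",) * 13,
    bonds=((0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1),
           (5, 6, 1), (6, 7, 1), (7, 8, 2), (8, 9, 1), (9, 10, 1),
           (10, 11, 1), (11, 12, 1), (12, 3, 1), (12, 7, 1)),
)

# Carbon skeleton of CCCC(CC)C(C)CCCO (11 carbons, no ring).
BRANCHED_CHAIN = Fragment(
    elements=("C",) * 11,
    bonds=((0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1),
           (3, 6, 1), (6, 7, 1), (6, 8, 1), (8, 9, 1), (9, 10, 1)),
)

for label, fragment in (("bicyclic", BICYCLIC), ("chain", BRANCHED_CHAIN)):
    for complete in (True, False):
        print(label, complete, matches_cycloalkyl(fragment, complete))
```

In this model the bicyclic skeleton is accepted with either setting, while the branched chain is accepted only when complete_hg is False, which mirrors the behaviour described for the two queries.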

Properties of homology groups (a future development)

These will be implemented in a future version. They will regulate the properties of the individual homology groups independently. They include the following (with the groups to which each may be applied):
  1. monocyclic/fused, default: both are allowed, for all ring type groups
  2. saturated, default false, for cycloalkyl and heterocycle groups
  3. chain, default false, for alkyl, alkenyl, alkynyl

Definition of the groups

The homology groups in Markush structures are represented by pseudo atoms labelled with the common chemical annotation of the groups. The names are case insensitive. The pseudo atoms can be most easily drawn in Marvin Sketch using the Homology Groups template group.

Homology groups fall into two major types according to how they are defined:
  1. Built-in groups are defined by specific structural properties of the group. These groups are not enumerated during searching, but the query structure is recognized as fulfilling the requirements for such a structure. The possible number of covered structures is usually infinite, unless the number of atoms is limited. Examples of built-in groups are alkyl, aryl, heterocycle, etc.
  2. User-defined groups are explicitly defined, and only the listed structures can match these homology groups. The definition is given in the form of an R-group definition, and any of the generic features discussed in the Markush chapter can be used in the definition. These definitions can be customized by the user and may be context-specific. (For example, a protecting group definition depends on which functional group it protects.)

Built-in groups

Each built-in group is described below by its compulsory features, its optional features, the structures accepted in the incomplete case, and JChem's group name.

Alkyl
  Compulsory:
    - minimum of one carbon atom
    - only carbon and hydrogen atoms
    - single bonds
    - no ring bonds
  Optional:
    - substitution at arbitrary position(s)
  Incomplete case:
    - same requirements
  JChem's group name: alkyl

Alkenyl
  Compulsory:
    - at least one double bond
    - minimum of 2 carbon atoms
    - otherwise same as for Alkyl
  Optional:
    - single and double bonds both allowed
  Incomplete case:
    - the matching structure doesn't need to have any double bond
  JChem's group name: alkenyl

Alkynyl
  Compulsory:
    - at least one triple bond
    - minimum of 2 carbon atoms
    - otherwise same as for Alkyl
  Optional:
    - double bond
  Incomplete case:
    - the matching structure doesn't need to have any triple bond
  JChem's group name: alkynyl

Cycloalkyl
  Compulsory:
    - monocyclic or fused aliphatic rings
    - only carbon and hydrogen atoms
  Optional:
    - substitution by (saturated) alkyl chains
    - double or triple bonds in the ring, but not aromatic
    - several connection points, but all must be on a ring (can't have an external connection on an alkyl chain)
  Incomplete case:
    - any carbon structure without aromatic bonds
    - the substituting alkyl chain can be unsaturated
  JChem's group name: cycloalkyl

Aryl
  Compulsory:
    - monocyclic or fused rings
    - at least one of these rings is aromatic
    - only carbon and hydrogen atoms
  Optional:
    - substitution by (saturated) alkyl chains
    - double or triple bonds in the aliphatic rings
    - several connection points, but all must be on an aromatic ring (can't have an external connection on an alkyl chain or on an aliphatic ring)
  Incomplete case:
    - similar to cycloalkyl, but the atoms can have aromatic bonds
    - any carbon structure
    - the matching structure doesn't need to have a ring
  JChem's group name: aryl

Heterocycle
  Compulsory:
    - similar to cycloalkyl, but the ring (system) must contain at least one hetero atom
    - no hetero atoms outside the rings
  Optional:
    - same as cycloalkyl
  Incomplete case:
    - similar to cycloalkyl, but hetero atoms are accepted as well, which means any structure without aromatic bonds
  JChem's group name: heterocycle

Heteroaryl
  Compulsory:
    - similar to aryl, but the ring (system) must contain at least one hetero atom
    - no hetero atoms outside the rings
  Optional:
    - same as aryl
  Incomplete case:
    - similar to aryl, but hetero atoms are accepted as well
    - the condition for the externally connected atom holds as in the case of aryl
  JChem's group name: heteroaryl

Unknown group
  Compulsory:
    - any structure; unknown structures are not enumerated during Markush enumeration
  Optional: -
  Incomplete case: -
  JChem's group name: unknown
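
Because built-in groups are recognized by their structural properties rather than enumerated, matching amounts to a property check on the query fragment. The following sketch illustrates this for the three chain groups (alkyl, alkenyl, alkynyl) only. It is an illustration under assumptions, not JChem's implementation: CHAIN_RULES and matches_chain_group are hypothetical names, the fragment is passed in as pre-computed summaries instead of a parsed molecule, and alkenyl is read as excluding triple bonds. In the incomplete case the sketch relaxes only the required multiple bond; whether the minimum carbon count is relaxed too is not stated, so it is kept.

```python
# Rules read from the Alkyl, Alkenyl and Alkynyl entries.
CHAIN_RULES = {
    # name: (minimum carbons, allowed bond orders, required bond order)
    "alkyl": (1, {1}, None),
    "alkenyl": (2, {1, 2}, 2),
    "alkynyl": (2, {1, 2, 3}, 3),
}


def matches_chain_group(group, elements, bond_orders, in_ring=False,
                        complete_hg=True):
    """Check a hydrogen-suppressed fragment against one chain group.

    group       -- group name, compared case-insensitively
    elements    -- heavy-atom symbols of the fragment, e.g. ["C", "C"]
    bond_orders -- order (1, 2 or 3) of every bond between those atoms
    in_ring     -- True if any of those bonds is a ring bond
    complete_hg -- if True, the fragment must be a complete group
    """
    min_carbons, allowed, required = CHAIN_RULES[group.lower()]
    if in_ring or any(symbol != "C" for symbol in elements):
        return False
    if len(elements) < min_carbons:
        return False
    if any(order not in allowed for order in bond_orders):
        return False
    if complete_hg and required is not None:
        return required in bond_orders
    return True


propyl = (["C", "C", "C"], [1, 1])
print(matches_chain_group("Alkyl", *propyl))                       # True
print(matches_chain_group("alkenyl", *propyl))                     # False
print(matches_chain_group("alkenyl", *propyl, complete_hg=False))  # True
```

A saturated propyl chain is a complete alkyl group, but it is accepted for alkenyl only in the incomplete case, where the double bond is no longer required.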

User-defined homology groups

These homology groups are predefined and are represented by R-group definitions. During search the pseudo atoms are translated to the corresponding R-group definitions.

The group definitions are customizable: the user can override them or add new definitions. Group names are treated case-insensitively, but on case-sensitive file systems the names of the definition files should be lowercase.
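
The naming rule and the override behaviour can be sketched as follows. This is an illustration only: the directory name "homology", the ".mrv" extension, the class GroupDefinitions and the group "SmallAlkyl" are all hypothetical, and the real location and format of JChem's definition files are not described here.

```python
from pathlib import PurePosixPath


def definition_file(group_name, directory="homology", extension=".mrv"):
    """Lowercase file name for a group, safe on case-sensitive file systems.

    The directory and extension defaults are assumptions for illustration.
    """
    return PurePosixPath(directory) / (group_name.lower() + extension)


class GroupDefinitions:
    """Case-insensitive store of group definitions with user overrides."""

    def __init__(self, defaults):
        self._members = {name.lower(): list(members)
                         for name, members in defaults.items()}

    def define(self, name, members):
        """Override an existing definition or add a new one."""
        self._members[name.lower()] = list(members)

    def lookup(self, name):
        """Return the listed members of a group, whatever the name's case."""
        return self._members[name.lower()]


definitions = GroupDefinitions({"Halogen": ["F", "Cl", "Br", "I"]})
definitions.define("HALOGEN", ["F", "Cl"])        # user override
definitions.define("SmallAlkyl", ["C", "CC"])     # new, hypothetical group

print(definition_file("Halogen"))
print(definitions.lookup("halogen"))
```

Normalizing every name to lowercase at both the storage and the lookup step is what makes "Halogen", "HALOGEN" and "halogen" refer to the same definition.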

Technically, the group definitions are handled in a conversion step before the search, in which each homology group is replaced by an R-atom. This atom receives an alias string that shows the name of the converted homology group and the R-group index. Internally, this alias helps to distinguish R-groups that existed originally from R-groups that result from homology conversion.
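
A minimal sketch of such a conversion step is shown below. The atom dictionaries, the function name convert_homology_atoms and the alias format "name:Rn" are hypothetical, and the choice to give every occurrence of the same group name the same R-group index is an assumption made for the example.

```python
def convert_homology_atoms(atoms, group_names):
    """Replace homology pseudo atoms by R-atoms that carry an alias.

    atoms       -- list of dicts such as {"kind": "pseudo", "label": "Halogen"}
                   or {"kind": "rgroup", "index": 1}
    group_names -- lowercase names of the groups that have a definition
    """
    used = {atom["index"] for atom in atoms if atom["kind"] == "rgroup"}
    next_index = max(used, default=0) + 1   # stay clear of existing R-groups
    index_of = {}                           # group name -> assigned index
    converted = []
    for atom in atoms:
        name = atom.get("label", "").lower()
        if atom["kind"] == "pseudo" and name in group_names:
            if name not in index_of:        # one R-group per group name
                index_of[name] = next_index
                next_index += 1
            index = index_of[name]
            converted.append({"kind": "rgroup", "index": index,
                              "alias": f"{name}:R{index}"})
        else:
            converted.append(dict(atom))    # copy everything else unchanged
    return converted


target = [
    {"kind": "atom", "label": "C"},
    {"kind": "rgroup", "index": 1},             # R-group present originally
    {"kind": "pseudo", "label": "Halogen"},
    {"kind": "pseudo", "label": "acyl"},
    {"kind": "pseudo", "label": "halogen"},
]
result = convert_homology_atoms(target, {"halogen", "acyl"})
for atom in result:
    print(atom)
```

Only the converted atoms carry an "alias" key, so the original R1 can still be told apart from the R-atoms introduced by the conversion.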

The following user-defined groups are readily available in the system:

Halogen

Halogen elements: F, Cl, Br and I.
JChem's group name: halogen or X

Protecting

The definition file for protecting groups contains several definitions, each protecting a different functional group. The protected functional group is defined by the neighbourhood of the R-atom in the definition. When the "protecting" pseudo atom has the same neighbourhood as the R-atom of a definition, the pseudo atom is replaced by that R-atom.

The conversion processes the group definitions in the order in which they appear in the file. This means that more specific environments should be placed earlier. For example, a carboxyl protecting group definition should precede an alcohol definition; otherwise the alcohol definition will be applied instead.
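
The consequence of this ordering can be seen in a small first-match-wins sketch. Everything in it is hypothetical: the feature labels stand in for a real neighbourhood comparison, and select_definition is not a JChem function.

```python
def select_definition(environment, definitions):
    """Return the name of the first definition whose environment fits.

    environment -- set of features found around the "protecting" pseudo atom
    definitions -- ordered list of (name, required_features) pairs
    """
    for name, required in definitions:
        if required <= environment:     # every required feature is present
            return name
    return None


# Hypothetical feature labels: "O" = attached through an oxygen atom,
# "C=O" = that oxygen sits on a carbonyl carbon.
CARBOXYL = ("carboxyl", {"O", "C=O"})
ALCOHOL = ("alcohol", {"O"})

on_carboxyl = {"O", "C=O"}              # pseudo atom on a carboxyl oxygen

specific_first = select_definition(on_carboxyl, [CARBOXYL, ALCOHOL])
general_first = select_definition(on_carboxyl, [ALCOHOL, CARBOXYL])
print(specific_first)                   # carboxyl
print(general_first)                    # alcohol
```

Because the alcohol environment is contained in the carboxyl environment, listing the alcohol definition first makes it capture carboxyl positions as well.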

Currently the system can't handle protecting groups that have more than one attachment point, or groups where the heavy atoms of the functional group would be changed by the substitution. The readily available definitions contain amine, carboxyl and hydroxyl protecting groups.
JChem's group name: protecting

Acyl

Residue left after removal of one or more OH groups from an acid. Currently it's implemented only for one attachment point.
JChem's group name: acyl

Enumeration

To enable the enumeration of homology groups, the homology enumeration option of Markush enumeration has to be switched on. Otherwise the homology groups are kept as pseudo atoms. This latter option might be useful for showing that these structures can't be fully enumerated.
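
The effect of the switch can be sketched with a deliberately simple model. The function enumerate_structures, its enumerate_homology parameter, the scaffold representation and the "{name}" placeholder are hypothetical and do not correspond to JChem's enumeration API; plain string concatenation is only adequate here because the group is a terminal, single-atom substituent.

```python
from itertools import product


def enumerate_structures(parts, library, enumerate_homology=True):
    """Enumerate a linear scaffold made of literal pieces and group slots.

    parts   -- sequence of SMILES pieces (str) or ("group", name) tuples
    library -- mapping from lowercase group name to its listed members
    """
    choices = []
    for part in parts:
        if isinstance(part, tuple):
            name = part[1].lower()
            if enumerate_homology and name in library:
                choices.append(library[name])
            else:                        # keep the pseudo atom as a placeholder
                choices.append(["{" + name + "}"])
        else:
            choices.append([part])
    return ["".join(combination) for combination in product(*choices)]


LIBRARY = {"halogen": ["F", "Cl", "Br", "I"]}
SCAFFOLD = ["c1ccccc1", ("group", "Halogen")]

print(enumerate_structures(SCAFFOLD, LIBRARY))
print(enumerate_structures(SCAFFOLD, LIBRARY, enumerate_homology=False))
```

With the option on, the sketch yields one structure per halogen; with it off, the single result keeps the placeholder, which signals that the structure has not been fully enumerated.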

Built-in groups

For the built-in types, example R-group definitions specify the enumerable library, using the same technology as user-defined groups. These structures are characteristic of the homology group and include both simple and large structures. The group definitions are customizable.

Note that these definitions do not affect searching. As noted earlier, any structure that fulfils the requirements for the homology group will match such a target.

User-defined groups

The enumeration of user-defined homology groups uses the same (customizable) R-group definitions as searching.