Return

Similarity searching in databases of 2D chemical structures is widely used for virtual screening and lead discovery. A similarity measure, that quantifies the degree of structural resemblance between the target structure and each of the database structure, is based on fingerprint or molecular descriptor encoding of the molecular structure with similarity between pairs of such representations being computed using the

The procedures and algorithms used in detection of the MCS can be extended to 3D similarity searching, with several modifications, the most important one being conformational flexibility in the matching algorithm. This study presents a complete implementation of such an algorithm. The effectiveness of the proposed method in classifying chemical structures with respect to a given bioactive leader is evaluated.

Figure 1. Acetic acid chemical graph representations

**Distance geometry.** In the ligand-receptor interaction mechanism, a ligand usually exhibits some degree of flexibility and thus the distances between atoms are not fixed.

One approach to cope with this drawback is to generate several conformations of low energy for each structure under consideration and then to compare all possible pairs of the resulting conformations. This approach is, however, computationally demanding, if exhaustive sampling of the molecules' conformational space is to be achieved and still cannot guarantee that the optimal similarity has been identified.

Distance geometry, herein considered, encodes the molecules' conformational flexibility within a single graphical representation by Crippen [3]. Specifically, each edge of a 3D molecular graph is represented by a range of distances spanning the maximum and minimum allowable distances between two atoms. Distance ranges are imposed by some constrains, e.g., distance and chirality. The distance constrains are simply the lower and upper bounds of the interatomic distances; the chirality includes the handedness of the asymmetric centers in the molecule.

**Covalent distance constrains.** The local covalent structure of a molecule is easily defined by distance constrains. Unless one is dealing with a highly strained ring system, it is sufficient to use the exact distance constrains in which the lower and upper bounds are equal. For example, the distances among covalently bonded pairs of atoms, are determined with high precision by the bond order and the types of atoms connected. Similarly, the bond angles can usually be determined from the covalent structure, while for fixed bond lengths there is a one-to-one relation between the bond angle and the geminal distance, so that these distances can also be determined. The relation between the geminal distances and the bond angle q is given explicitly by the law of cosines:

where, *l*_{13} = |*d*_{12} - *d*_{23}| and *u*_{13} = *d*_{12} + *d*_{13} are called the *lower* and *upper* triangle inequality limits, respectively.

**Vicinal distance constrains.** Similarly, when the incident and geminal distances are held fixed, there is a one-to-one relation between the *absolute value* of the torsion angle j and the vicinal distance, given by:

where *l*_{14} and *u*_{14} are *cis* and *trans* limits on the *1,4* distance.

**Chirality constrains.** The chirality c_{1234} of an ordered quadruple of points numbered *1,2,3,4* is given in terms of their Cartesian coordinates by the sign of the following determinant:

**Torsion angle constrains.** As shown above, the absolute value of a torsion angle can be constrained to any range of values by means of suitable 1,4 distance constraints, including its *cis* and *trans* limits. Moreover, since the chirality c_{1234} of a chain of four bonded atoms *A*_{1}-*A*_{2}-*A*_{3}-*A*_{4} is equal to the sign of the torsion angle sgn(j) = 0,±1 about the 2,3 bond, by a suitable combination of distance and chirality constraints we can obtain any range of values with a given sign. This is sufficient to specify the rotameric state (*gauche*^{+}, *gauche*^{-} or *anti*) about the single bonds.

**Steric distance constrains.** Since two atoms cannot be in nearly the same place at the same time, in order to obtain reasonable conformations it is necessary to impose lower bound constraints on the distances between all pairs of atoms, separated by more than three bonds. For the sake of simplicity, these lower bounds are generally set to the sum of suitable *hard sphere radii* (van der Waals radii):

After applying the preceding distance and chirality constrains, we ensure that the structures which satisfy them are not grossly unreasonable on energetic grounds. In order to get the correct conformation, it is necessary to impose constrains on interatomic distances for atoms that are separated by four or more bonds. Such constrains are determined by *bound smoothing* procedures: the *triangle bound smoothing* and *tetrangle bound smoothing*.

**Triangle bound smoothing.** Triangle inequality bound smoothing is based upon the well-known triangle inequality among the distances:

for all triples of atoms *i*, *j*, *k*. It follows that if *d _{ik}* <=

So, if ,

Figure 2. Triangle inequality.gif

In order to compute the triangle inequality limits efficiently, we can reduce their calculations to the all pairs, shortest path problem in a certain digraph. A path in such a digraph is a sequence of nodes such that any consecutive nodes in the sequence are connected by an arc in digraph (Figure 3), whose arrow points from the first to the second. It is easily seen that the upper triangle limits are equal to the lengths of the shortest path in an undirected digraph, whose arc lengths are equal to the given upper bounds. It can be shown that all the lower triangle limits are of the form:

where *i*, *j*, *k*, *m* are not necessarily distinct atom indices, *l _{ij}* and

Figure 3. The digraph whose shortest path determines the triangle inequality limits

Current implementation of the method uses Floyd's [7] shortest path algorithm. This algorithm takes each node *k* of the digraph in turn, and then makes a pass through all ordered pairs of other nodes (*i*, *j*). If the length of the path *i* ® *k*® *j* is shorter than the length of the direct path *i* ® *j*, the latter is set to the former. This ensures that after each pass all the path lengths are at least as short as any path that goes through node *k*, and hence iterating on this procedure for *k* = * I, N* produces the desired matrix of shortest paths.

Figure 4. Tetrangle inequality limits

The mathematical form of this inequality can be expressed in terms of *Cayley-Menger* determinants Easthope [4]:

Thus the corresponding mathematical forms of the above tetrangle inequalities are:

together with

The following pseudocode implements the procedures for triangle and tetrangle bound smoothing as described above.

Figure 5 illustrates a pair of structures and their corresponding 3D MCS; the initial coordinates are generated by *Hyperchem* molecular modeling package.

Figure 5. 3D MCS example, bold lines denote maximum common subgraphs

Upper and lower distance matrices for structure (**a**) Figure 5, before bound smoothing and after bound smoothing procedures are illustrated in Figure 6.

A significant improvement in upper and lower bound is observed after applying distance geometry bound smoothing procedure (Figure 6).

Figure 6. Upper and lower bound matrices, the upper (lower) diagonal values indicate upper (lower) bound distances, respectively

Once the set of structures to be queried has been preprocessed and stored using the upper and lower bound values calculated using the previously described distance bounding techniques, similarity searching is performed by using an MCS graph matching algorithm. The algorithm is a clique based method which computes the MCS by determining the maximum clique in the correspondence graph, this process is described in detail by Raymond [1, 2]. In the current approach the maximum clique algorithm as described in Ostergard [8] is used. The following pseudocode implements the maximum clique detection algorithm.

Validation studies were carried out on a set of 19 dopamine receptor antagonists. Dopamine receptors in the brain are important in modulating motor, endocrine, and emotional functions by Strange [5] and Waddington [6]. The antagonist affinity was measured at recombinant receptors selectively expressed in cloned cells (Figure 7).

Figure 7. Structures of dopamine receptor antagonists

The test set used here was published by Brusniak[7] it contains experimental data, log(1/*K _{d}*) where

Table 1. Drug affinities and similarity calculation results

Molecule | log(1/K)_{d} | Similarity |
---|---|---|

(+)-6,7-ADTN | -3.66 | 0.7500 |

adrenaline | -4.74 | 0.4444 |

Cl-APB | -1.92 | 0.9907 |

DHX | -3.08 | 0.8102 |

dopamine | -3.39 | 0.6759 |

m-tyramine | -4.68 | 0.5833 |

noradrenaline | -4.69 | 0.6759 |

p-tyramine | -5.59 | 0.5880 |

(R)-apomorfine | -2.83 | 0.7685 |

(R)-(-)-NPA | -3.26 | 0.7546 |

(R)-(+)-6-Br-APB | -2.58 | 0.9861 |

(S)-(+)-apomorfine | -3.08 | 0.7639 |

S-(+)-NPA | -3.72 | 0.7639 |

(S)-SKF82526 | -3.26 | 0.7778 |

serotonin | -3.99 | 0.6806 |

SKF38393 | -2.18 | 0.9861 |

(R)-SKF-82526^{a} | -1.45 | 1.0000 |

INDOMETHACIN^{b} | -3.53 | 0.7479 |

LSD^{**} | -4.02 | 0.6818 |

The similarity index values were calculated using a weighing atom scheme. Thus the atoms that ensemble the active scaffold necessary for a structure to be active have a higher rank than the irrelevant atoms. All the active structures contain this scaffold, so that, after the overlapping procedure, it is easy to identify these atoms.

Figure 8. Linear dependence between Similarity scores and BA

Linear regression analysis showed good correlations between similarity score and BA. Thus an attempt to give estimative values of BA for two compounds of interest (INDOMETHACIN and LSD) was made. The predicted values (by using regression eq. in Figure 8) showed mild activity of these known antagonists. Considering the simplicity of the used model, we can draw the conclusion that the similarity scores can classify correctly the unknowns, which is the most desirable feature (see Figure 8).

Figure 9. Pharmacophore map for dopamine receptor antagonists

The obtained results reveal the important structural features, *i.e.*, the *pharmacophore*, of antagonists of the dopamine receptor (see Figure 9). In the high affinity compounds, the distance between the cationic nitrogen and the m-hydroxyl oxygen ranged from 6 to 6.45 A with highest activity compound (R)-SKF-82526 having a distance of 6 A. The distance between the cationic nitrogen and the first carbon in the second benzene cycle ranged from 3.78 to 3.81 A; in the lowest activity compounds, this pharmacophore is missing.

[ 2] J. Raymond, E. Gardiner, P. Willett,

[ 3] G. Crippen, T. Havel,

[ 4] P. Easthope, T. F. Havel,

[ 5] P. G. Strange,

[ 6] J. Waddington,

[ 7] R. Floyd,

[ 8] R. J. Ostergard,

Return