Figure 1. Acetic acid chemical graph representations
Distance geometry. In the ligand-receptor interaction mechanism, a ligand usually exhibits some degree of flexibility and thus the distances between atoms are not fixed.
One approach to cope with this drawback is to generate several conformations of low energy for each structure under consideration and then to compare all possible pairs of the resulting conformations. This approach is, however, computationally demanding, if exhaustive sampling of the molecules' conformational space is to be achieved and still cannot guarantee that the optimal similarity has been identified.
Distance geometry, herein considered, encodes the molecules' conformational flexibility within a single graphical representation by Crippen . Specifically, each edge of a 3D molecular graph is represented by a range of distances spanning the maximum and minimum allowable distances between two atoms. Distance ranges are imposed by some constrains, e.g., distance and chirality. The distance constrains are simply the lower and upper bounds of the interatomic distances; the chirality includes the handedness of the asymmetric centers in the molecule.
Covalent distance constrains. The local covalent structure of a molecule is easily defined by distance constrains. Unless one is dealing with a highly strained ring system, it is sufficient to use the exact distance constrains in which the lower and upper bounds are equal. For example, the distances among covalently bonded pairs of atoms, are determined with high precision by the bond order and the types of atoms connected. Similarly, the bond angles can usually be determined from the covalent structure, while for fixed bond lengths there is a one-to-one relation between the bond angle and the geminal distance, so that these distances can also be determined. The relation between the geminal distances and the bond angle q is given explicitly by the law of cosines:
where, l13 = |d12 - d23| and u13 = d12 + d13 are called the lower and upper triangle inequality limits, respectively.
Vicinal distance constrains. Similarly, when the incident and geminal distances are held fixed, there is a one-to-one relation between the absolute value of the torsion angle j and the vicinal distance, given by:
where l14 and u14 are cis and trans limits on the 1,4 distance.
Chirality constrains. The chirality c1234 of an ordered quadruple of points numbered 1,2,3,4 is given in terms of their Cartesian coordinates by the sign of the following determinant:
Torsion angle constrains. As shown above, the absolute value of a torsion angle can be constrained to any range of values by means of suitable 1,4 distance constraints, including its cis and trans limits. Moreover, since the chirality c1234 of a chain of four bonded atoms A1-A2-A3-A4 is equal to the sign of the torsion angle sgn(j) = 0,±1 about the 2,3 bond, by a suitable combination of distance and chirality constraints we can obtain any range of values with a given sign. This is sufficient to specify the rotameric state (gauche+, gauche- or anti) about the single bonds.
Steric distance constrains. Since two atoms cannot be in nearly the same place at the same time, in order to obtain reasonable conformations it is necessary to impose lower bound constraints on the distances between all pairs of atoms, separated by more than three bonds. For the sake of simplicity, these lower bounds are generally set to the sum of suitable hard sphere radii (van der Waals radii):
After applying the preceding distance and chirality constrains, we ensure that the structures which satisfy them are not grossly unreasonable on energetic grounds. In order to get the correct conformation, it is necessary to impose constrains on interatomic distances for atoms that are separated by four or more bonds. Such constrains are determined by bound smoothing procedures: the triangle bound smoothing and tetrangle bound smoothing.
Triangle bound smoothing. Triangle inequality bound smoothing is based upon the well-known triangle inequality among the distances:
for all triples of atoms i, j, k. It follows that if dik <= uik and djk <=ujk then:
So, if , uij > uik + ujk then uij > dij and hence uij can be replaced by the upper limit uik + ujk on dij without eliminating any conformations that satisfy the constrains dik <= uik and djk <= ujk (see Figure 2).
Figure 2. Triangle inequality.gif
In order to compute the triangle inequality limits efficiently, we can reduce their calculations to the all pairs, shortest path problem in a certain digraph. A path in such a digraph is a sequence of nodes such that any consecutive nodes in the sequence are connected by an arc in digraph (Figure 3), whose arrow points from the first to the second. It is easily seen that the upper triangle limits are equal to the lengths of the shortest path in an undirected digraph, whose arc lengths are equal to the given upper bounds. It can be shown that all the lower triangle limits are of the form:
where i, j, k, m are not necessarily distinct atom indices, lij and uij which respectively denote lower and upper bounds on the corresponding indexed atoms, and the overbars indicate the corresponding triangle inequality limits (eq. 5). This shows that the upper limits can be computed independently of the lower and also that the greatest lower limit cannot exceed the greatest lower bound.
Figure 3. The digraph whose shortest path determines the triangle inequality limits
Current implementation of the method uses Floyd's  shortest path algorithm. This algorithm takes each node k of the digraph in turn, and then makes a pass through all ordered pairs of other nodes (i, j). If the length of the path i ® k® j is shorter than the length of the direct path i ® j, the latter is set to the former. This ensures that after each pass all the path lengths are at least as short as any path that goes through node k, and hence iterating on this procedure for k = I, N produces the desired matrix of shortest paths.
Tetrangle inequality bound smoothing. Unfortunately, the triangle inequality limits represent a rather poor approximation to the actual Euclidean limits, so that the triangle inequality bound smoothing is not a very effective approach to locating errors in the bounds. A somewhat more effective (although much more time-consuming) approach looks at four atoms at a time, rather than three. In this case, the algebraic form of the relations among the distances is far more complicated, so that the tetrangle inequality limits are best described pictorially as in Figure 4.
Figure 4. Tetrangle inequality limits
The mathematical form of this inequality can be expressed in terms of Cayley-Menger determinants Easthope :
Thus the corresponding mathematical forms of the above tetrangle inequalities are:
The following pseudocode implements the procedures for triangle and tetrangle bound smoothing as described above.
Figure 5 illustrates a pair of structures and their corresponding 3D MCS; the initial coordinates are generated by Hyperchem molecular modeling package.
Figure 5. 3D MCS example, bold lines denote maximum common subgraphs
Upper and lower distance matrices for structure (a) Figure 5, before bound smoothing and after bound smoothing procedures are illustrated in Figure 6.
A significant improvement in upper and lower bound is observed after applying distance geometry bound smoothing procedure (Figure 6).
Figure 6. Upper and lower bound matrices, the upper (lower) diagonal values indicate upper (lower) bound distances, respectively
Once the set of structures to be queried has been preprocessed and stored using the upper and lower bound values calculated using the previously described distance bounding techniques, similarity searching is performed by using an MCS graph matching algorithm. The algorithm is a clique based method which computes the MCS by determining the maximum clique in the correspondence graph, this process is described in detail by Raymond [1, 2]. In the current approach the maximum clique algorithm as described in Ostergard  is used. The following pseudocode implements the maximum clique detection algorithm.
Figure 7. Structures of dopamine receptor antagonists
The test set used here was published by Brusniak it contains experimental data, log(1/Kd) where Kd is the dissociation constant of the receptor-antagonist complex (see Table 1). For this simulation, the most active compound ((R) SKF82526) was used as a query; 18 pairwise comparisons were performed, with distance tolerance value e = 0.1A. Initial coordinates are obtained by Hyperchem program, followed by a bound smoothing procedure. The 3D similarity threshold was set to 0.2 to prevent unnecessary pairwise comparisons. On a PC with 2.8 GHz Intel processor, 512 RAM, Windows XP, the computations for overall pairs comparisons took less than 2 s. The results are summarized in Table 1. It is a noticeable fact that the receptor can discriminate between stereo-isomers (R)-SKF-82526 and (S)-SKF82526, although the difference is only one carbon atom configuration; the procedure is discriminative, giving appropriate similarity index.
Table 1. Drug affinities and similarity calculation results
The similarity index values were calculated using a weighing atom scheme. Thus the atoms that ensemble the active scaffold necessary for a structure to be active have a higher rank than the irrelevant atoms. All the active structures contain this scaffold, so that, after the overlapping procedure, it is easy to identify these atoms.
Figure 8. Linear dependence between Similarity scores and BA
Linear regression analysis showed good correlations between similarity score and BA. Thus an attempt to give estimative values of BA for two compounds of interest (INDOMETHACIN and LSD) was made. The predicted values (by using regression eq. in Figure 8) showed mild activity of these known antagonists. Considering the simplicity of the used model, we can draw the conclusion that the similarity scores can classify correctly the unknowns, which is the most desirable feature (see Figure 8).
Figure 9. Pharmacophore map for dopamine receptor antagonists
The obtained results reveal the important structural features, i.e., the pharmacophore, of antagonists of the dopamine receptor (see Figure 9). In the high affinity compounds, the distance between the cationic nitrogen and the m-hydroxyl oxygen ranged from 6 to 6.45 A with highest activity compound (R)-SKF-82526 having a distance of 6 A. The distance between the cationic nitrogen and the first carbon in the second benzene cycle ranged from 3.78 to 3.81 A; in the lowest activity compounds, this pharmacophore is missing.