Identification of Dopamine D1 Receptor Agonists and Antagonists under Existing Noise Compounds by TFS-based ANN and SVM

Yoshimasa TAKAHASHI, Satoshi FUJISHIMA, Katsumi NISHIKOORI, Hiroaki KATO and Takashi OKADA


1 Introduction

Predicting the likely biological activity of chemical compounds is an attractive and actively studied problem, closely related to structure-activity studies. The problem can be viewed in two ways. One is the prediction of the potency of action; such approaches are referred to as Quantitative Structure-Activity Relationships (QSAR). They have been applied successfully to a wide variety of problems and are still under active development. The other is the computational estimation of the activity classes of drug molecules. Many of these works are based on structural similarity analysis. Structural similarity provides a great deal of information on structure-activity and structure-property problems [1, 2] and supports the selection of candidate analogs as new chemicals [3 - 5]. The basic idea is that structurally similar compounds are likely to possess similar molecular properties and similar biological activities. Most approaches to evaluating similarity rely on finding particular functional atoms or atomic groups defined in advance, so the result of such an analysis depends on the set of substructures chosen as descriptors [6]. The authors proposed the Topological Fragment Spectra (TFS) method as a tool for describing the topological structure profile of a molecule [7]. The TFS representation does not require any a priori substructure definition, such as a dictionary file of substructures to be searched. The method can be applied to structural feature analysis and similarity analysis. It is also useful for similar-structure searching in chemical structure databases [8] and for visualizing the data space of similar structures [9].
In our preceding work [10], we reported that an artificial neural network (ANN) taking the TFS as input signals successfully classified the type of activity of dopamine receptor antagonists that interact with four different types of dopamine receptors, and that it could be applied to predicting the activity class of test compounds. It was also shown that the support vector machine (SVM) works much better for this problem [11]. Those results were obtained with a set of chemicals that each belong to one of the typical activity classes, without noise compounds. In practical use, however, we have to handle not only chemicals that belong to particular activity classes but also many more chemicals that belong to none of the classes of interest. In the present work, we investigated the identification of the pharmacological activity of drugs in the presence of many inactive compounds, which are regarded as noise data.

2 Data Sets and Methods

2. 1 Data Sets

In this work we employed 232 drugs that interact with the dopamine D1 receptor. They were taken from MDDR [12], a structure database of investigational new drugs. Sixty-three of them are agonists and 169 are antagonists. In addition, 696 compounds were randomly chosen from the MDDR database, excluding the dopamine D1 receptor actives, and were used as noise data. Three data sets with different amounts of noise data were prepared for the following analyses: trial set 1 (50%), trial set 2 (100%) and trial set 3 (300%), where the noise rate is expressed relative to the 232 drugs of interest. For training and validating each model, 10 percent of the compounds of a data set were randomly chosen and set aside as a prediction (test) set in every case. The remaining compounds were used as the training set.
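The set sizes implied by these noise rates, and the 90/10 hold-out, can be checked with a short sketch. The function names, the rounding and the fixed seed are illustrative assumptions, and the split shown is a plain unstratified random split; the text does not say whether the original split was stratified by class.

```python
import random

N_ACTIVES = 232  # 63 agonists + 169 antagonists
NOISE_RATES = {"trial set 1": 0.5, "trial set 2": 1.0, "trial set 3": 3.0}

def trial_set_size(noise_rate, n_actives=N_ACTIVES):
    """Total compounds: the actives plus noise_rate * n_actives noise compounds."""
    return n_actives + round(noise_rate * n_actives)

def split_indices(n_total, test_fraction=0.1, seed=0):
    """Randomly hold out test_fraction of the compounds as the prediction set."""
    rng = random.Random(seed)          # fixed seed only for reproducibility (assumption)
    indices = list(range(n_total))
    rng.shuffle(indices)
    n_test = round(n_total * test_fraction)
    return indices[n_test:], indices[:n_test]   # (training, prediction)

sizes = {name: trial_set_size(rate) for name, rate in NOISE_RATES.items()}
train, test = split_indices(sizes["trial set 1"])
```

With these assumptions trial set 1 has 348 compounds, of which 313 fall in the training set, which agrees with the total reported for the 50% noise case in Table 1.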

2. 2 Numerical Representation of Structural Features of Chemicals

In the present work, the Topological Fragment Spectra (TFS) method [7] was employed to describe the structural information of drugs. The TFS is based on the enumeration of all possible substructures of a chemical structure and their numerical characterization. A chemical structure can be regarded as a graph in the sense of graph theory. A hydrogen-suppressed graph is often used for the graph representation of chemical structures.
To obtain a TFS representation of a chemical structure, all possible subgraphs with a specified number of edges are enumerated. Every subgraph is then characterized by a numerical quantity. For this characterization we used the overall sum of the mass numbers of the atoms corresponding to the vertices of the subgraph. In this process, suppressed hydrogen atoms are taken into account as augmented atoms. The TFS is defined as the histogram obtained from the frequency distribution of the individually characterized subgraphs (i.e. substructures or structural fragments) according to the value of their characterization index. An illustrative scheme of TFS creation from a chemical structure is given in Figure 1. An example, the TFS of promazine characterized according to the present method, is shown in Figure 2. The TFS can be regarded as a function of the chemical structure, i.e. TFS = f(chemical structure).
The TFS generated in this manner is a digital representation of the topological structural profile of a drug molecule, and it closely resembles the mass spectrum of a chemical. The computational time required for the exhaustive enumeration of all possible substructures is often very large, especially for molecules that contain highly fused rings. To avoid this problem, a subspectrum was used in the present work, in which each spectrum is described with structural fragments up to a specified size in the number of edges (bonds).
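A minimal sketch of this procedure is given here for a small molecule. It is not the authors' program: the function names and the input layout are assumptions, fragments are counted as connected sets of bonds, and single atoms are included as fragments with zero edges.

```python
from collections import Counter

def connected_bond_sets(bonds, max_edges):
    """Return all connected sets of 1..max_edges bonds as frozensets of bond indices."""
    found = []
    current = {frozenset([i]) for i in range(len(bonds))}
    for _ in range(max_edges):
        found.extend(current)
        grown = set()
        for subset in current:
            atoms = {a for b in subset for a in bonds[b]}
            # Grow the fragment by any bond that touches one of its atoms.
            for j, (u, v) in enumerate(bonds):
                if j not in subset and (u in atoms or v in atoms):
                    grown.add(subset | {j})
        current = grown
    return found

def tfs(atom_masses, bonds, max_edges):
    """Histogram {fragment mass: count} over fragments with 0..max_edges bonds.

    atom_masses[i] is the mass number of heavy atom i plus its attached hydrogens.
    """
    spectrum = Counter(atom_masses)  # zero-edge fragments: single atoms (assumed included)
    for subset in connected_bond_sets(bonds, max_edges):
        atoms = {a for b in subset for a in bonds[b]}
        spectrum[sum(atom_masses[a] for a in atoms)] += 1
    return spectrum

# Ethanol as a hydrogen-suppressed graph: CH3 - CH2 - OH
ethanol_masses = [12 + 3, 12 + 2, 16 + 1]
ethanol_bonds = [(0, 1), (1, 2)]
spectrum = tfs(ethanol_masses, ethanol_bonds, max_edges=2)
```

For ethanol this gives one peak each at masses 14, 15, 17, 29, 31 and 46. Limiting `max_edges` corresponds to the subspectrum described above, which bounds the number of fragments that have to be enumerated.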

Figure 1. A schematic flow of TFS creation. S(e) is the number of edges (bonds) of fragments to be generated.

Figure 2. TFS of promazine characterized by the sum of atomic mass numbers for each fragment.

Obviously, the fragment spectrum obtained by these methods can be described as a multidimensional pattern vector. However, the number of dimensions of the TFS pattern description vector depends on individual chemical structures. The different dimensionalities of the spectra to be compared are adjusted as follows,

Where, xik is the intensity value of peak k of TFS for i-th molecule, and xjk is that of peak k of TFS for the j-th molecule having the highest value of the characterization index (in this work, the highest fragment mass number).
In this manner, the TFS for every trial set were generated. The dimensions of the TFS for trial set 1, trial set 2 and trial set 3 were 168, 166 and 186, respectively. When a prediction sample is submitted, its TFS is adjusted to the dimensionality of the training set by padding with 0 or by cutting off the higher mass region.
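The adjustment of a prediction sample can be sketched as follows. The function name is hypothetical, and the TFS is assumed to be stored as a list ordered from low to high fragment mass.

```python
def fit_dimension(vector, n_dims):
    """Pad with zeros, or cut off the high-mass end, so that the result has n_dims elements."""
    return list(vector[:n_dims]) + [0] * max(0, n_dims - len(vector))

short = fit_dimension([3, 1, 2], 5)        # shorter than the training dimension: padded
long_ = fit_dimension([3, 1, 2, 4, 7], 4)  # longer than the training dimension: truncated
```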

2. 3 Artificial Neural Network (ANN)

Discrimination of the pharmacological activity classes of the drugs was investigated using an ANN. A three-layer network with complete connections between layers was used. The TFS was submitted to the ANN as the input signals for the input neurons. The ANN was trained by the error back-propagation method. All the neural network analyses were carried out using a computer program, NNQSAR, developed by the authors [13]. In the present work, the number of neurons in the hidden layer of the TFS/ANN model was set to two to avoid over-fitting, because the number of input neurons needed to accept a TFS pattern is large. Two neurons were set in the output layer, one for the agonists and one for the antagonists; no neuron was assigned to the noise compounds because it is difficult to regard them as a single class. For the classification, the following rules were employed.
Classification rules:
  1. If the agonist neuron is active and the antagonist neuron is inactive, the input molecule is classified as an agonist.
  2. If the antagonist neuron is active and the agonist neuron is inactive, the input molecule is classified as an antagonist.
  3. If both neurons are inactive, the input molecule is classified as a noise compound.
  4. If both neurons are active, the input molecule is assigned to the class whose neuron gives the larger response.
The ANN parameters were determined heuristically to give the best prediction.
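The four rules can be written as a small function. The activation threshold of 0.5 and the tie-breaking in favor of the agonist class are assumptions made for illustration; the text does not state how "active" was defined.

```python
def classify(agonist_out, antagonist_out, threshold=0.5):
    """Apply classification rules 1-4 to the two output-neuron responses."""
    agonist_active = agonist_out >= threshold        # threshold is an assumed value
    antagonist_active = antagonist_out >= threshold
    if agonist_active and not antagonist_active:
        return "agonist"                             # rule 1
    if antagonist_active and not agonist_active:
        return "antagonist"                          # rule 2
    if not agonist_active and not antagonist_active:
        return "noise"                               # rule 3
    # Rule 4: both active, take the larger response (ties go to agonist, an assumption).
    return "agonist" if agonist_out >= antagonist_out else "antagonist"
```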

2. 4 Support Vector Machine (SVM)

The SVM implements the following basic idea: it maps the input vectors $x$ into a higher-dimensional feature space $z$ through some nonlinear mapping chosen a priori. In this space, an optimal discriminant surface with maximum margin is constructed. Given a linearly separable training dataset $X = (x_1, \ldots, x_i, \ldots, x_n)$ with class labels $y_i \in \{-1, 1\}$, $i = 1, \ldots, n$, the discriminant function can be described by the following equation.

$$f(x_i) = w \cdot x_i + b$$

where $w$ is a weight vector and $b$ is a bias. The discriminant surface can be represented as $f(x_i) = 0$. The maximum margin can be obtained by minimizing the square of the norm of the weight vector $w$,

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, n$$

The decision function is described as $S(x) = \mathrm{sign}(w \cdot x + b)$ for classification, where sign is the sign function that returns 1 for a positive value and -1 for a negative value. This basic idea can be extended to the linearly inseparable case by introducing slack variables $\xi_i$ and minimizing the following quantity,

$$\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

This optimization problem reduces to the previous one for separable data when the constant $C$ is large enough. This quadratic optimization problem with constraints can be reformulated by introducing Lagrangian multipliers $\alpha_i$.

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$

Since the training points $x_i$ appear in the final solution only via dot products, this formulation can be extended to general nonlinear functions by using the concepts of nonlinear mappings and kernels [14]. Given a mapping $x \to \phi(x)$, the dot product in the feature space can be replaced by a kernel function.

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$$

Here we used the radial basis function as the kernel function for mapping the data into a higher-dimensional space.

Basically, the SVM is a binary classifier. Classification problems with three or more categories therefore require multiple discriminant functions. In this work, a one-against-the-rest approach was used. The TFS were used as input feature vectors to the SVM. All the TFS-based SVM analyses were carried out using a computer program developed by the authors according to Platt's algorithm [15]. In the present work, C = 100 and σ = 40 were used for the training. The values of these parameters were determined heuristically to give the best prediction.
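As an illustration of how a trained kernel SVM evaluates a sample, the sketch below computes the decision value from a set of support vectors. The kernel form exp(-||x - y||^2 / (2σ^2)) is an assumption, since other parameterizations of the radial basis function exist and the exact form used is not shown here. The support vectors, multipliers and bias are toy values, not results from the paper.

```python
import math

def rbf_kernel(x, y, sigma=40.0):
    """Gaussian RBF kernel, assuming the form exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_value(x, support_vectors, labels, alphas, bias, sigma=40.0):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the predicted class is sign(f(x))."""
    return bias + sum(a * y * rbf_kernel(sv, x, sigma)
                      for sv, y, a in zip(support_vectors, labels, alphas))

# Toy model with one positive and one negative support vector.
svs = [[0.0, 0.0], [100.0, 0.0]]
ys = [1, -1]
alphas = [1.0, 1.0]
f_near_positive = decision_value([0.0, 0.0], svs, ys, alphas, bias=0.0)
f_near_negative = decision_value([100.0, 0.0], svs, ys, alphas, bias=0.0)
```

A sample close to the positive support vector gets a positive decision value and a sample close to the negative one gets a negative value. In training, the multipliers would be found by an optimizer such as the one in [15], with the upper bound C = 100.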

3 Results and Discussion

3. 1 Classification and Prediction by TFS/ANN

The classification and prediction abilities of the TFS/ANN were investigated for dopamine D1 receptor agonists (63 compounds), antagonists (169 compounds) and noise compounds. Three data sets with different amounts of noise data (noise rates of 50%, 100% and 300%) were used for the computational experiments. For the first two cases, the noise compounds for each data set were randomly chosen from the 696 noise compounds prepared in advance. Each data set was divided into two groups: 90% of the data for a training set and 10% for a prediction set. ANN models of 168×2×2, 166×2×2 and 186×2×2 were trained for these data sets, respectively. Every sample was assigned to a single class according to the classification rules given in Section 2.3. For the 50%-noise data set, the training parameters were a learning rate of 0.05, a sigmoid slope factor of 1.0, a momentum constant of 0.02 and 10000 training iterations. For the 100%-noise data set they were 0.05, 1.0, 0.02 and 20000, and for the 300%-noise data set 0.04, 1.0, 0.02 and 20000, respectively. The resulting RMS values were 0.226, 0.222 and 0.208, respectively. The results for the training and the prediction are summarized in Table 1 and Table 2.
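The role of the three training parameters can be illustrated with the textbook form of back propagation with momentum. This is a generic sketch, not the NNQSAR implementation, and the function names are hypothetical; the defaults are the values reported for the 50%-noise data set.

```python
import math

def sigmoid(net, slope=1.0):
    """Sigmoid activation with a slope factor applied to the net input."""
    return 1.0 / (1.0 + math.exp(-slope * net))

def momentum_step(weight, gradient, previous_delta, learning_rate=0.05, momentum=0.02):
    """One weight update: delta = -learning_rate * gradient + momentum * previous_delta."""
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta

# Two successive updates of a single weight with a constant error gradient of 2.0.
w1, d1 = momentum_step(1.0, 2.0, 0.0)
w2, d2 = momentum_step(w1, 2.0, d1)
```

With a momentum constant as small as 0.02, the second step (-0.102) is only slightly larger than the first (-0.1), so the update is dominated by the current gradient.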

Table 1. Results of the training by TFS/ANN
%Noise    Training, correct/total: Total    Agonists    Antagonists    Noise
50%       279/313                           39/56       147/152        93/105

Table 2. Results of the prediction by TFS/ANN
%Noise    Prediction (%)

The TFS/ANN models classified 89.1%-91.3% of the drugs into their own classes correctly. However, the detailed results show that the recognition rate differs from class to class. This is most evident for the data set with 300% noise. It appears that classes with a larger number of samples obtain a better recognition rate in training. The same holds for the prediction by the resulting models (Table 2). Both Table 1 and Table 2 show that the results for agonists are poorer than those for the other classes, in both training and prediction. We consider that the TFS/ANN model could not learn the agonist class well because it contains relatively few samples compared with the other classes. The present results suggest that the training results of the artificial neural network depend considerably on the sample size of each class.
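The per-class rates behind this observation can be recomputed from the 50% noise row of Table 1. The sketch reads that row as total, agonists, antagonists and noise, an interpretation that matches the class sizes of the data set but is not stated in the surviving table header.

```python
# Correct/total counts for the 50% noise training set (Table 1).
table1_50 = {"agonists": (39, 56), "antagonists": (147, 152), "noise": (93, 105)}

def rate(correct, total):
    """Recognition rate in percent, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)

per_class = {name: rate(c, t) for name, (c, t) in table1_50.items()}
overall = rate(sum(c for c, _ in table1_50.values()),
               sum(t for _, t in table1_50.values()))
```

Under this reading the agonists, the smallest class, are recognized at 69.6%, against 96.7% for antagonists and 88.6% for noise compounds, while the overall rate is 89.1%.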

3. 2 Classification and Prediction by TFS/SVM

Next, we investigated the classification and prediction abilities of the TFS/SVM for the same data sets used in the previous section. The one-against-the-rest approach was used for the classification. In this approach, a certain active class of interest is discriminated from the other classes. However, as already mentioned, it is difficult to regard the noise compounds as a single class because they are heterogeneous. Therefore, two independent SVM discriminant functions were developed and used for the present analysis. The classification was carried out in a similar way so that the result could be compared with that of the TFS/ANN. When a sample was positive for both classes (agonist and antagonist), it was assigned to the class whose discriminant function gave the larger value (the distance from the margin boundary surface). When a sample was negative for both classes, it was classified as a noise compound. The results for all the training sets are shown in Table 3. The TFS/SVM models classified 99.4%-99.9% of the drugs into their own classes correctly. The detailed results show that the recognition rate is very good for every class regardless of its sample size. This remains true even for the data set with 300% noise.
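The labeling for the two discriminant functions and the way their outputs are combined can be sketched as follows. The function names are hypothetical, and the handling of a sample that is positive for exactly one class is an assumption that follows from the rules stated above.

```python
def one_vs_rest_labels(classes, positive):
    """Labels for one discriminant function: +1 for the class of interest, -1 for the rest."""
    return [1 if c == positive else -1 for c in classes]

def combine(f_agonist, f_antagonist):
    """Merge the two discriminant values into one of three class assignments."""
    if f_agonist > 0 and f_antagonist > 0:
        # Positive for both classes: take the larger discriminant value.
        return "agonist" if f_agonist >= f_antagonist else "antagonist"
    if f_agonist > 0:
        return "agonist"
    if f_antagonist > 0:
        return "antagonist"
    return "noise"  # negative for both classes

classes = ["agonist", "noise", "antagonist", "noise"]
y_agonist = one_vs_rest_labels(classes, "agonist")
y_antagonist = one_vs_rest_labels(classes, "antagonist")
```

In this scheme the noise compounds never need a discriminant function of their own; they only appear as negative examples for both functions.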

Table 3. Results of the training by TFS/SVM
%Noise    Training (%)

Table 4. Results of the prediction by TFS/SVM
%Noise    Prediction (%)

Then, the TFS/SVM models were employed for the prediction. The results are summarized in Table 4. They show that the TFS/SVM also works better in the prediction. The total prediction rates for the data sets with 50%, 100% and 300% noise are 91.4%, 93.6% and 97.8%, respectively. The results for the individual classes are also good and stable for all classes.
We conclude that the TFS/SVM works better in the training and remains stable in the prediction even when the classes to be analyzed differ widely in sample size.

4 Conclusions

Classification and prediction of the pharmacological activity classes of drugs in the presence of noise compounds were investigated with a TFS-based artificial neural network (TFS/ANN) and a TFS-based support vector machine (TFS/SVM). The results suggest that training by TFS/ANN depends considerably on the sample size of each class, so the prediction ability tends to be lower for an activity class that has fewer samples than the others. In contrast, the TFS/SVM works better than the TFS/ANN in both training and prediction. However, because many instances are required for such predictive identification of drug activities in practical use, an extended study with a larger number of pharmacological activity classes should be carried out in future work.

This work was supported by Grant-in-Aid for Scientific Research on Priority Areas (B) 13131210. We also thank one of the reviewers for carefully checking the description and for valuable comments that helped us revise the paper.


[1] M. A. Johnson, G. M. Maggiora (Eds.), Concepts and Applications of Molecular Similarity, Wiley, New York (1990).
[2] Y. Takahashi, Identification of Structural Similarity of Organic Molecules, Topics in Current Chemistry, 174, 105-133 (1995).
[3] M. Rarey, M. Stahl, Similarity Searching in Large Combinatorial Chemistry Spaces, J. Comput.-Aided Mol. Des., 15, 497-520 (2001).
[4] J. W. Raymond, P. Willett, Effectiveness of Graph-Based and Fingerprint-Based Similarity Measures for Virtual Screening of 2D Chemical Structure Databases, J. Comput.-Aided Mol. Des., 16, 59-71 (2002).
[5] D. Wilton, P. Willett, Comparison of Ranking Methods for Virtual Screening in Discovery Program, J. Chem. Inf. Comput. Sci., 43, 469-474 (2003).
[6] D. Flower, On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., 38, 379-386 (1998).
[7] Y. Takahashi, H. Ohoka, Y. Ishiyama, Structural Similarity Analysis Based on Topological Fragment Spectra, Advances in Molecular Similarity, 2, ed. by R. Carbo & P. Mezey, JAI Press, Greenwich, CT (1998), pp. 93-104.
[8] Y. Takahashi, S. Fujishima, H. Kato, Chemical Data Mining Based on Structural Similarity, J. Comput. Chem. Jpn., 2, 119-126 (2003).
[9] Y. Takahashi, M. Konji, S. Fujishima, MolSpace: A Computer Desktop Tool for Visualization of Massive Molecular Data, J. Mol. Graph. Model., 21, 333-339 (2003).
[10] S. Fujishima, Y. Takahashi, Classification of Pharmacological Activity of Drugs Using TFS-Based Artificial Neural Network, J. Chem. Inf. Comput. Sci., 44, 1006-1009 (2004).
[11] Y. Takahashi, K. Nishikoori, S. Fujishima, Classification of Pharmacological Activity of Drugs Using Support Vector Machine, Second International Workshop on Active Mining, 152-158 (2003).
[12] MDL Drug Data Report, MDL, Ver. 2001.1 (2001).
[13] H. Ando, Y. Takahashi, Artificial Neural Network Tool (NNQSAR) for Structure-Activity Studies, Proceedings of the 24th Symposium on Chemical Information Sciences, 117-118 (2000).
[14] S. W. Lee, A. Verri (Eds.), Support Vector Machines 2002, LNCS 2388 (2002).
[15] J. C. Platt, Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Microsoft Research Tech. Report MSR-TR-98-14, Microsoft Research (1998).