(Received: August 27, 2004; Accepted for publication: June 20, 2005; Published on Web: August 31, 2005)
A three-layered neural network model was developed to predict the hazards of a variety of compounds on the basis of a quantitative structure-activity relationship. The inputs were 10 principal components derived from 37 kinds of molecular descriptors calculated with MO programs. The outputs were taken from the data used in the Predictive Toxicology Challenge (PTC) 2000-2001 contest, which cover 454 compounds with carcinogenicity data for male rats. The total database of 454 compounds was split into training (144 compounds), validation (143) and test (167) sets. To address problems such as over-training, over-fitting and local minima that arise when training the neural network with the error-back-propagation algorithm, various conditions of the network, such as the number of training cycles and the number of neurons in the intermediate layer, were optimized. The optimum model showed a correct classification rate close to 74%, higher than that of any of the PTC contestants.
Keywords: Quantitative structure-activity relationship, Neural network, Carcinogenicity prediction, Principal component analysis, Over-training
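The workflow summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the data are synthetic random numbers rather than the PTC descriptors, the principal components are computed over all 454 rows, the network uses sigmoid units with a squared-error loss and batch updates, and the function names (`pca_scores`, `train`, `predict`), learning rate, cycle limit and candidate hidden-layer sizes are hypothetical. Only the 37-to-10 reduction, the 144/143/167 split and the idea of choosing the training cycles and hidden-neuron count on the validation set come from the text.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize the descriptors and project onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(X, params):
    """Forward pass of the three-layer network; returns one value in (0, 1) per row."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)[:, 0]

def train(Xtr, ytr, Xva, yva, n_hidden, lr=0.5, max_cycles=300, seed=0):
    """Batch error back-propagation; keeps the weights with the lowest validation error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (Xtr.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
    b2 = np.zeros(1)
    best_err, best_params, best_cycle = np.inf, None, 0
    for cycle in range(1, max_cycles + 1):
        H = sigmoid(Xtr @ W1 + b1)                 # hidden-layer activations
        out = sigmoid(H @ W2 + b2)                 # network output, shape (n, 1)
        # Squared-error gradients propagated back through the sigmoid units.
        d_out = (out - ytr[:, None]) * out * (1.0 - out) / len(ytr)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (Xtr.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
        # Monitor the validation set to guard against over-training.
        val_err = np.mean((predict(Xva, (W1, b1, W2, b2)) - yva) ** 2)
        if val_err < best_err:
            best_err = val_err
            best_params = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
            best_cycle = cycle
    return best_params, best_cycle, best_err

# Synthetic stand-in for the 454 x 37 descriptor table (NOT the PTC data).
rng = np.random.default_rng(42)
latent = rng.normal(size=(454, 4))
X = latent @ rng.normal(size=(4, 37)) + 0.3 * rng.normal(size=(454, 37))
y = (latent[:, 0] + 0.5 * rng.normal(size=454) > 0).astype(float)

scores = pca_scores(X, 10)                         # 37 descriptors -> 10 components
order = rng.permutation(454)
tr, va, te = order[:144], order[144:287], order[287:]   # 144 / 143 / 167

# Choose the hidden-layer size by validation error (candidate sizes are hypothetical).
best = None
for n_hidden in (2, 4, 8):
    params, cycle, err = train(scores[tr], y[tr], scores[va], y[va], n_hidden)
    if best is None or err < best[2]:
        best = (params, cycle, err, n_hidden)

test_rate = float(np.mean((predict(scores[te], best[0]) > 0.5) == (y[te] == 1.0)))
print(f"hidden={best[3]} cycles={best[1]} test rate={test_rate:.2f}")
```

The test set is touched only once, after the hidden-layer size and stopping cycle have been fixed on the validation set, so the reported rate is not biased by model selection. The rate printed for the synthetic data says nothing about the 74% figure reported for the PTC compounds.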