Development of Neural Network for Incomplete Data Set, CQSAR:
Compensation Quantitative Structure-Activity Relationships

Tomoo AOYAMAa*, Junko KAMBEb, Umpei NAGASHIMAc and Hidenori UMENOd

aFaculty of Engineering, University of Miyazaki
1-1 Gakuen Kihanadai Nishi, Miyazaki 889-2192 Japan
bFaculty of Media Communication, Edogawa University
474 Komaki, Nagareyama, Chiba 270-0198, Japan
cResearch Institute of Computational Science, National Institute of Advanced Industrial Science and Technology
1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
dFaculty of Engineering, Kumamoto University
2-39-1 Kurokami, Kumamoto 860-8555, Japan

(Received: May 28, 2007; Accepted for publication: June 28, 2007; Published on Web: September 27, 2007)

Multi-layer neural networks are used for multiple regression analysis of many kinds of phenomena whose governing expressions are unknown. Application fields include drug design and environmental problems. Descriptors often contain incomplete parts, which lower the precision of the analysis. Moreover, an incomplete part of one descriptor invalidates the linked parts of the other descriptors, so the multiple regression analysis often cannot be calculated at all. We therefore wish to eliminate these erroneous effects. In this paper, we discuss several approaches to eliminating such effects and derive a neural-network-based method that compensates for defective descriptors. We call it the compensation quantitative structure-activity relationships (CQSAR) method.
The first step of CQSAR is to interpolate the defective parts of the descriptors. Various interpolation methods exist, and the choice depends on the number of data points and the characteristics of the data set. We introduce three methods: variable absolute-function fitting, the parametric observed-vector method, and interpolation using a 3-layer neural network and an arithmetic progression. The first two methods are intended for small data sets. The neural network approach is general-purpose and requires more than 7 data points. With more than 11 data points, it also gives partial derivative coefficients. We confirm these effects in numerical calculations.
The second step of CQSAR is to perform the multiple regression analysis using the non-linear fitting functions of neural networks. We evaluate the propagation of the error introduced by the interpolations in the first step and show that the error does not increase. For this evaluation, we introduce a new reconstruction learning and show the effectiveness of the CQSAR method under a defect ratio of 50%.
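The neural-network interpolation in the first step can be illustrated with a small sketch. The abstract does not specify the network architecture, training rule, or how the arithmetic progression is used, so everything below is an assumption: the arithmetic progression is taken to be the network input (equally spaced positions of the entries of one descriptor), the network has one tanh hidden layer, and training is plain gradient descent. The names `fit_mlp` and `interpolate_descriptor` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def fit_mlp(x, y, hidden=4, lr=0.1, epochs=5000, seed=0):
    """Fit a 3-layer (input, tanh hidden, linear output) network with
    one input and one output by full-batch gradient descent on the
    mean squared error. All hyperparameters are illustrative guesses."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(size=(hidden, 1))
    b2 = np.zeros(1)
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)              # hidden activations, (n, hidden)
        err = h @ w2 + b2 - y                 # output residuals, (n, 1)
        # Gradients of 0.5 * mean(err**2) with respect to each parameter.
        g_w2 = h.T @ err / n
        g_b2 = err.mean(axis=0)
        g_h = (err @ w2.T) * (1.0 - h ** 2)   # back through tanh
        g_w1 = x.T @ g_h / n
        g_b1 = g_h.mean(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float).reshape(-1, 1)
        return (np.tanh(x_new @ w1 + b1) @ w2 + b2).ravel()

    return predict

def interpolate_descriptor(values):
    """Fill NaN entries of one descriptor column. The network input is
    an arithmetic progression over [0, 1] (assumed reading of the abstract)."""
    values = np.asarray(values, dtype=float)
    t = np.linspace(0.0, 1.0, len(values))    # arithmetic progression
    known = ~np.isnan(values)
    mean = values[known].mean()
    std = values[known].std()
    scale = std if std > 0 else 1.0           # avoid division by zero
    predict = fit_mlp(t[known], (values[known] - mean) / scale)
    filled = values.copy()
    filled[~known] = predict(t[~known]) * scale + mean
    return filled

# Made-up descriptor with two defective entries and 8 known values.
descriptor = [0.10, 0.35, np.nan, 0.80, 0.95, 1.05, np.nan, 1.20, 1.22, 1.25]
print(interpolate_descriptor(descriptor))
```

The example keeps 8 known values, in line with the abstract's statement that the neural network approach needs more than 7 data points. Known entries are returned unchanged; only the NaN positions are replaced by network predictions.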

Keywords: Incomplete data set, Multi-layer Neural Network, Derivative of neural networks, QSAR
