Chemical Data Mining Based on Structural Similarity

Yoshimasa TAKAHASHI*, Satoshi FUJISHIMA and Hiroaki KATO

Dept. of Knowledge-based Information Engineering, Toyohashi University of Technology
1-1 Hibarigaoka, Tenpaku-cho, Toyohashi, Aichi 441-8580, Japan

(Received: April 17, 2003; Accepted for publication: June 30, 2003; Published on Web: August 29, 2003)

This paper describes an approach to chemical data mining based on the quantitative evaluation of structural similarity. The topological fragment spectrum (TFS) method, previously reported by the authors, was used to represent each chemical structure numerically. A TFS is obtained by enumerating all possible substructures of a chemical structure and characterizing each of them numerically. The TFS was applied to similar-structure searching over a database of more than 3,600 drugs extracted from the World Drug Index. All spectra were generated from fragments having five or fewer bonds. Five different similarity (or dissimilarity) functions were investigated for their suitability for similarity searching with the TFS. A computational trial of similar-structure searching on this database suggested that the approach can be applied successfully to chemical and pharmaceutical data mining based on the structural similarity of drug molecules.
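As an illustration only, the following minimal Python sketch shows how a similarity search over fragment spectra might be organized. It assumes that each spectrum has already been computed and stored as a fixed-length vector of peak intensities. The abstract does not name the five functions that were studied, so the three measures below (Euclidean distance, Tanimoto coefficient, cosine coefficient) are common choices picked here as assumptions. The spectra, the molecule names and the helper rank_by_similarity are likewise hypothetical.

```python
import math

# Hypothetical toy spectra: each entry is a fixed-length vector of peak
# intensities. Real TFS vectors would come from fragment enumeration,
# which is not shown here.
database = {
    "mol_a": [2, 1, 0, 3],
    "mol_b": [2, 0, 1, 3],
    "mol_c": [0, 4, 4, 0],
}
query = [2, 1, 0, 3]


def euclidean_distance(x, y):
    """Dissimilarity: 0 for identical spectra, larger when they differ."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def tanimoto_similarity(x, y):
    """Continuous Tanimoto coefficient: 1.0 for identical spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Two all-zero spectra are treated as identical (an assumption).
    return dot / denom if denom else 1.0


def cosine_similarity(x, y):
    """Cosine of the angle between the two spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Undefined for an all-zero spectrum; return 0.0 in that case (an assumption).
    return dot / norm if norm else 0.0


def rank_by_similarity(query, database, similarity=tanimoto_similarity, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, similarity(query, spec)) for name, spec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


for name, score in rank_by_similarity(query, database):
    print(name, round(score, 3))
```

To rank with a dissimilarity function such as euclidean_distance, sort in ascending order instead, or pass a wrapper that negates the distance.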

Keywords: Data mining, Structural similarity, Drug design, TFS, Chemical database
