Development and Application of XyM2Mol System for Converting Structural Data by XyM Notation into Connection Tables

Kei ITO, Nobuya TANAKA and Shinsaku FUJITA



1 Introduction

As computers advanced during the 1960s and the 1970s, databases of chemical structures were developed extensively, and the manipulation of structural formulas by computers emerged as a main objective. Many computer representations of structural formulas were devised in order to extract useful information from the substructures that such formulas contain. Because computer capacity was restricted in those early years, lightweight representations such as linear notations were developed, as summarized in reviews [1, 2] and books [3, 4].
The explosive development of hardware during the 1980s removed this capacity restriction, so that more informative, heavier representation systems (topological systems) became permissible. Connection tables [5] or equivalents [6] were therefore adopted by most database and application systems for manipulating chemical structures [7, 8], and the linear notations developed in the early history of chemoinformatics were entirely replaced by connection tables. This stage of chemoinformatics can be referred to as the era of storage and retrieval, because the trend of development was database-oriented.
The spread of the Internet during the 1990s and the present decade has changed the situation surrounding the computer representation of structures, with browser techniques based on HTML (HyperText Markup Language) or XML (Extensible Markup Language) playing an important role [9]. The preceding era of storage and retrieval has thus shifted into the communication era. As a result, the problem of how to communicate chemical structural formulas has again emerged as a main objective, because communication-oriented representations have different prerequisites from database-oriented ones.


Figure 1. XyMTeX, XyMJava, and XyM2Mol based on XyM Notation.

Because documents on the Internet are searched by text-based search engines, chemical structural formulas in such documents should be represented in a form that these engines can match. We have reported XyM Notation [10] as a linear notation and XyM Markup Language (XyMML) as a markup language [11, 12], which serve as such representations for the communication era. As shown in Figure 1, we have developed the XyMTeX system for typesetting chemical structural formulas [13-16], where codes in the XyM Notation are regarded as commands for XyMTeX [17]. We have also developed the XyMJava system [12] for distributing chemical documents, where the XyM Notation has been implemented in the Java language so as to display chemical structural formulas on a screen and to communicate them via the Internet. However, the XyM Notation contains neither explicit x-, y-, and z-coordinates nor concrete data on bond connection, so that at present it is restricted to being a communication-oriented representation for printing, displaying, and communicating chemical structures.
On the other hand, many chemical applications have been developed on the basis of connection tables, which are used both as input and as database search keys because they contain x-, y-, and z-coordinates and bond-connection data in an explicit manner. In order to cover both communication-oriented and database-oriented fields, communication-oriented representations should be capable of serving as input for computer systems handling chemical structures. If the XyM Notation as a communication-oriented representation becomes capable of producing connection tables (the right branch of Figure 1), it can be utilized in the chemical applications that have been commercialized.
As clarified in the preceding paragraphs, the aim of the present paper is to develop a tool that converts the XyM Notation into connection tables, which serve as input for various chemical applications. The XyM Notation, which is effective in the communication era, thereby becomes applicable to database-oriented systems and other chemical applications as well.

2 XyM Notation

2. 1 General and Specific Skeletons

The XyM Notation has the following general formats:
\GenSkel(SKBONDLIST)[BONDLIST]
   {ATOMLIST}{SUBSLIST}[OMITLIST]
\SpecSkel(SKBONDLIST)[BONDLIST]
   {SUBSLIST}[OMITLIST]
\FuseSkel(SKBONDLIST)[BONDLIST]
   {ATOMLIST}{SUBSLIST}{FUSE}[OMITLIST]
which are based on the methodology that each organic compound is regarded as a derivative of a mother skeleton.
\GenSkel, \SpecSkel, and \FuseSkel
There are three types of skeletons in the drawing palette of XyM Notation. The symbol \GenSkel represents a general mother skeleton, which takes at most five arguments enclosed in parentheses, brackets, or braces. The name of each skeleton (\sixheterov, \sixheterovi, etc.) begins with a backslash symbol (or a yen symbol in the Japanese character system) and forms the stem of a XyM-Notation code; it shows the ring size (sixhetero etc.) and the direction of drawing (v, vi, vb, vt, h, and hi). The symbol \SpecSkel represents a specific mother skeleton whose skeletal atoms are already specified, so that no ATOMLIST is involved. The symbol \FuseSkel represents a fused skeleton used to generate a fused ring system.
SKBONDLIST
The optional argument SKBONDLIST shows the stereochemistry of skeletal bonds (a or b).
BONDLIST
The optional argument BONDLIST shows the unsaturation of skeletal bonds if present. The argument BONDLIST contains lowercase letters to designate such unsaturation, where one letter is sequentially assigned to each bond of the skeleton.
ATOMLIST
The argument ATOMLIST shows atoms at vertices of the skeleton if present.
SUBSLIST
The argument SUBSLIST shows substituents at vertices of the skeleton if present.
OMITLIST
The optional argument OMITLIST shows the deletion of skeletal bonds for converting a ring structure into an open-chain structure.
FUSE 
The argument FUSE shows a bond to be fused to a mother skeleton.
For example, 4-vinylpyridine can be typeset by the following XyM-Notation codes,
\sixheterov[bdf]{1==N}{4==CH=CH$_{2}$} 
\pyridinev{4==CH=CH$_{2}$} 
where the first is an example of \GenSkel and the second is an example of \SpecSkel. Both codes give the same structural formula:


Scheme 1.
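
The bracket structure of the general formats above can be illustrated by a minimal Java sketch. It is not the XyMParser of the XyMJava system: the class name XyMCodeSplitter and its two methods are hypothetical and serve only to show how a code such as the \GenSkel example for 4-vinylpyridine separates into a stem and its delimited arguments, under the assumption that delimiters are always balanced.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of XyMJava): splits one XyM-Notation code into stem and arguments. */
final class XyMCodeSplitter {

    /** Returns the skeleton name that follows the leading backslash, e.g. "sixheterov". */
    static String stem(String code) {
        if (!code.startsWith("\\")) {
            throw new IllegalArgumentException("a XyM-Notation code starts with a backslash: " + code);
        }
        int i = 1;
        while (i < code.length() && Character.isLetter(code.charAt(i))) {
            i++;
        }
        return code.substring(1, i);
    }

    /**
     * Returns the top-level arguments with their delimiters, e.g. "[bdf]" or "{1==N}".
     * Groups nested inside an argument are left untouched within that argument.
     */
    static List<String> arguments(String code) {
        List<String> result = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (c == '(' || c == '[' || c == '{') {
                if (depth == 0) {
                    start = i;          // a new top-level argument opens here
                }
                depth++;
            } else if (c == ')' || c == ']' || c == '}') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("unbalanced delimiters: " + code);
                }
                if (depth == 0) {
                    result.add(code.substring(start, i + 1));
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unbalanced delimiters: " + code);
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "\\sixheterov[bdf]{1==N}{4==CH=CH$_{2}$}";
        System.out.println("stem: " + stem(code));
        for (String argument : arguments(code)) {
            System.out.println("argument: " + argument);
        }
    }
}
```

The first character of each returned argument tells which kind of list it is (parenthesized SKBONDLIST, bracketed BONDLIST or OMITLIST, braced ATOMLIST or SUBSLIST); deciding between the lists that share a delimiter depends on the skeleton type and is left out of the sketch.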

2. 2 Nested Substitution

General and specific skeletons can be converted into the corresponding substituents by designating a symbol (yl) with a locant number (called "yl function") in their SUBSLIST arguments. For example, the code \sixheteroh{1==N;4==O}{1==(yl)} represents a morpholino substituent. Thereby, the XyM Notation:
\pyridineh{4==%
\sixheteroh{1==N;4==O}{1==(yl)}} 
represents 4-morpholinopyridine as follows:


Scheme 2.
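
Because a substituent may itself be a complete XyM-Notation code, a SUBSLIST cannot be divided simply at every semicolon. The following Java sketch shows one way to split it only at semicolons that lie outside nested groups and to recognize the yl function. The class SubsListSketch and its methods are hypothetical illustrations, not the code of XyMJava, and the % line-continuation characters are assumed to have been removed beforehand.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of XyMJava): divides a SUBSLIST or ATOMLIST body into its items. */
final class SubsListSketch {

    /** Splits the body at semicolons that are not inside a nested (...), [...] or {...} group. */
    static List<String> items(String body) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '(' || c == '[' || c == '{') {
                depth++;
            } else if (c == ')' || c == ']' || c == '}') {
                depth--;
            } else if (c == ';' && depth == 0) {
                out.add(body.substring(start, i));
                start = i + 1;
            }
        }
        if (start < body.length()) {
            out.add(body.substring(start));   // last item has no trailing semicolon
        }
        return out;
    }

    /** Returns {locant (with any bond modifier), substituent} for one "locant==substituent" item. */
    static String[] split(String item) {
        int eq = item.indexOf("==");
        if (eq < 0) {
            throw new IllegalArgumentException("missing '==' in item: " + item);
        }
        return new String[] { item.substring(0, eq), item.substring(eq + 2) };
    }

    /** True if the item marks the attachment point of a substituent, e.g. "1==(yl)". */
    static boolean isYl(String item) {
        return split(item)[1].equals("(yl)");
    }

    public static void main(String[] args) {
        // SUBSLIST of the \pyridineh example, written on one line.
        String body = "4==\\sixheteroh{1==N;4==O}{1==(yl)}";
        for (String item : items(body)) {
            String[] parts = split(item);
            System.out.println("locant " + parts[0] + " carries " + parts[1]);
        }
    }
}
```

Applying items and split recursively to each nested code yields the tree of skeletons that the nested substitution describes.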

2. 3 Spiro Compounds

Substituents generated by a yl function can be used to draw spiro compounds. For example, the morpholino substituent described above is placed in an ATOMLIST as follows:
\sixheteroh{1==HN;4h==%
\sixheteroh{1==N;4==O}{1==(yl)}}{} 
so that we can draw the following spiro compound:


Scheme 3.

2. 4 Fused Compounds

By using the \FuseSkel described above, we can generate various units for ring fusion. For example, the code \fivefusev[b]{1==O}{}{D} is designated in the BONDLIST argument as follows:
\fiveheterov[bd%
{b\fivefusev[b]{1==O}{}{D}}]{1==S}{}
Thereby, we can draw the following fused heterocyclic compound:


Scheme 4.

where the furan fusing unit is fused at its 'D' bond on the 'b' bond of the thiophene counterpart.

3 XyM2Mol System

The XyM2Mol system for the format conversion of chemical structural data has been developed by using the Java language. The XyM2Mol system consists of two independent tools: the XyM2Mol application and the XyM2Mol applet. These tools are commonly based on the XyMJava system [19, 20] for the analysis of XyM-Notation codes, but they are different in procedures of input and output.

3. 1 Model-View-Controller for XyM2Mol

To realize an effective user interface for the XyM2Mol system, we have adopted the model-view-controller (MVC) paradigm [22, 23] and design patterns for object-oriented programming [24, 25].

Table 1. Model-View-Controller (MVC) for XyM2Mol System
MVC         Class               Function
Model       ChemStruct          Representing chemical structures
            Atom                Representing atoms
            Bond                Representing bonds
View        XyM2Mol             Outputting connection tables
            XyM2MolApplet       Displaying chemical structures and connection tables via a WWW browser
Controller  XyMJava             Main program for XyMJava
            ChemStructCreator   Extraction of chemical structures from XyM Notations
            (XyMStructCreator)
            XyMParser           Parsing XyM Notations
            (XyMNotnParser)     Auxiliary for XyMParser

The Model and the Controller of the MVC used in the XyM2Mol system are based on those used in the XyMJava system [19]. The Views of the MVC (XyM2Mol and XyM2MolApplet) are newly developed to manipulate connection tables (Table 1).


Figure 2. Class diagram for chemical model.

According to the MVC (Table 1), the XyM Notation is parsed by XyMParser, which is the Controller part of the XyMJava system reported previously [19]. The result of the XyMParser is first converted into an intermediate format named "chemical model" (the Model part of the XyMJava system), the class diagram of which is shown in Figure 2. The class diagram is depicted in terms of the Unified Modeling Language (UML) [25], where a line with a rhombus represents an aggregation. For example, the Bond class holds multiple references to the Atom class, while the ChemStruct class holds multiple references to both of them.
The chemical model, which has been developed for the XyMJava system, is a common technique for handling structural data that is applicable to a wide variety of software. Thus, the Atom class (Figure 2) encapsulates the necessary and sufficient pieces of information on an atom or a functional group, e.g. x- and y-coordinates (and sometimes a z-coordinate), an element symbol (String ElementType), a functional group (String AtomGroup), etc. The Bond class (Figure 2) encapsulates the necessary and sufficient pieces of information on a bond, e.g. a bond multiplicity (int bondType), terminal atoms (Atom beginAtom and Atom endAtom), etc. The ChemStruct class (Figure 2) has no concrete information on atoms and bonds itself but contains a set of methods for referring to Atom objects (e.g. void addAtom(:Atom), which calls Atoms.addElement on a Vector instance named Atoms).
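
The relationships among the three classes can be sketched in Java as follows. This is an illustration only, not the source of XyMJava: the field names follow those quoted above (in lower camel case), whereas the constructors, the use of ArrayList instead of Vector, the nesting inside a single class ChemModelSketch, and the helper methods indexOf, atomCount, and bondCount are assumptions made for the sake of a compact, self-contained example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the chemical model; everything beyond the quoted field names is assumed. */
final class ChemModelSketch {

    /** An atom or a functional group placed at one vertex. */
    static final class Atom {
        final double x, y, z;
        final String elementType;   // element symbol, e.g. "C"
        final String atomGroup;     // functional-group label, e.g. "OH"; null if none

        Atom(double x, double y, double z, String elementType, String atomGroup) {
            this.x = x;
            this.y = y;
            this.z = z;
            this.elementType = elementType;
            this.atomGroup = atomGroup;
        }
    }

    /** A bond refers to its two terminal atoms and stores its multiplicity. */
    static final class Bond {
        final Atom beginAtom;
        final Atom endAtom;
        final int bondType;         // 1 = single, 2 = double, ...

        Bond(Atom beginAtom, Atom endAtom, int bondType) {
            this.beginAtom = beginAtom;
            this.endAtom = endAtom;
            this.bondType = bondType;
        }
    }

    /** The structure aggregates atoms and bonds and offers methods for referring to them. */
    static final class ChemStruct {
        private final List<Atom> atoms = new ArrayList<>();
        private final List<Bond> bonds = new ArrayList<>();

        void addAtom(Atom atom) {
            atoms.add(atom);
        }

        void addBond(Bond bond) {
            if (!atoms.contains(bond.beginAtom) || !atoms.contains(bond.endAtom)) {
                throw new IllegalArgumentException("bond refers to an atom outside this structure");
            }
            bonds.add(bond);
        }

        /** 1-based position of an atom in order of insertion; 0 if the atom is absent. */
        int indexOf(Atom atom) {
            return atoms.indexOf(atom) + 1;
        }

        int atomCount() {
            return atoms.size();
        }

        int bondCount() {
            return bonds.size();
        }
    }

    public static void main(String[] args) {
        ChemStruct struct = new ChemStruct();
        Atom carbon = new Atom(0.0, 0.0, 0.0, "C", null);
        Atom hydroxyl = new Atom(1.0, 0.0, 0.0, "O", "OH");
        struct.addAtom(carbon);
        struct.addAtom(hydroxyl);
        struct.addBond(new Bond(carbon, hydroxyl, 1));
        System.out.println(struct.atomCount() + " atoms, " + struct.bondCount() + " bonds");
    }
}
```

Because a View needs only the ordered atoms and the bonds that refer to them, a model of this shape can feed a drawing routine and a connection-table writer alike.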
The flexible nature of the chemical model (Figure 2) in the MVC of the XyMJava system permits us to develop additional chemical applications by constructing new View parts. Thus, the XyM2Mol system is regarded as a chemical application based on such a newly developed View, which acts as a filter program for converting XyM-Notation codes into connection tables (cf. Table 1). The XyM2Mol system will be made available in the future on our homepage:
http://imt.chem.kit.ac.jp/fujita/fujitas/fujita.html

3. 2 XyM2Mol Application

The XyM2Mol application of the XyM2Mol system is designed to serve as a Java application. The class diagram of the XyM2Mol application is shown in Figure 3.
The class XyM2Mol (XyM2Mol.class) uses the XyMJava class in order to convert structural data by XyM Notation into connection tables. The output format of such connection tables is designed to be compatible with the so-called "molfile" format [21], because this format is widely used in commercialized chemical applications.
To run the XyM2Mol application (Figure 3), the command:
c:\>java xym2mol <input> <output>
is entered on the command line. A XyM-Notation code is then read from the file named <input>, and the resulting connection table is written to the file named <output>.


Figure 3. Class diagram of the XyM2Mol application as a Java application.

For example, an input file (e.g., xyminput.xnt as <input>) is prepared to contain the following code by XyM Notation:

\bzdrv{1==F;2==Cl;4==OH}
which corresponds to 3-chloro-4-fluorophenol represented by the following formula:


Scheme 5.

The locant numbering of XyM Notation is fixed for each skeleton so that the locant numbering of the IUPAC name is different from that of the above XyM-Notation code.
The input file is processed by the XyM2Mol application by entering

c:\>java xym2mol xyminput.xnt 
     xymoutput.xymo
on the command line. The connection table (Figure 4), which is of "molfile" type, is thereby generated and written to an output file (e.g. xymoutput.xymo as <output>).


Figure 4. Connection table of molfile type, which is generated by the XyM2Mol application.

The top row of the connection table (Figure 4) gives summary information: the first value, 9, is the number of atoms, and the second value, 9, is the number of bonds in the structure of 3-chloro-4-fluorophenol. Each row in the middle part describes the attributes of a node (vertex): its x-, y-, and z-coordinates, a symbol representing an atom or a group, and other subsidiary data. The locant numbers shown in the structural formula correspond to the numbers implied by the order in which the atoms (or groups) appear in the middle part. The bottom part of the connection table (Figure 4) lists the bonds. For example, the row 1 2 1 ... states that atom 1 (C) and atom 2 (C) are linked by a single bond.
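
The three parts of such a connection table can be produced by a short formatting routine. The Java sketch below is not the XyM2Mol View itself: the class MolfileSketch, its method toMolfile, and the array-based parameters are hypothetical, and the fixed-width layout follows our reading of the publicly documented V2000 molfile format, with every optional field left at zero. The coordinates in the usage example are arbitrary illustrative values, not those of Figure 4.

```java
import java.util.Locale;

/** Hypothetical writer (not the XyM2Mol View): formats atoms and bonds as a V2000-style molfile. */
final class MolfileSketch {

    /**
     * @param title   text for the first header line
     * @param xyz     one {x, y, z} triple per atom
     * @param symbols one atom or group symbol per atom, in the same order as xyz
     * @param bonds   one {begin, end, type} triple per bond; atoms are numbered from 1
     */
    static String toMolfile(String title, double[][] xyz, String[] symbols, int[][] bonds) {
        if (xyz.length != symbols.length) {
            throw new IllegalArgumentException("one symbol per atom is required");
        }
        StringBuilder sb = new StringBuilder();

        // Header block: title line, program line, comment line (left empty here).
        sb.append(title).append('\n');
        sb.append("  sketch\n");
        sb.append('\n');

        // Counts line: number of atoms and number of bonds, remaining fields at defaults.
        sb.append(String.format(Locale.ROOT,
                "%3d%3d  0  0  0  0  0  0  0  0999 V2000\n", xyz.length, bonds.length));

        // Atom block: coordinates and symbol; the row order defines the atom numbers.
        for (int i = 0; i < xyz.length; i++) {
            sb.append(String.format(Locale.ROOT,
                    "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0\n",
                    xyz[i][0], xyz[i][1], xyz[i][2], symbols[i]));
        }

        // Bond block: first atom, second atom, bond type, stereo flag (0 = none).
        for (int[] bond : bonds) {
            if (bond[0] < 1 || bond[0] > xyz.length || bond[1] < 1 || bond[1] > xyz.length) {
                throw new IllegalArgumentException("bond refers to a nonexistent atom");
            }
            sb.append(String.format(Locale.ROOT, "%3d%3d%3d  0\n", bond[0], bond[1], bond[2]));
        }

        sb.append("M  END\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] xyz = { { 0.0, 0.0, 0.0 }, { 1.43, 0.0, 0.0 } };   // illustrative values only
        String[] symbols = { "C", "O" };
        int[][] bonds = { { 1, 2, 1 } };                              // atom 1 - atom 2, single bond
        System.out.print(toMolfile("demo", xyz, symbols, bonds));
    }
}
```

Locale.ROOT is used so that the decimal point is written as a period regardless of the platform settings, which fixed-column readers of connection tables generally expect.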

3. 3 XyM2Mol Applet

The XyM2Mol applet of the XyM2Mol system is designed to serve as a Java applet. The class diagram of the XyM2Mol applet is shown in Figure 5.
The class XyM2MolApplet (XyM2MolApplet.class) also uses the XyMJava class via the methods action and setXyM2Mol in order to convert a XyM-Notation code into a connection table of "molfile" format. The resulting connection table is output to a dialog box appearing in an Internet browser. In addition, the corresponding structural formula is displayed by means of the method paint.
Since the XyM2Mol applet works as a Plug-in for an Internet browser, XyM-Notation codes are embedded in an HTML document, where an <APPLET> tag and a <PARAM> tag are described for each structural formula to be processed by the XyM2Mol applet.


Figure 5. Class diagram of the XyM2Mol applet as a Java applet.

For example, an HTML file in which the XyM-Notation code of 3-chloro-4-fluorophenol is embedded is prepared as follows:

<HTML>
<HEAD>
<TITLE>test fujita</TITLE>
</HEAD>
<BODY>

<P>
<APPLET code="XyM2MolApplet.class" 
name="XyMJava" 
WIDTH="600" HEIGHT="600">
<param name="xymnot" 
value="\bzdrv{1==F;2==Cl;4==OH}">
<param name="input" value="Yes">
</APPLET>
</BODY>
</HTML>
where the <APPLET> tag designates newly-developed XyM2MolApplet.class as a Plug-in and the parameter named "xymnot" has the value of \bzdrv{1==F;2==Cl;4==OH} as a XyM-Notation code. Another parameter tag named "input" is a switch for whether dialog boxes are displayed or not.
When the HTML file is opened in a WWW browser (e.g., Internet Explorer), we are able to see a display image created by the XyM2Mol applet, as shown in Figure 6.
The display image (Figure 6) consists of two dialog boxes and a structural formula. The top dialog box initially contains the XyM-Notation code taken from the HTML file. This code can be edited in order to redraw the structural formula: the revised formula corresponding to the edited XyM-Notation code appears when the "redraw" button at the right of the dialog box is clicked. The second dialog box, which is initially empty, receives the connection table when the "reinput" button at its right is clicked. Although Figure 6 does not show the lower part of the connection table, the full connection table can be browsed by scrolling.


Figure 6. Display image created by the XyM2Mol applet

4 Chemical Applications

Because connection tables of "molfile" type are used in many chemical applications manipulating chemical structures, the XyM2Mol system provides us with versatile methods for applying structural data by XyM Notation to such chemical applications, as shown in Figure 7. For example, structural data by XyM Notation are converted into connection tables, which are applied to ChemDraw for drawing 2D structural formulas [26], Chem3D for drawing 3D structural formulas [26], and MOPAC for semi-empirical quantum chemical calculation [27]. If we work within a free-software environment, we can use ISIS Draw for drawing 2D structural formulas [28], MOLDA for drawing 3D structural formulas [29], and TINKER for optimizing 3D structures [30]. We are also able to use POV-Ray for drawing molecular graphics of high quality [31].


Figure 7. XyM Notation to chemical applications.

Let us illustrate the procedure shown in Figure 7 by taking folic acid as an example. In order to make an optimization step easier, we write the following XyM-Notation code of folic acid:

value="\tetrahedral{0==C;
2==\tetrahedral{4==(yl);0==O;
2==\tetrahedral{4==(yl);0==H}};3D==O;
1==\tetrahedral{3==(yl);0==C;2==H;4==H;
1==\tetrahedral{3==(yl);0==C;2==H;4==H;
1==\tetrahedral{3==(yl);0==C;1==H;
2==\tetrahedral{4==(yl);0==C;1D==O;
2==\tetrahedral{4==(yl);0==O;
2==\tetrahedral{4==(yl);0==H}}};
4==\tetrahedral{2==(yl);0==N;1==H;
4==\tetrahedral{2==(yl);0==C;1D==O;
4==\sixheteroh[ace]{}{1==(yl);2==H;3==H;
5==H;6==H;
4==\tetrahedral{2==(yl);0==N;1==H;
4==\tetrahedral{2==(yl);0==C;1==H;3==H;
4==\decaheterov[acfhk]
{8==N;5==N;4==N;2==N}{7==(yl);6==H;
1==\tetrahedral{3==(yl);0==O;4==H};
3==\tetrahedral{2==(yl);0==N;3==H;4==H}
}}}}}}}}}}"<!-- folic acid -->
in which hydrogen atoms are explicitly drawn by using the skeleton \tetrahedral. This code generates the following structure by using the XyMTeX system [17]:


Scheme 6.

The same code is processed by means of the XyM2Mol system so as to give the corresponding connection table of "molfile" format. The connection table is read by ChemDraw to draw a 2D structural formula shown in Figure 8.


Figure 8. Formula of folic acid generated by XyM2Mol and ChemDraw

The 2D structure shown in Figure 8 is optimized by molecular mechanics to give the optimized conformation shown in Figure 9, which is viewed by using Chem3D. The same structure can also be rendered by using POV-Ray, as shown in Figure 10.


Figure 9. 3D-Diagram generated by combining XyM2Mol and Chem3D


Figure 10. 3D-Diagram generated by combining XyM2Mol, Chem3D, and POV-Ray

5 Conclusion

The XyM2Mol system has been developed to convert XyM-Notation codes into connection tables. The XyM2Mol application of the XyM2Mol system is designed as a Java application, which supports text-based conversion using input and output files. The XyM2Mol applet of the XyM2Mol system is designed as a Java applet, which supports the conversion of XyM-Notation codes embedded in an HTML document. The versatility of the "chemical model" developed on the basis of the model-view-controller (MVC) paradigm is thereby demonstrated to cover database-oriented fields as well as communication-oriented fields of chemoinformatics. In particular, structural data in XyM Notation become applicable to a wide variety of chemical applications through such connection tables.

This work was supported in part (S. F.) by the Japan Society for the Promotion of Science: Grant-in-Aid for Scientific Research B(2) (No. 14380178, 2002-2004).

References

[ 1] S. Hanai, Computer Chemistry, ed. by S. Ono, Maruzen, Tokyo (1988), pp 57-98.
[ 2] W. J. Wiswesser, J. Chem. Inf. Comput. Sci., 25, 258-263 (1985).
[ 3] C. H. Davis and J. E. Rush, Information Retrieval and Documentation in Chemistry, Greenwood, Westport (1974).
[ 4] J. E. Ash, P. A. Chubb, S. E. Ward, S. M. Welford, and P. Willett, Communication, Storage and Retrieval of Chemical Information, Ellis Horwood, Chichester (1985).
[ 5] D. J. Gluck, J. Chem. Doc., 5, 43-51 (1965).
[ 6] L. Spialter, J. Am. Chem. Soc., 85, 2012-2013 (1963).
[ 7] A. Dalby, J. G. Nourse, W. D. Hounshell, A. K. I. Gushurt, D. L. Greier, B. A. Leland, and J. Laufer, J. Chem. Inf. Comput. Sci., 32, 244-255 (1992).
[ 8] S. V. Kasparek, Computer Graphics and Chemical Structures, John Wiley & Sons, New York (1990).
[ 9] S. M. Bachrach, ed., The Internet: A Guide for Chemists, American Chemical Society, Washington, D. C. (1996).
[10] S. Fujita and N. Tanaka, J. Chem. Inf. Comput. Sci., 39, 903-914 (1999).
[11] S. Fujita, J. Chem. Inf. Comput. Sci., 39, 915-927 (1999).
[12] The word 'XyM' is an uppercase form of the stem χυμ of the Greek word, which is a root for the word 'chemistry' via an Arabic word 'alchemy'.
[13] S. Fujita, (1993), On-line Manual for XyMTeX Version 1.00. Available from CTAN (tex-archive/macros/latex209/contrib/xymtex/).
[14] S. Fujita, Comput. Chem., 18, 109-116 (1994).
[15] S. Fujita, TUGboat, 16, 80-88 (1995).
[16] S. Fujita (1996), On-line Manual for XyMTeX Version 1.01. Available from the homepage (http://imt.chem.kit.ac.jp/fujita/fujitas/fujita.html).
[17] S. Fujita, XyMTeX. Typesetting Chemical Structural Formulas, Tokyo, Addison-Wesley Japan (1997).
[18] S. Fujita and N. Tanaka, TUGboat, 21(1), 7-14 (2000).
[19] N. Tanaka and S. Fujita, J. Computer Aided Chem., 3, 37-47 (2002).
[20] N. Tanaka, T. Ishimaru, and S. Fujita, J. Computer Aided Chem., 3, 81-89 (2002).
[21] MDL Information Systems, Inc., http://www.mdl.com/downloads/public/ctfile/ctfile.pdf
[22] G. E. Krasner and S. T. Pope, J. Object-Oriented Programming, 1(3), 26-49 (1988).
[23] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, Pattern-Oriented Software Architecture: A System of Patterns, John Wiley & Sons, New York (1996).
[24] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison Wesley Longman, New York (1995).
[25] M. Grand, Patterns in Java, A Catalog of Reusable Design Patterns, Illustrated with UML, Volume 1, John Wiley & Sons, New York (1998).
[26] CambridgeSoft Corporation, http://www.cambridgesoft.com/.
[27] J. J. P. Stewart, Int. J. Quant. Chem., 58, 133-146 (1996).
See also
http://www.cachesoftware.com/mopac/index.shtml.
[28] MDL Information Systems, Inc., http://www.mdli.com/.
[29] H. Yoshida and H. Matsuura, J. Chem. Software, 4, 81-88 (1998).
See also
http://www.molda.org/.
[30] W. Ponder (2004), On-line Manual for TINKER Version 4.2. Available from http://dasher.wustl.edu/tinker/guide.html.
[31] Persistence of Vision Raytracer Pty. Ltd., http://www.povray.org/.

