Study on Raffenetti's P File Format in Conventional Ab Initio Self-Consistent-Field Molecular Orbital Calculations in Parallel Computational Environment

Hiroyuki TERAMAE and Kazushige OHTAWARA



1 Introduction

Hartree-Fock (ab initio) molecular orbital calculations have become very popular over the last three decades, during which they have been applied to the electronic structures of a wide variety of molecules, and they have established a new field of chemistry as a form of computer experiment. When the calculations are applied to more realistic molecules, handling the two-electron integrals becomes an important problem, because their number increases very rapidly with molecular size.
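If each molecular orbital $\phi_i$ is expanded in the basis functions $\chi_r$ as

\[ \phi_i = \sum_{r} c_{ri}\,\chi_r , \]

the Hartree-Fock equations are expressed as

\[ \sum_{s} \left( F_{rs} - \varepsilon_i S_{rs} \right) c_{si} = 0 , \]

where $\varepsilon_i$ is the orbital energy, $S_{rs}$ is the overlap integral, and

\[ F_{rs} = H_{rs} + \sum_{t,u} P_{tu} \left[ \langle rs|tu \rangle - \tfrac{1}{2} \langle rt|su \rangle \right] . \]

$H_{rs}$ is the core Hamiltonian and is a one-electron integral, and

\[ \langle rs|tu \rangle = \iint \chi_r(1)\,\chi_s(1)\,\frac{1}{r_{12}}\,\chi_t(2)\,\chi_u(2)\,d\tau_1\,d\tau_2 \]

is a two-electron integral. The suffixes r, s, t, and u run from 1 to the total number of basis functions N, and therefore the total number of two-electron integrals is of the order of $N^4$.
The density matrix element is expressed as

\[ P_{tu} = 2 \sum_{i}^{\mathrm{occ}} c_{ti}\,c_{ui} \]

and contains the variational coefficients $c_{ti}$ that are to be determined by the Hartree-Fock equations. The Hartree-Fock equations must therefore be solved iteratively (self-consistent field, SCF). The two-electron integrals are needed in every iteration to calculate the Fock matrix elements $F_{rs}$.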
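The growth in the number of two-electron integrals can be made concrete with a short sketch. It assumes real basis functions, so that the integrals have the usual eightfold permutational symmetry and the number of distinct integrals is M(M+1)/2 with M = N(N+1)/2. The function names and the 12-byte record size (an 8-byte value plus 4 bytes of packed suffixes) are illustrative assumptions, not details taken from any particular program.

```python
def unique_pairs(n_basis: int) -> int:
    """Number of distinct suffix pairs (r >= s) for n_basis functions."""
    return n_basis * (n_basis + 1) // 2


def unique_integrals(n_basis: int) -> int:
    """Number of symmetry-distinct two-electron integrals (eightfold symmetry)."""
    m = unique_pairs(n_basis)
    return m * (m + 1) // 2


def brute_force_count(n_basis: int) -> int:
    """Count canonical quadruples r >= s, t >= u, (r, s) >= (t, u) by enumeration."""
    count = 0
    for r in range(n_basis):
        for s in range(r + 1):
            for t in range(r + 1):
                for u in range(t + 1):
                    if (t, u) <= (r, s):
                        count += 1
    return count


def file_size_gb(n_integrals: int, bytes_per_record: int = 12) -> float:
    """Rough file size; bytes_per_record is a hypothetical record layout."""
    return n_integrals * bytes_per_record / 1.0e9


if __name__ == "__main__":
    for n in (217, 274):  # smallest and largest basis sets used in this study
        total = unique_integrals(n)
        print(n, total, round(file_size_gb(total), 1))
```

For N = 274 the closed formula gives roughly 7.1 x 10^8 distinct integrals before any screening, which is why only integrals above a threshold are normally written to the file.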
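Classical programs such as Gaussian 70 write the two-electron integrals to a file together with their four packed suffixes and read this file repeatedly during the SCF iterations [1, 2]. Because of the permutational symmetry of the suffixes, many integrals have the same value. These duplicates are eliminated from the calculation, and a single stored two-electron integral therefore makes six types of contributions to the Fock matrix elements. For an integral with four distinct suffixes, and a density matrix that includes the factor 2 for double occupancy, they are as follows:

\[
\begin{aligned}
F_{rs} &\leftarrow F_{rs} + 2 P_{tu} \langle rs|tu \rangle , &
F_{tu} &\leftarrow F_{tu} + 2 P_{rs} \langle rs|tu \rangle , \\
F_{rt} &\leftarrow F_{rt} - \tfrac{1}{2} P_{su} \langle rs|tu \rangle , &
F_{ru} &\leftarrow F_{ru} - \tfrac{1}{2} P_{st} \langle rs|tu \rangle , \\
F_{st} &\leftarrow F_{st} - \tfrac{1}{2} P_{ru} \langle rs|tu \rangle , &
F_{su} &\leftarrow F_{su} - \tfrac{1}{2} P_{rt} \langle rs|tu \rangle .
\end{aligned}
\]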
Raffenetti proposed a more efficient procedure for calculating the two-electron contribution to the Fock matrix, now widely known as the P supermatrix algorithm [3]. Hereafter we call it the P method, and we call the traditional four-suffix method the NOP method. The basis of the P method is to recombine the two-electron integrals as

\[ I_{rstu} = \langle rs|tu \rangle - \tfrac{1}{4} \langle rt|su \rangle - \tfrac{1}{4} \langle ru|st \rangle . \]

With the P method, the contribution to the Fock matrix elements becomes very simple. For four distinct suffixes,

\[ F_{rs} \leftarrow F_{rs} + 2 P_{tu}\, I_{rstu} , \qquad F_{tu} \leftarrow F_{tu} + 2 P_{rs}\, I_{rstu} . \]

If the number of two-electron integrals does not change on recombination and the overhead of the recombination is small enough, the computational time should become much shorter, because the number of multiply/add operations per integral decreases from six to two. The P method did work well, and both the Hondo [2] and Gaussian [4] series of programs incorporate it. Nowadays, however, the direct SCF method [5], which does not store the two-electron integrals but recalculates them in every iteration, is usually used. As a result, whether the P method is advantageous in all cases has not been studied well.
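The six-versus-two bookkeeping can be checked numerically with a small sketch. It assumes the closed-shell convention F_rs = H_rs + sum_tu P_tu [<rs|tu> - 1/2 <rt|su>] with a symmetric density matrix that already contains the factor 2, uses random numbers in place of real integrals, and handles coincident suffixes by halving the integral once per coincidence. All function names are illustrative; this is not the code of any of the programs cited.

```python
import random


def canonical(r, s, t, u):
    """Map any ordering of the suffixes onto the stored ordering."""
    if r < s:
        r, s = s, r
    if t < u:
        t, u = u, t
    if (r, s) < (t, u):
        r, s, t, u = t, u, r, s
    return r, s, t, u


def unique_quadruples(n):
    """Yield each symmetry-distinct suffix quadruple exactly once."""
    for r in range(n):
        for s in range(r + 1):
            for t in range(r + 1):
                for u in range(t + 1):
                    if (t, u) <= (r, s):
                        yield r, s, t, u


def weight(r, s, t, u):
    """Halve the integral for every coincidence among the suffixes."""
    w = 1.0
    for coincide in (r == s, t == u, (r, s) == (t, u)):
        if coincide:
            w *= 0.5
    return w


def symmetrized(a):
    n = len(a)
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]


def g_reference(n, eri, p):
    """Dense two-electron part of F, straight from the definition."""
    def v(r, s, t, u):
        return eri[canonical(r, s, t, u)]
    return [[sum(p[t][u] * (v(r, s, t, u) - 0.5 * v(r, t, s, u))
                 for t in range(n) for u in range(n))
             for s in range(n)] for r in range(n)]


def g_nop(n, eri, p):
    """NOP method: six updates per stored integral."""
    a = [[0.0] * n for _ in range(n)]
    for (r, s, t, u), val in eri.items():
        x = val * weight(r, s, t, u)
        a[r][s] += 2.0 * p[t][u] * x
        a[t][u] += 2.0 * p[r][s] * x
        a[r][t] -= 0.5 * p[s][u] * x
        a[r][u] -= 0.5 * p[s][t] * x
        a[s][t] -= 0.5 * p[r][u] * x
        a[s][u] -= 0.5 * p[r][t] * x
    return symmetrized(a)


def supermatrix(eri):
    """Recombine the integrals into I_rstu = J - K1/4 - K2/4."""
    def v(r, s, t, u):
        return eri[canonical(r, s, t, u)]
    return {(r, s, t, u): v(r, s, t, u) - 0.25 * v(r, t, s, u) - 0.25 * v(r, u, s, t)
            for (r, s, t, u) in eri}


def g_p(n, sup, p):
    """P method: two updates per stored supermatrix element."""
    a = [[0.0] * n for _ in range(n)]
    for (r, s, t, u), val in sup.items():
        x = val * weight(r, s, t, u)
        a[r][s] += 2.0 * p[t][u] * x
        a[t][u] += 2.0 * p[r][s] * x
    return symmetrized(a)


def demo(n=4, seed=0):
    """Return the largest deviation of each method from the dense reference."""
    rng = random.Random(seed)
    eri = {q: rng.uniform(-1.0, 1.0) for q in unique_quadruples(n)}
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            p[i][j] = p[j][i] = rng.uniform(-1.0, 1.0)
    ref = g_reference(n, eri, p)

    def max_diff(a):
        return max(abs(a[i][j] - ref[i][j]) for i in range(n) for j in range(n))

    return max_diff(g_nop(n, eri, p)), max_diff(g_p(n, supermatrix(eri), p))


if __name__ == "__main__":
    print(demo())
```

Both builds reproduce the dense reference to rounding error, so the saving of the P method lies purely in the number of updates per stored value, provided the number of stored values stays the same.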
On the other hand, thanks to recent developments in microprocessors, it has become possible to use personal computer clusters for Hartree-Fock molecular orbital calculations, with parallel processing of the two-electron integrals and the Fock matrix described above. Because we can now use numbers of CPUs and amounts of memory that could not have been imagined previously, earlier common sense is sometimes overturned; that is, a paradigm shift occurs. For example, in our previous work [7], we reported that the efficiency of the file system improves under parallel processing because the two-electron integral files are divided and stored on the local disk of each node. The input/output (I/O) operations are applied to the divided files and thus naturally become parallel I/O. Moreover, the operating system may use memory as a buffer for I/O operations, and in the extreme case all the two-electron integrals are processed in memory. In this case the processing is very fast, because the physical I/O operation is performed only once and the subsequent reads are served from memory. We can thus achieve faster processing without rewriting the programs. In our molecular orbital calculations, therefore, the conventional SCF method, which stores the integrals in files, is faster than the direct SCF method. In the conventional SCF method the treatment of the two-electron integrals is very important, as mentioned above, and it becomes important to examine whether the P method is really faster than the NOP method.
We have recently been developing molecular dynamics calculations based on ab initio Hartree-Fock molecular orbital calculations, which require repeated calculations at 1000-3000 points. Even a small reduction in the time for a single-point calculation therefore yields a large gain in total performance. In the present article, we therefore perform moderately sized parallel ab initio Hartree-Fock calculations with 217 to 274 basis functions and compare the CPU and wall clock times of the P and NOP methods.

2 Method of Calculations

Table 1 shows our computational environment. The calculations are performed on a PC cluster of 8 CPUs in 8 chassis, each with an Intel Pentium 4 2.4 GHz CPU and an Intel 845 chipset. The network is 1000BaseT gigabit Ethernet. RedHat Linux version 8 is used as the operating system of the cluster. The general molecular orbital program package GAMESS [6] is used throughout this study. The original GAMESS code is used without modification, because its two-electron integral code is already written in a form suitable for parallel processing. The socket communication library within GAMESS is used.

Table 1. The configuration of the PC cluster.
Item               Specification
CPU                8 CPUs, Pentium 4 2.4 GHz, 512K cache
Chipset            Intel 845
Memory             1 GB DDR266 / board
Network            1000BaseT Ethernet
Hard Disk          60 GB / 5400 rpm
Operating System   Linux kernel 2.4 / RedHat 8.0
Fortran Compiler   GNU Fortran 77
Parallel Library   MPICH ver. 1.2.4

The computational speed is measured for a series of minor tranquilizers with benzodiazepine and thienodiazepine frameworks: flutoprazepam (1, C19H16ClFN2O), triazolam (2, C17H12Cl2N4), clotiazepam (3, C16H15ClN2OS), etizolam (4, C17H15ClN4S), flutazolam (5, C19H18ClFN2O3), and lorazepam (6, C15H10Cl2N2O2) (Scheme 1). The 3-21G basis set [8] is used throughout this study. For each molecule, we repeat a single-point SCF and gradient calculation ten times and take the fastest time.


Scheme 1.

3 Results and Discussion

Table 2 shows the molecular formula, number of atoms, number of basis functions, and size of the two-electron integral files for each molecule. On a single CPU, these are all demanding calculations. For flutazolam, the largest calculation in the present study, the file size exceeds 2 GB. In the parallel environment, however, all files are buffered in main memory when 4-8 CPUs are used. We note that even if the operating system cannot handle files larger than 2 GB, the calculation is still possible in the parallel environment as long as the files can be divided among the nodes.
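The remark about the 2 GB limit amounts to simple arithmetic, sketched below with the file sizes listed in Table 2. The sketch assumes that the integral file is divided evenly among the nodes, which is an idealization, and the function names are hypothetical.

```python
FILE_LIMIT_GB = 2.0  # per-file limit of the operating system discussed in the text

# Sizes of the two-electron integral files from Table 2, in GB.
TEI_GB = {
    "flutoprazepam": 1.5,
    "triazolam": 1.3,
    "clotiazepam": 1.0,
    "etizolam": 1.2,
    "flutazolam": 2.9,
    "lorazepam": 0.9,
}


def per_node_gb(total_gb: float, n_nodes: int) -> float:
    """Share of the file held by one node, assuming an even split."""
    return total_gb / n_nodes


def min_nodes_under_limit(total_gb, limit_gb=FILE_LIMIT_GB, choices=(1, 2, 4, 8)):
    """Smallest node count in `choices` whose per-node share is below the limit."""
    for n in choices:
        if per_node_gb(total_gb, n) < limit_gb:
            return n
    return None


if __name__ == "__main__":
    for name, size in TEI_GB.items():
        print(name, min_nodes_under_limit(size), per_node_gb(size, 4))
```

Under this assumption only flutazolam needs more than one node to stay below the limit, and with 4 nodes its per-node share drops to about 0.7 GB.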

Table 2. Molecular formula, number of atoms, number of basis functions, and size of the two-electron integral file for each molecule.
Molecule   Flutoprazepam   Triazolam     Clotiazepam    Etizolam      Flutazolam      Lorazepam
Formula    C19H16ClFN2O    C17H12Cl2N4   C16H15ClN2OS   C17H15ClN4S   C19H18ClFN2O3   C15H10Cl2N2O2
Atoms(a)   40              35            36             38            44              31
Basis(b)   248             239           227            245           274             217
TEI(c)     1.5 GB          1.3 GB        1.0 GB         1.2 GB        2.9 GB          0.9 GB
(a) Number of atoms. (b) Number of basis functions. (c) Size of the two-electron integral file in gigabytes.

Table 3 shows the CPU, system, and wall clock times for the NOP and P methods. For N=1, the wall clock time of the NOP method is clearly shorter in all cases, being 0.35-0.65 times that of the P method. Furthermore, when the sums of the CPU and system times are compared, the NOP method is also faster, except for flutazolam, where the two are almost the same.

Table 3. CPU, system, and wall clock times of the NOP and P methods. N is the number of CPUs.
                            NOP                        P                 NOP/P ratio
Molecule       N      CPU     SYS     Wall      CPU     SYS     Wall  CPU+SYS   Wall
clotiazepam    1   124.98   21.23   259.14   113.34   51.18   750.53     0.89   0.35
               2    64.69    8.63    94.58    59.71   26.06   320.55     0.85   0.30
               4    34.38    4.62    50.58    32.04   10.64   100.82     0.91   0.50
               8    19.20    2.72    32.50    18.10    4.84    33.74     0.96   0.96
etizolam       1   157.29   28.89   397.69   142.25   72.07   994.16     0.87   0.40
               2    81.44   11.91   119.48    76.82   36.00   444.19     0.83   0.27
               4    43.16    5.98    62.57    41.40   16.23   163.19     0.85   0.38
               8    23.92    3.63    38.68    23.22    6.59    42.18     0.92   0.92
flutazolam     1   265.37   88.77  1078.06   233.41  116.05  1660.53     1.01   0.65
               2   145.96   39.85   451.09   124.32   56.85   795.94     1.03   0.57
               4    73.22   15.66   113.64    66.61   28.44   340.35     0.94   0.33
               8    40.35    8.98    66.96    37.30   12.13   113.36     1.00   0.59
flutoprazepam  1   179.72   38.16   482.96   163.63   82.20  1134.93     0.89   0.43
               2    92.95   12.83   136.09    87.73   41.13   521.62     0.82   0.26
               4    49.53    6.92    71.23    46.18   17.12   197.99     0.89   0.36
               8    27.65    4.15    41.48    26.30    7.10    46.97     0.95   0.88
lorazepam      1   109.40   18.92   220.66    96.37   43.55   564.10     0.92   0.39
               2    56.41    8.37    84.96    50.68   18.63   198.83     0.93   0.43
               4    31.59    4.68    46.70    26.62    7.92    45.65     1.05   1.02
               8    17.32    2.56    28.92    14.70    4.57    29.17     1.03   0.99
triazolam      1   152.96   29.02   401.02   138.81   67.93   963.88     0.88   0.42
               2    78.96   11.62   114.63    71.59   33.13   431.96     0.86   0.27
               4    44.16    5.98    63.46    39.22   15.03   152.92     0.92   0.41
               8    24.09    3.68    37.05    21.51    6.38    39.88     1.00   0.93

For all molecules, the difference in wall clock time between the two methods becomes smaller as the number of CPUs increases. Once all the two-electron integrals are buffered in main memory, the wall clock time of the P method decreases more quickly than that of the NOP method, which narrows the gap between the two methods. For lorazepam, the smallest calculation in the present work, the difference between the two methods disappears when 4 CPUs are used. For the other molecules, the difference also disappears when 8 CPUs are used, except for flutazolam. For flutazolam, the wall clock time of the NOP method is still 0.59 times that of the P method even with 8 CPUs, so the difference does not disappear within the present study. From the trend in Table 3, however, we expect that the difference between the two methods will vanish as the number of CPUs increases further.
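The ratio columns of Table 3 and the different scaling of the two methods can be reproduced with a few lines. The sketch below uses the clotiazepam rows of Table 3 as input; the data structure and function names are illustrative choices.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    cpu: float
    sys: float
    wall: float


# Clotiazepam rows of Table 3: number of CPUs -> (NOP timing, P timing).
CLOTIAZEPAM = {
    1: (Timing(124.98, 21.23, 259.14), Timing(113.34, 51.18, 750.53)),
    2: (Timing(64.69, 8.63, 94.58), Timing(59.71, 26.06, 320.55)),
    4: (Timing(34.38, 4.62, 50.58), Timing(32.04, 10.64, 100.82)),
    8: (Timing(19.20, 2.72, 32.50), Timing(18.10, 4.84, 33.74)),
}


def nop_over_p(nop: Timing, p: Timing):
    """Return the NOP/P ratios of (CPU + SYS) time and of wall clock time."""
    return (nop.cpu + nop.sys) / (p.cpu + p.sys), nop.wall / p.wall


def wall_speedup(rows, method: int, n_cpu: int) -> float:
    """Wall clock speedup relative to one CPU; method 0 is NOP, 1 is P."""
    return rows[1][method].wall / rows[n_cpu][method].wall


if __name__ == "__main__":
    for n, (nop, p) in CLOTIAZEPAM.items():
        cpu_sys, wall = nop_over_p(nop, p)
        print(n, round(cpu_sys, 2), round(wall, 2),
              round(wall_speedup(CLOTIAZEPAM, 0, n), 1),
              round(wall_speedup(CLOTIAZEPAM, 1, n), 1))
```

For this molecule the NOP wall clock time improves by slightly less than a factor of 8 on 8 CPUs, while the P wall clock time improves by more than a factor of 20, which is consistent with the P files no longer being read from disk once they fit in memory.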

Table 4. The numbers of two-electron integrals for the NOP and P methods.
Molecule               NOP           P   NOP/P ratio
clotiazepam       91696888   176425145          0.52
etizolam         114762339   227059480          0.51
flutazolam       192258830   366232404          0.52
flutoprazepam    130469433   255888062          0.51
lorazepam         82657615   154668492          0.53
triazolam        114937886   219850389          0.52

The difference in wall clock time between the two methods is caused by the difference in the size of the two-electron integral files. We usually store only those two-electron integrals that are larger than a certain threshold value ($10^{-8}$ in the present study). Table 4 shows the numbers of two-electron integrals stored by the P and NOP methods in the present calculations. It is worth noting that the number for the P method is almost twice that for the NOP method. From the definition of the P method, $I_{rstu}$ retains a non-negligible value if the integral $\langle rs|tu \rangle$ is smaller than the cutoff value but either $\tfrac{1}{4}\langle rt|su \rangle$ or $\tfrac{1}{4}\langle ru|st \rangle$ is larger than the cutoff value. As a result, the number of stored two-electron integrals increases in the P method. In calculations on relatively large molecules such as the present ones, almost all of the two-electron integrals are below the cutoff value, and this increase in the number of stored integrals becomes significant. In Table 3, the system time of the P method is always larger than that of the NOP method, which indicates that the overhead of the file I/O operations is larger for the P method. For a smaller molecule this is not the case, because a large fraction of the two-electron integrals have values above the cutoff threshold, and the time required by the P method is therefore shorter than that required by the NOP method. Table 5 shows the results for the C2F6 molecule as an example of a small molecule. The number of two-electron integrals in the NOP method is 0.87 times that in the P method when the 6-31G** basis set is used, and 0.76 times when the 3-21G basis set is used. With both basis sets, the calculations finish faster with the P method. It should also be noted that the acceleration is larger with the 6-31G** basis set than with the 3-21G basis set, as expected from the NOP/P ratio of the numbers of two-electron integrals.
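The effect of the cutoff on the two file formats can be illustrated with a deliberately crude toy model. In the sketch below the basis functions sit at integer positions on a line and each charge distribution is given a Gaussian "overlap-like" factor, so that <rs|tu> is modelled as g(r,s) g(t,u) with g(a,b) = exp(-(a-b)^2). This model, its exponent, and the function names are assumptions made only for illustration; it ignores the long-range behaviour of real Coulomb integrals and says nothing quantitative about the molecules studied here.

```python
import math

THRESHOLD = 1.0e-8  # cutoff used in the present study


def overlap_like(a: int, b: int) -> float:
    """Toy pair factor that decays with the distance between two functions."""
    return math.exp(-float((a - b) ** 2))


def toy_integral(r, s, t, u):
    """Toy stand-in for <rs|tu>: product of the two pair factors."""
    return overlap_like(r, s) * overlap_like(t, u)


def supermatrix_element(r, s, t, u):
    """I_rstu = <rs|tu> - 1/4 <rt|su> - 1/4 <ru|st> in the toy model."""
    return (toy_integral(r, s, t, u)
            - 0.25 * toy_integral(r, t, s, u)
            - 0.25 * toy_integral(r, u, s, t))


def unique_quadruples(n):
    """Yield each symmetry-distinct suffix quadruple exactly once."""
    for r in range(n):
        for s in range(r + 1):
            for t in range(r + 1):
                for u in range(t + 1):
                    if (t, u) <= (r, s):
                        yield r, s, t, u


def count_stored(n, threshold=THRESHOLD):
    """Count the values that survive screening in the NOP and P formats."""
    n_nop = n_p = 0
    for q in unique_quadruples(n):
        if abs(toy_integral(*q)) > threshold:
            n_nop += 1
        if abs(supermatrix_element(*q)) > threshold:
            n_p += 1
    return n_nop, n_p


if __name__ == "__main__":
    for n in (4, 10, 20):
        n_nop, n_p = count_stored(n)
        print(n, n_nop, n_p, round(n_nop / n_p, 2))
```

In this model a quadruple such as (9, 0, 9, 0) has a negligible <rs|tu>, because functions 9 and 0 are far apart, but a supermatrix element of about -0.25, because <rt|su> pairs each function with itself. Such elements must be stored in the P format even though the corresponding NOP integral is discarded.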

Table 5. Computational times and numbers of two-electron integrals for the C2F6 molecule. N is the number of CPUs.
                            NOP                        P                 NOP/P ratio
Basis Set      N      CPU     SYS     Wall      CPU     SYS     Wall  CPU+SYS   Wall
6-31G*         1    17.68    3.15    26.39    11.90    3.76    19.90     1.33   1.33
               2     9.12    1.70    16.10     6.32    1.98    11.20     1.30   1.44
               4     4.83    0.98     9.08     3.46    1.01     8.68     1.30   1.05
               8     2.86    0.66     6.66     2.02    0.65     5.32     1.32   1.25
Number of integrals               20868299                  23940759      Ratio 0.87
3-21G          1     1.52    0.36     3.14     1.07    0.46     2.75     1.23   1.14
               2     0.85    0.28     2.53     0.64    0.27     2.33     1.24   1.09
               4     0.52    0.19     2.48     0.42    0.23     2.37     1.09   1.05
               8     0.39    0.17     2.91     0.33    0.18     2.89     1.10   1.01
Number of integrals                2124399                    277210      Ratio 0.76

In the present paper, we have studied the CPU time and wall clock time required for ab initio Hartree-Fock molecular orbital calculations with and without Raffenetti's P supermatrix algorithm in a parallel environment using a PC cluster. As realistic examples, six different minor-tranquilizer drug molecules were calculated with the 3-21G basis set. In almost all of these large calculations, the P method is not faster than the NOP method. We conclude that the P method is faster in some cases but not in others. For large-scale calculations, we recommend performing a test calculation to confirm which method is faster before starting the production calculations.

We are grateful to Dr. Kazumasa Shinjo and Dr. Shinsuke Shimogawa of ATR Adaptive Communication Laboratories for stimulating discussions and suggestions. This work was partially supported by the Telecommunication Advancement Organization of Japan (TAO).

References

[1] For example, see W. J. Hehre, L. Radom, P. v. R. Schleyer, and J. A. Pople, Ab Initio Molecular Orbital Theory, Wiley, New York (1986), and references cited therein.
[2] M. Dupuis, J. Rys, and H. F. King, J. Chem. Phys., 65, 111 (1976).
[3] R. C. Raffenetti, Chem. Phys. Lett., 20, 335 (1973); see also page 54 of reference [1].
[4] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, V. G. Zakrzewski, J. A. Montgomery, Jr., R. E. Stratmann, J. C. Burant, S. Dapprich, J. M. Millam, A. D. Daniels, K. N. Kudin, M. C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G. A. Petersson, P. Y. Ayala, Q. Cui, K. Morokuma, P. Salvador, J. J. Dannenberg, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. Cioslowski, J. V. Ortiz, A. G. Baboul, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, J. L. Andres, C. Gonzalez, M. Head-Gordon, E. S. Replogle, and J. A. Pople, Gaussian98, Gaussian, Inc., Pittsburgh PA (2001).
[5] J. Almlöf, K. Faegri, Jr., and K. Korsell, J. Comput. Chem., 3, 385-399 (1982).
[6] M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery, J. Comput. Chem., 14, 1347-1363 (1993).
[7] H. Teramae and K. Ohtawara, J. Chem. Software, 8, 55-61 (2002).
[8] J. S. Binkley, J. A. Pople, and W. J. Hehre, J. Am. Chem. Soc., 102, 939-947 (1980).

