
Figure 1. Schematic map of observation points along the Tamagawa, Tokyo, Japan.
Table 1. Chemical indexes used in water analysis of the Tamagawa
| Chemical index (abbreviations) | Explanation |
|---|---|
| Activity of hydrogen ions (pH) | Water acidity or alkalinity. pH is high in limestone areas and/or areas close to the sea. |
| Dissolved Oxygen (DO) | Amount of dissolved oxygen that is freely available in water to sustain fish and other aquatic organisms. A lower DO level indicates a higher level of water pollution. |
| Biochemical Oxygen Demand (BOD) | Total amount of oxygen consumed in the biological processes that break down organic matter in water. A higher BOD level indicates a higher level of water pollution. |
| Chemical Oxygen Demand (COD) | Mass concentration of oxygen consumed by the chemical breakdown of organic and inorganic matter. A higher COD level indicates a higher level of water pollution. |
| Total Nitrogen (T-N) | Total amount of nitrogen compounds contained in the water. Can be divided into inorganic and organic matter groups, as well as into dissolved matter and particulate matter groups. |
| Total Phosphorus (T-P) | Total amount of phosphorus compounds contained in the water. Can be divided into inorganic and organic matter groups, as well as into dissolved matter and particulate matter groups. |
| Chloride ion concentration (Cl-) | Total amount of chloride ions. Index of pollution caused by human activities. |
| Ammonium Nitrogen (NH4-N) | Total amount of ammonium ions, mainly originating from human and animal waste. High NH4-N levels increase nitrification and lower the DO level. |
| Nitrite Nitrogen (NO2-N) | Total amount of nitrite ions, mainly originating from agricultural fertilizers. Also formed in the decomposition of waste materials, such as manure or sewage. |
| Nitrate Nitrogen (NO3-N) | Total amount of nitrate ions, mainly originating from agricultural fertilizers. Also formed in the decomposition of waste materials, such as manure or sewage. |
| Phosphate Phosphorus (PO4-P) | Total amount of phosphate ions, mainly originating from agricultural fertilizers, detergents, etc. |
| Conductivity (COND) | A measure of the ability of water to carry an electrical current. An increase in ion concentration causes an increase in COND. |
Because of these missing data, the analysis could not be done using conventional methods such as PCA. Therefore, in this study, the missing data were estimated using a perceptron-type neural network and arithmetic progression [6]. Because the indexes differ in scale, the data were then standardized so that each index has a mean of 0 and a standard deviation of 1, which makes the values directly comparable.
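The standardization step can be sketched as follows. This is a minimal illustration rather than the SPSS procedure used in the study: the function name `standardize` and the sample BOD readings are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was applied.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


# Hypothetical BOD readings in mg/L (illustrative only, not data from this study).
bod = [0.5, 0.8, 1.9, 2.6, 3.1, 2.2]
z_bod = standardize(bod)
```

After this transformation every index is expressed in units of its own standard deviation, so indexes measured on very different scales (for example pH and COND) contribute comparably to the PCA.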
Principal component analysis (PCA) was then applied to the combined data set to determine parameters that can be used as indexes to efficiently describe the pollution level of the Tamagawa. The aim of PCA is to find and interpret hidden complex and causally determined relationships within a data set.
Cluster analysis (CA) was then used to confirm the reliability of the parameters obtained using PCA. CA is a classification method that is used to group individuals or variables. CA results are presented as dendrograms obtained using normalized Euclidean distances and Ward's method.
All calculations in this study were performed using SPSS 11.0 running on a Windows XP platform.
Table 2. Eigenvalues and contribution (%) of principal components.
| Component | Eigenvalue | Contribution | Cumulative contribution |
|---|---|---|---|
| 1 | 7.37 | 61.39 | 61.39 |
| 2 | 2.11 | 17.55 | 78.93 |
| 3 | 0.92 | 7.66 | 86.59 |
| 4 | 0.57 | 4.79 | 91.38 |
| 5 | 0.40 | 3.33 | 94.71 |
| 6 | 0.22 | 1.81 | 96.52 |
| 7 | 0.19 | 1.57 | 98.09 |
| 8 | 0.11 | 0.89 | 98.98 |
| 9 | 0.05 | 0.42 | 99.40 |
| 10 | 0.04 | 0.32 | 99.72 |
| 11 | 0.02 | 0.19 | 99.91 |
| 12 | 0.01 | 0.09 | 100.00 |
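The contribution columns of Table 2 follow directly from the eigenvalues. The sketch below assumes that the PCA was run on standardized indexes, so that the total variance equals the number of indexes (12). The variable names are illustrative, and the "eigenvalue greater than 1" rule is a common rule of thumb that the text itself does not state.

```python
from itertools import accumulate

# Eigenvalues as listed in Table 2 (rounded to two decimals).
eigenvalues = [7.37, 2.11, 0.92, 0.57, 0.40, 0.22,
               0.19, 0.11, 0.05, 0.04, 0.02, 0.01]

# Assumption: standardized variables, so total variance = number of indexes.
n_indexes = len(eigenvalues)

# Contribution (%) of each component and the running total.
contribution = [100 * ev / n_indexes for ev in eigenvalues]
cumulative = list(accumulate(contribution))

# Components with eigenvalue > 1 (rule of thumb, not stated in the text).
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]

for i, (c, cum) in enumerate(zip(contribution, cumulative), start=1):
    print(f"PC{i}: {c:6.2f} %  cumulative {cum:6.2f} %")
```

Because the tabulated eigenvalues are rounded, the recomputed percentages agree with Table 2 only to within about 0.1 percentage points. Under the rule of thumb above, only the first two components would be retained, which matches the two components discussed in the text.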
Table 3 lists the coefficients of the indexes in the first and second principal components. In the first principal component, the coefficients of pH and DO are negative, and the absolute values of the coefficients of Cl- and COND are low (less than 0.3). Therefore, all indexes except Cl- and COND are important for the first principal component, which is a complicated linear combination of ten chemical indexes. In the second principal component, the coefficients of Cl- and COND are high and significant. This suggests that a parameter that combines several indexes, such as the first principal component, is effective for analyzing the degree of water contamination in the Tamagawa.
Table 3. Coefficients in the first and second principal components
| Index | First component | Second component |
|---|---|---|
| pH | -0.488 | 0.161 |
| DO | -0.856 | -0.167 |
| BOD | 0.787 | 0.093 |
| COD | 0.973 | 0.031 |
| T-N | 0.955 | -0.027 |
| T-P | 0.952 | -0.199 |
| Cl- | 0.211 | 0.952 |
| NH4-N | 0.705 | 0.247 |
| NO2-N | 0.893 | -0.106 |
| NO3-N | 0.875 | -0.227 |
| PO4-P | 0.924 | -0.273 |
| COND | 0.229 | 0.948 |
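As a consistency check on Tables 2 and 3, the squared coefficients of each component can be summed and compared with the corresponding eigenvalue. The sketch below uses only the values printed in Table 3; the variable names are illustrative, and the 0.3 cut-off is the one quoted in the text.

```python
# Coefficients from Table 3: index -> (first component, second component).
loadings = {
    "pH": (-0.488, 0.161), "DO": (-0.856, -0.167),
    "BOD": (0.787, 0.093), "COD": (0.973, 0.031),
    "T-N": (0.955, -0.027), "T-P": (0.952, -0.199),
    "Cl-": (0.211, 0.952), "NH4-N": (0.705, 0.247),
    "NO2-N": (0.893, -0.106), "NO3-N": (0.875, -0.227),
    "PO4-P": (0.924, -0.273), "COND": (0.229, 0.948),
}

# Sum of squared coefficients per component.
ss_first = sum(a * a for a, _ in loadings.values())
ss_second = sum(b * b for _, b in loadings.values())

# Indexes whose first-component coefficient is below 0.3 in absolute value.
weak_on_first = [name for name, (a, _) in loadings.items() if abs(a) < 0.3]

print(round(ss_first, 2), round(ss_second, 2), weak_on_first)
```

The two sums come out at about 7.37 and 2.11, matching the first two eigenvalues in Table 2. This is consistent with the coefficients being component loadings (correlations between each index and the component), and the filter singles out Cl- and COND as the only indexes that weigh weakly on the first component.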
Figure 2 shows a K-L plot of the principal component scores of 153 data points, where the X- and Y-axes are the first and second principal components, respectively. The data points near the point (-1.5, 0.0) correspond to the upper stream of the Tamagawa, those in the region (X>0, Y<0) correspond to the middle stream, and those in the widespread region (X>0, Y>0) correspond to the lower stream. The first principal component classifies the data into two groups on the K-L plot: the upper stream and the others, namely the combined middle/lower stream. This indicates that the first principal component can be used to separate the upper stream from the combined middle/lower stream, but cannot distinguish the middle stream from the lower stream.

Figure 2. K-L plot of principal component scores. X- and Y-axes are the first and second principal components, respectively.
The observation points corresponding to the lower stream have high positive scores on the second principal component, whereas those of the upper and middle streams have low or negative scores. Thus, unlike the first, the second principal component can be used to roughly classify the data into the combined upper/middle stream and the lower stream. Combining the scores for the first and second principal components therefore enables classification of the data into the upper, middle, and lower streams of the Tamagawa.
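The two-step classification described above can be written as a simple decision rule. This is a rough sketch only: the function name `classify_stream` is hypothetical, and placing both thresholds exactly at zero is a simplifying assumption based on the regions quoted for Figure 2, not a boundary given by the authors.

```python
def classify_stream(pc1, pc2):
    """Assign a data point to a river section from its first two PC scores.

    Assumption: both decision boundaries sit at zero, following the regions
    described for the K-L plot (upper near (-1.5, 0); middle X>0, Y<0;
    lower X>0, Y>0).
    """
    if pc1 < 0:
        return "upper"  # first component separates the upper stream
    # Second component separates the lower stream from the middle stream.
    return "lower" if pc2 > 0 else "middle"


# Illustrative points, not scores taken from the study.
print(classify_stream(-1.5, 0.0))
print(classify_stream(0.8, -0.4))
print(classify_stream(0.9, 1.2))
```

Points that fall close to either axis would be ambiguous under this rule, which is in line with the text calling the second-component split a rough classification.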
Figure 3 shows the change in the average score of the first principal component for the entire river (all three streams combined), the upper stream, middle stream, and lower stream during FY 1994-2002. A smaller score for this component indicates higher water quality.

Figure 3. Change in average score of the first principal component for the entire river (all three streams), upper stream, middle stream, and lower stream from 1994 to 2002.
The equation of the regression line for the score was y = 0.0107x - 1.3078 (R² = 0.2206) for the upper stream, y = -0.1097x - 1.2403 (R² = 0.7214) for the middle stream, and y = -0.1022x - 1.1616 (R² = 0.4227) for the lower stream, where R is the correlation coefficient. The slope of the regression line for the upper stream is almost zero, indicating that the upper stream of the Tamagawa remained relatively clean and constant in water quality from 1994 to 2002. The slopes of the regression lines for the middle and lower streams are negative and almost the same, suggesting an improvement in water quality during this observation period. In addition, contamination of all three streams increased significantly in 1995 and slightly in 1998. The near-identical slopes for the middle and lower streams indicate that improvement in water quality in these two streams significantly affects the water quality of the entire river.
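A regression line and its R² of the kind reported above can be obtained by ordinary least squares, as in the following sketch. The function name `fit_line`, the coding of fiscal years as 1 to 9, and the score values are all hypothetical; the study's annual scores are shown only in Figure 3 and are not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation coefficient
    return slope, intercept, r_squared


# Assumption: FY 1994-2002 coded as x = 1..9; scores are invented for illustration.
years = list(range(1, 10))
scores = [1.0, 1.3, 0.8, 0.7, 0.8, 0.5, 0.4, 0.3, 0.2]

slope, intercept, r_squared = fit_line(years, scores)
print(f"y = {slope:.4f}x + {intercept:.4f} (R^2 = {r_squared:.4f})")
```

Since a smaller first-component score means higher water quality, a negative slope corresponds to improving water quality over the period, and R² indicates how much of the year-to-year variation the linear trend accounts for.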
Figure 4 shows the change in the score of the first principal component for three groups of fiscal years (1994-1996, 1997-1999, and 2000-2002) and the average at each observation point. Again, a smaller score indicates higher water quality.

Figure 4. Change in the score of the first principal component for the entire observation period (1994-2002) and for three groups of FY (1994-1996, 1997-1999, and 2000-2002) at every observation point.
In the upper stream (No.1-6), the scores are negative. In contrast, the scores for the middle and lower streams (No.7-17) are positive, indicating significantly higher contamination of these two streams, corresponding to the results in Figure 3.
The score for the first principal component changes drastically from negative at No.6 to positive at No.7, suggesting a strong influx of pollutants between these two locations in the river. To determine the cause of this contamination, we investigated relationships between the water quality of the main stream and that of the tributaries (the Yachigawa and the Zanborigawa; Figure 1) that flow into the main stream between these two observation points. The Yachigawa is more contaminated than the main stream at No.6 and No.7 in terms of BOD, COD, NH4-N, NO2-N, and NO3-N, whereas the Zanborigawa is more contaminated in terms of T-P and PO4-P. All these contaminants strongly affected the water quality at No.7.
In addition, the change in water quality between No.9 and No.10 was also significant. However, we were not able to determine the source of the pollutants. The score of the first principal component decreased sharply between No.10 and No.12, indicating a mechanism of water purification. A weir and some holms exist between No.10 and No.11, and between No.11 and No.12, possibly purifying the water via backwater. The purification mechanisms of weirs and holms remain unclear. As is also evident in Figure 3, improvement in water quality in the middle and lower streams significantly affected the water quality of the entire river.

Figure 5. Dendrogram of the location pattern for the entire observation period from 1994 to 2002.
Based on the dendrogram, the observation points can be roughly assigned to three groups: the upper stream, which is clean; the lower stream, which is highly polluted; and the middle stream, which is intermediate in pollution level between the upper and lower streams. This grouping corresponds well with that in the K-L plot from the PCA (Figure 2).
Figure 6 shows a dendrogram of the 12 indexes based on the CA. According to this dendrogram, the 12 indexes can generally be grouped into three main clusters. First, pH and DO were separated from the other indexes, which corresponds well with the negative coefficients of pH and DO in the first principal component. Second, Cl- and COND were separated from the other indexes. These two indexes had high values in the lower stream due to the influence of seawater and/or severe pollution. Such high values correspond to the high coefficients of Cl- and COND in the second principal component. The remaining indexes form the third cluster. The CA results correspond well to the PCA results.

Figure 6. Dendrogram of the 12 chemical indexes
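The correspondence between the CA and PCA results can be illustrated by applying Ward's criterion to the Table 3 coefficients. This sketch does not reproduce the analysis behind Figure 6: the study clustered the standardized measurements in SPSS, whereas here each index is represented only by its two coefficients from Table 3. The function names `merge_cost` and `ward_groups` are hypothetical.

```python
from itertools import combinations

# Coefficients from Table 3: index -> (first component, second component).
loadings = {
    "pH": (-0.488, 0.161), "DO": (-0.856, -0.167),
    "BOD": (0.787, 0.093), "COD": (0.973, 0.031),
    "T-N": (0.955, -0.027), "T-P": (0.952, -0.199),
    "Cl-": (0.211, 0.952), "NH4-N": (0.705, 0.247),
    "NO2-N": (0.893, -0.106), "NO3-N": (0.875, -0.227),
    "PO4-P": (0.924, -0.273), "COND": (0.229, 0.948),
}


def merge_cost(a, b):
    """Ward's criterion: increase in within-cluster sum of squares on merging."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2


def ward_groups(points, k):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [{"members": [name], "centroid": list(vec)}
                for name, vec in points.items()]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        # Centroid of the merged cluster is the size-weighted mean.
        centroid = [(na * x + nb * y) / (na + nb)
                    for x, y in zip(a["centroid"], b["centroid"])]
        merged = {"members": a["members"] + b["members"], "centroid": centroid}
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]


groups = ward_groups(loadings, 3)
for g in groups:
    print(g)
```

With three clusters requested, this simplified input yields the same grouping as described for Figure 6: pH with DO, Cl- with COND, and the remaining eight indexes together.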
We thank Professor H. Chuman of Tokushima University for numerous stimulating discussions.