Estimation of Spatial Air Pollutant
Concentration Fields from Observation Data
Stefan Falke
Advisor: Rudolf Husar
A thesis proposal presented in partial
fulfillment of the requirements of the degree of
Doctor of Science in Environmental Engineering
Washington University
March 13, 1997
Reliable air pollutant concentration fields are essential for environmental researchers, epidemiologists, and policy-makers in such activities as air quality pattern and trend analysis, exposure assessment, and monitor network design. The goal of this research is to develop new methodologies for estimating spatial air pollution concentration fields from observation data and to provide uncertainty measures associated with the estimated concentration fields. The proposed interpolation methodologies will combine physically based and statistically based methods to estimate concentrations at unknown locations. Pollutant transport will be incorporated using wind speed and wind direction data. Topographical data will be employed to account for the elevation dependence of pollutant concentrations. Surrogates will be utilized to increase the spatial resolution of the data. For instance, visibility observations are surrogates for fine mass concentrations because of the strong correlation between fine mass concentrations and visibility degradation. Statistical methods, such as inverse distance weighted interpolation and kriging, will be employed to aid in relating concentrations at monitored locations to non-monitored locations. The uncertainty of the interpolation will be assessed using cross validation and covariance analysis techniques. Existing techniques account for uncertainties associated with the spatial configuration of the observed data as well as any redundant information they contain. These techniques will be extended to assess the influence of incorporating emission, wind, topographical, and surrogate data. The new interpolation scheme will be applied to generate tropospheric ozone and particulate matter concentration fields for the coterminous U.S.
1. Specific Aims and Significance
1.1 Statement of Problem
1.2 Objectives
1.3 Significance
2. Background
2.1 Estimation Methods
2.1.1 Statistical
2.1.2 Surrogate Aided
2.1.3 Physically Based
2.2 Uncertainty Measures
3. Approach to the Thesis
3.1 Review of Ozone and Particulate Matter Spatial Characteristics
3.2 Review of Ozone and PM Source Receptor Relationship
3.3 Collection and Interpretation of Ozone and PM Datasets
3.4 Estimation of Unknown Concentrations
3.4.1 Statistically Based Methods
3.4.2 Surrogate Aided
3.4.3 Physically Based Methods
3.5 Measurement of Uncertainty
4. Progress Report
4.1 Visibility and PM10 as Surrogates for PM2.5 Concentrations
4.2 Elevation Correction of PM10 Concentration Fields
4.3 Declustering in Spatial Estimation
4.4 Spatial Structure of Eastern U.S. Ozone
4.5 Integration of Work and Future Additions
4.5.1 Potential Difficulties and Limitations
4.5.2 Work Schedule
5. References Cited
Appendix A. Maps of PM2.5 over the U.S. Derived from Regional PM2.5 and Surrogate Visibility and PM10 Monitoring Data
Appendix B. Incorporating Topography in the Spatial Interpolation of Pollutant Concentrations
Appendix C. Declustering in the Spatial Interpolation of Air Quality Data
1. Specific Aims and Significance
1.1 Statement of Problem
Environmental researchers, policy makers, and epidemiologists need spatially complete air pollution concentrations for activities such as air quality pattern and trend analysis, monitor network design, and exposure assessment. Monitoring networks have been established with the objective of providing knowledge of ambient air pollutant concentrations and their impact on human health and welfare. Monitors provide valuable information at their locations, but monitoring networks leave large gaps in areas where an understanding of the concentrations is desirable. There is a need for estimation methods that provide reliable air pollutant concentrations at non-monitored locations. Existing estimation methods are primarily statistically based and are hampered by their exclusion of physical factors influencing ambient pollutant concentrations, such as topography and meteorology. For illustrative purposes, Figure 1 displays a contour plot of ozone concentrations estimated with inverse distance weighted interpolation. The high concentrations in St. Louis are excessively dispersed over much of south-central Missouri because no sites exist in that area and no additional information was incorporated to restrict the bias.

Figure 1.
Contour map of 90th percentile daily maximum ozone concentrations in Missouri with monitoring stations superimposed (squares). The contours were generated using inverse distance weighted interpolation. Rivers (blue lines) and interstates (brown lines) are included as visual guides.
1.2 Objectives
The general objective is to improve the quality of spatially estimated air pollutant concentration fields for the coterminous U.S. Specifically, new methods will be developed for estimating air pollution concentrations that will be
1.3 Significance
Estimated concentration fields with adequate spatial resolution can be applied:
2. Background
Monitoring networks are established to study the air quality in specified regions. The collected samples characterize the pollutant concentrations at the monitoring locations; to gain insight into the concentrations at other locations, estimation methods and techniques to assess their uncertainties have been developed.
2.1 Estimation Methods
A review of the literature reveals numerous spatial air pollutant concentration estimation techniques. The approaches generally fall into three categories: statistical, surrogate aided, and physically based.
2.1.1 Statistical
2.1.1.1 Inverse Distance Weighted Interpolation
The general theory underlying inverse distance weighted interpolation is that points closer to the estimation location are more influential than points farther away (Watson, 1992). The estimate is obtained from a weighted average of the relevant stations with stations closest to the estimation point receiving the largest weights.
Figure 2 displays an example configuration of monitoring stations and an estimation location. The estimate is computed as

$$ c_i = \sum_{j=1}^{n} w_{ij}\, c_j $$

where $c_i$ is the estimated concentration at location $i$,
$n$ is the number of monitoring sites,
$c_j$ is the concentration at monitoring site $j$,
$w_{ij}$ is the weight assigned to monitoring site $j$.
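The weights are determined from the distances between the estimation point and the monitoring sites, $r_{ij}$, so that

$$ w_{ij} = \frac{r_{ij}^{-p}}{\sum_{k=1}^{n} r_{ik}^{-p}} $$

where $p$ is the power of the distance weighting ($p = 2$ gives the commonly used $1/r^2$ weighting).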

Figure 2.
Configuration of monitoring sites and estimation location.
Each site's concentration is weighted by the inverse of its distance from the estimation point, raised to the chosen power. The weighted concentrations are then summed and divided by the sum of the weights, which normalizes the weights so that they sum to one.
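The weighting scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes planar coordinates (for example, projected kilometers) and a non-empty list of sites, and it returns the monitored value unchanged when the estimation point coincides with a station. The function name `idw_estimate` and the sample monitors are hypothetical.

```python
import math


def idw_estimate(x, y, sites, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    sites: non-empty list of (x_j, y_j, c_j) tuples in planar coordinates.
    power: exponent of the distance weighting; 2.0 gives 1/r^2 weights.
    """
    weights = []
    for xj, yj, cj in sites:
        r = math.hypot(x - xj, y - yj)
        if r == 0.0:
            # Estimation point coincides with a monitor: reproduce its value.
            return cj
        weights.append(r ** -power)
    # Dividing by the weight sum normalizes the weights so they sum to one.
    return sum(w * c for w, (_, _, c) in zip(weights, sites)) / sum(weights)


# Hypothetical monitors: (x km, y km, concentration)
sites = [(0.0, 0.0, 40.0), (10.0, 0.0, 60.0), (0.0, 20.0, 50.0)]
print(idw_estimate(5.0, 0.0, sites))
```

Here the two nearby monitors receive equal weight and the distant one very little; because the distant monitor's value happens to equal the average of the nearby pair, the estimate comes out at 50.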
2.1.1.2 Geostatistical
Much of the current work aimed at improving the interpolation of air pollutant concentrations involves geostatistical techniques. A frequently used technique, kriging, accounts for the spatial variability of the data as well as their spatial distribution (Isaaks and Srivastava, 1989). Kriging was originally developed as a statistical tool in the mining industry for estimating ore deposits. Like simple distance weighted interpolation, kriging is based on the separation distance between the monitoring sites and the estimation location, and the estimate is a linear combination of weighted concentrations at neighboring monitoring stations. Kriging distinguishes itself in that the weights are determined by minimizing the estimation variance. The estimation variance is expressed through covariances derived from a model of spatial variability called the variogram. The variogram model is developed by comparing the concentrations between all pairs of monitoring stations. The distance separating each station pair is used to assign the pair to a distance bin, and the semivariance (half the mean squared concentration difference) is calculated for all station pairs within each bin. Plotting the semivariance values against the bin distances results in the sample variogram. A model function is fitted to the sample variogram and supplies the covariances used in the error minimization. The kriging weights are then derived as
$$ \mathbf{w}_i = \mathbf{C}^{-1}\,\mathbf{D}_i $$

where $\mathbf{w}_i$ is the vector of weights $w_{ij}$ assigned to the monitoring sites $j = 1, \ldots, n$,
$\mathbf{C}^{-1}$ is the inverse of the $n \times n$ matrix containing the covariances between all pairs of monitoring stations,
$\mathbf{D}_i$ is the vector of covariances $D_{ij}$ between each monitoring site $j$ and the estimation point $i$.
The covariance vector, $\mathbf{D}_i$, plays a role similar to the weights of inverse distance weighted interpolation, except that the distances are statistical in nature: they reflect the covariance between the monitoring sites and the estimation point as well as their separation distance. The covariance matrix, $\mathbf{C}$, accounts for the separation distances and covariances between all pairs of monitoring sites. This allows kriging to incorporate aspects of the monitoring network that simple interpolation schemes do not, namely the clustering and redundancy of sites.
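A compact sketch of the weight equation follows. Solving for the weights as the inverse covariance matrix times the covariance vector, with no further constraint, is simple kriging, in which the weights are applied to deviations from a known mean; ordinary kriging would add a Lagrange multiplier so that the weights sum to one. The exponential covariance model and its `sill` and `corr_range` values are hypothetical stand-ins for a model fitted to a sample variogram, and the small Gaussian elimination routine is included only to keep the example free of external libraries. The function also returns the simple kriging variance, C(0) minus the dot product of the weights with the covariance vector, which is the kind of uncertainty estimate kriging provides alongside each value.

```python
import math


def exp_cov(h, sill=1.0, corr_range=100.0):
    """Hypothetical exponential covariance model: C(h) = sill * exp(-h / corr_range)."""
    return sill * math.exp(-h / corr_range)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def simple_krige(x, y, sites, mean, cov=exp_cov):
    """Simple kriging estimate, variance, and weights at (x, y).

    sites: list of (x_j, y_j, c_j); mean: assumed known mean of the field.
    """
    pts = [(xj, yj) for xj, yj, _ in sites]
    c = [[cov(math.dist(p, q)) for q in pts] for p in pts]  # site-to-site covariances
    d = [cov(math.dist((x, y), p)) for p in pts]             # site-to-target covariances
    w = solve(c, d)                                          # w = C^-1 D
    estimate = mean + sum(wj * (cj - mean) for wj, (_, _, cj) in zip(w, sites))
    variance = cov(0.0) - sum(wj * dj for wj, dj in zip(w, d))
    return estimate, variance, w
```

At a monitor location the weights collapse onto that monitor, so the estimate reproduces the observation and the variance drops to zero; far from all monitors the estimate falls back to the assumed mean and the variance approaches the sill.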
Kriging has been applied to atmospheric variables such as wind speed and direction, acid precipitation, tropospheric ozone, and precipitation (Lefohn et al., 1987; Seilkop and Finkelstein, 1987; Eynon, 1987; Venkatram, 1988; Palomino and Martin, 1994; Liu and Rossini, 1996).
2.1.2 Surrogate Aided
Secondary variables related to the primary variable being estimated can improve the estimation process, especially in areas of sparse primary variable monitoring. The secondary variables are usually highly correlated with the primary variable and can be thought of as surrogates for it.
Willmott et al. (1995a) improved their estimation of temperature fields by incorporating a second air temperature field that was sampled over a different time period but at a higher spatial resolution. The relationship between the higher resolution surrogate data and the lower resolution temperatures is applied within inverse distance weighted interpolation.
Cokriging is an extension of kriging that includes surrogate data. The cross correlation between the surrogates and the primary variable is used to reduce the estimation variance in the kriging system (Phillips, et al., 1997).
2.1.3 Physically Based
Physically based methods incorporate the physical laws governing the pollutant's behavior. These laws enter the estimation process as models that focus on the causal relationships between the primary and secondary variables.
Willmott et al. (1995b) introduced a modified spatial interpolation scheme for air temperature data. They incorporated digital elevation model (DEM) data and the lapse rate relationship to adjust the temperature estimates according to elevation. High resolution measured temperature data from an earlier period are used to increase the spatial resolution of the measured temperatures for interpolation. The measured temperatures are adjusted to sea level using the environmental lapse rate, the sea level temperatures are interpolated, and the interpolated temperatures are then adjusted back to their actual elevations using the lapse rate again. The modified interpolation was reported to be 35% more accurate for estimating air temperature than simple interpolation techniques.
Geographic Information Systems (GIS) are becoming accessible for environmental data analysis and facilitate the use of multiple data sets to exploit relations among the data. Ollinger et al. (1995) combined a high resolution elevation database with deposition data in a regression analysis and then used the derived relations to improve the resolution of sulfur and nitrogen deposition maps for the northeastern U.S.
Lee et al. (1997) used a GIS-driven interpolation scheme to generate ozone concentration fields. The method calculates a Potential Exposure Surface (PES) based on two physical assumptions: first, areas downwind of locations with large ozone precursor emissions that also experience higher temperatures and low cloud cover have a greater potential for high ozone concentrations; and second, areas in close proximity to each other with similar PES values have similar actual ozone exposure. They use annual emission inventories and assume that ozone exposure is a function of the amount of upwind NOx emissions, temperature, and cloud cover. NOx from the 1443 emission sources is spatially dispersed using wind direction and a decay function. Elevation is accounted for by imposing the restriction that if the terrain height is between 500 and 1500 meters, 50% of the plume passes over, and if any cell within 20 km is higher than 1500 meters, none of the plume gets through. Temperature and cloud cover are incorporated so that high temperatures and low cloud cover yield a high PES, while low temperatures and high cloud cover yield a low PES. They found the resulting maps more realistic than those produced using simple distance weighted interpolation but were unable to provide any quantitative metrics for comparison.
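The terrain restriction can be written as a small rule. This is one reading of the description above, not code from Lee et al. (1997): it assumes both thresholds apply to the highest terrain cell within 20 km, and the treatment of heights exactly at 500 m and 1500 m is also an assumption. The function name `plume_pass_fraction` is hypothetical.

```python
def plume_pass_fraction(terrain_heights_m):
    """Fraction of a NOx plume passing a grid cell under the terrain rule.

    terrain_heights_m: heights (m) of the grid cells within 20 km of the cell.
    Applying the thresholds to the maximum height, and the boundary handling,
    are assumptions made for this sketch.
    """
    highest = max(terrain_heights_m)
    if highest > 1500.0:
        return 0.0   # none of the plume gets through
    if highest >= 500.0:
        return 0.5   # half of the plume passes over
    return 1.0       # terrain does not obstruct the plume
```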
Loibl et al. (1994) extended the purely statistical approach of kriging estimation of tropospheric ozone by incorporating a model of elevation and diurnal cycle dependence to improve the resolution of ozone concentrations in the complex terrain of Austria.
2.2 Uncertainty Measures
The jackknife method of cross validation has been used extensively to compare pollution estimation methods by providing a measure of interpolation performance at locations where data exist (Efron and Gong, 1983). The process consists of removing a monitoring site and estimating the concentration at that location from the remaining sites. The difference between the observed and the estimated concentration is a measure of error for the estimation scheme.
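The leave-one-out procedure can be sketched generically so that any estimator can be plugged in. Here the estimator is plain 1/r^2 inverse distance weighting (an assumption for illustration), and two summary statistics are computed from the observed values and their cross validation estimates: the root mean square error and the squared Pearson correlation (R^2), the statistic used later in this proposal to report improvements. All function names are hypothetical.

```python
import math


def idw(x, y, sites, power=2.0):
    """Plain 1/r^p inverse distance weighting (the assumed estimator)."""
    num = den = 0.0
    for xj, yj, cj in sites:
        r = math.hypot(x - xj, y - yj)
        if r == 0.0:
            return cj
        w = r ** -power
        num += w * cj
        den += w
    return num / den


def leave_one_out(sites, estimator=idw):
    """Re-estimate each monitored site from all the other sites."""
    estimates = []
    for k, (xk, yk, _) in enumerate(sites):
        others = sites[:k] + sites[k + 1:]
        estimates.append(estimator(xk, yk, others))
    return estimates


def rmse(obs, est):
    """Root mean square cross validation error."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def r_squared(obs, est):
    """Squared Pearson correlation between observations and estimates."""
    mo, me = sum(obs) / len(obs), sum(est) / len(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov * cov / (var_o * var_e)
```

Running the same procedure with a baseline estimator and with a surrogate aided or physically aided estimator, then comparing the two sets of statistics, gives a like-for-like measure of improvement at the monitored locations.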
One of the advantages kriging has over other interpolation schemes is its ability to provide an uncertainty estimate along with each estimated value (Myers, 1997). Because kriged estimates result from an error minimization process, the kriging uncertainty is essentially the minimum estimation error obtained. Factors influencing this error are the distance of the sites from the estimation point, the distance of the sites from each other (clustering), and their covariance (redundancy). The kriging errors can be presented as complementary maps, known as "shadow maps of certainty" (Berry, 1995), that provide users of the estimated maps with the uncertainties associated with the estimates. Haas (1992) used this uncertainty in developing a method for designing acid deposition monitoring networks.
3. Approach to the Thesis
3.1 Review of Ozone and Particulate Matter Spatial Characteristics
The NAAQS for ozone and particulate matter were recently revised by the EPA. The review process generated in-depth studies for some metropolitan areas, resulting in large amounts of data that extensively characterize spatial pollutant emissions, chemistry and physics, and ambient concentrations (U.S. EPA, 1996a; U.S. EPA, 1996b). These studies will be summarized and analyzed to form a knowledge base of the spatial behavior of ozone, its precursors, and particulate matter (PM), both fine mass (PM2.5) and total (PM10). Where sufficient surrogate data exist, they will be used to further clarify the concentration patterns and trends. Geographic areas likely to be investigated for particulate matter include Philadelphia (Tropp, et al., 1996) and parts of the San Joaquin Valley in California (Chow, et al., 1993). The Photochemical Assessment Monitoring Stations (PAMS) network will be reviewed because of its extensive sampling of ozone precursor concentrations (U.S. EPA, 1996c). The Ozone Transport Assessment Group generated new understanding of the spatial characteristics of ozone over the eastern U.S., and its results will be used as a key resource (Husar, 1998). The data and results obtained from the Southern Oxidants Study will also be reviewed (Chameides and Cowling, 1995).
Knowledge of the spatial behavior and characteristics of ozone and particulate matter is a necessary prerequisite for applying source receptor relationship models. The results of this part of the work will also be utilized later in the study to check the estimation methods for adherence to reality.
3.2 Review of Ozone and PM Source Receptor Relationship
This section of the work will review the major factors that influence ambient concentration patterns, i.e., the source-receptor relationship (SRR). Through various techniques, the SRR qualitatively and quantitatively apportions the pollution at the receptor to contributing sources. The SRR can be approached from two directions: the source-oriented view, in which source emissions are tracked to the receptor by applying the fundamentals of atmospheric chemistry, physics, and meteorology, or the receptor-oriented view, which begins with concentrations at the receptor and works backwards using either physical/chemical or statistical principles to determine the pollutant's origin. The SRR provides information such as residence time and region of influence that characterizes the behavior of the pollutant after it has been emitted.
The ambient concentrations measured by the monitoring networks in the previous section can be viewed as "sources" of pollution that must be transported and chemically transformed over space to determine the concentration at a distance from the monitored location. The observation at a monitoring station is a "snapshot" of an air mass that has been formed from multiple emission sources. An advantage of using concentration data rather than emission data alone is that the observed concentration inherently includes the contributions of all emission sources influencing the monitoring site.
A transfer matrix is a commonly used means of relating emission sources to receptor locations. It can be thought of as consisting of two main components: the transit probability and the kinetic probability (Schichtel, 1996). The transit probability describes the likelihood that a species emitted at the source location will reach the receptor location and incorporates factors such as wind direction, wind speed, and dry deposition rates. The kinetic probability describes the likelihood that a species emitted from a source either remains unreacted or has been transformed into the species of interest by the time the air mass reaches the receptor.
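How the two probabilities combine can be illustrated with a toy transfer matrix. This sketch assumes the transit and kinetic probabilities for each source-receptor pair are already known and treats them as independent, so they simply multiply with the emission rate; it shows the bookkeeping only, not how Schichtel (1996) derives the probabilities from meteorological and chemical data. Units are arbitrary and the function name is hypothetical.

```python
def receptor_contributions(emissions, transit, kinetic):
    """Toy transfer-matrix calculation.

    emissions[s]: emission rate of source s (arbitrary units).
    transit[s][r]: probability that material emitted at s reaches receptor r.
    kinetic[s][r]: probability that, on arrival, it is present as the species of interest.
    Returns the summed contribution at each receptor, in the units of emissions.
    """
    n_receptors = len(transit[0])
    totals = [0.0] * n_receptors
    for s, e in enumerate(emissions):
        for r in range(n_receptors):
            # The two probabilities are treated as independent, so they multiply.
            totals[r] += e * transit[s][r] * kinetic[s][r]
    return totals


print(receptor_contributions([100.0, 50.0],
                             [[0.2, 0.05], [0.1, 0.3]],
                             [[0.5, 0.8], [1.0, 0.9]]))
```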
Source receptor relationships, particularly in the form of transfer matrices, will be developed to relate the observed concentrations and emissions to the estimation locations (receptors).
3.3 Collection and Interpretation of Ozone and PM Datasets
An integral part of this research is the accumulation, organization, and evaluation of particulate matter and ozone data sets. The principal data source is EPA's Aerometric Information Retrieval System (AIRS) (U.S. EPA, 1994), which contains data from over 1500 monitoring stations for ozone and PM10. AIRS is supplemented by other, smaller networks such as the Interagency Monitoring of Protected Visual Environments (IMPROVE) (Sisler et al., 1993), Northeast States for Coordinated Air Use Management (NESCAUM) (Flocchini et al., 1989), California Air Resources Board (CARB) (State of California, 1995), and Clean Air Status and Trends Network (CASTNet) (U.S. EPA, 1995). The data sets supply valuable information on the overall pattern and trend of air pollutant concentrations.
The tasks for this part of the work include:
This analysis will illuminate areas where the largest gaps in knowledge exist and will prepare the data for use in the estimation techniques developed in this work.
3.4 Estimation of Unknown Concentrations
Estimating the effects of air pollution on human health and welfare requires higher spatial resolution than existing monitoring networks provide. As outlined in the "Background" section of this proposal, methods for improving spatial pollutant concentration estimates can be categorized as statistically based, surrogate aided, and physically based.
3.4.1 Statistically Based Methods
Inverse distance weighting is a useful starting point for understanding the spatial behavior of the pollutant. Maps generated using 1/r^2 distance weighted interpolation will be used as initial maps to highlight the shortcomings of simple distance based estimation. These maps will provide a baseline estimate against which improvements can be compared.
The applicability of variogram analysis to ozone and PM concentration data will be investigated. A variogram model is unlikely to be applicable to either PM or ozone data because the data violate assumptions inherent in the variogram. A variogram considers only the separation distance between monitoring sites, not their relative location, and therefore requires that the mean and variance of the data be constant over space.
Kriging provides a statistical framework that can potentially be combined with physically based methods. Using a physical model rather than the variogram for minimizing the kriging error variance will be investigated.
Although kriging is equipped to account for spatial clustering in the configuration of monitoring networks, a separate, stand-alone "declustering" scheme will be pursued for use in distance weighted interpolation methods.
3.4.2 Surrogate Aided
Data related to air pollutant concentrations are often available at a higher spatial resolution than the pollutant observations themselves. If the higher resolution data are highly correlated with the pollutant and the relationship is well understood, the higher resolution variable can be used as a surrogate for the pollutant. Data sets such as visibility and PM10 will be applied as surrogates for deriving fine mass concentrations. Other surrogates for PM and ozone will be sought.
3.4.3 Physically Based Methods
Physical and chemical principles will be incorporated into the estimation methods. The understanding of physical and chemical behavior gained from the source receptor relationship can aid in filling the concentration void between monitoring stations. The data used to describe this behavior can be used to build a model that relates the characterizing data to the pollutant concentrations. For example, topographical data are available at very high resolution; therefore, if the elevation dependence of air pollution can be modeled with elevation as the only independent variable, pollutant concentrations can be derived at every location where terrain elevation is known.
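As an illustration of modeling concentration with elevation as the only independent variable, the sketch below fits a straight line c = a + b·z by ordinary least squares at the monitoring sites and evaluates it on an elevation grid. The linear form, the function names, and the grid layout (a list of rows of elevations) are assumptions made for this example; the actual elevation dependence may well be nonlinear.

```python
def fit_elevation_model(elevations, concentrations):
    """Ordinary least squares fit of c = a + b * z at the monitoring sites."""
    n = len(elevations)
    z_mean = sum(elevations) / n
    c_mean = sum(concentrations) / n
    sxx = sum((z - z_mean) ** 2 for z in elevations)
    sxy = sum((z - z_mean) * (c - c_mean) for z, c in zip(elevations, concentrations))
    b = sxy / sxx
    a = c_mean - b * z_mean
    return a, b


def apply_to_dem(dem, a, b):
    """Evaluate the fitted model at every cell of an elevation grid (list of rows)."""
    return [[a + b * z for z in row] for row in dem]


a, b = fit_elevation_model([0.0, 100.0, 200.0], [30.0, 25.0, 20.0])
print(apply_to_dem([[0.0, 300.0]], a, b))
```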
Wind direction and wind speed data will be incorporated to account for transport. They can be used to define a monitoring station's region of representativeness. Monitoring site concentrations are more representative of an estimate location if the sites are upwind of that location than if they are downwind. Wind speed will be used to modify the radius of influence about an estimate location: if wind speeds from a particular direction are low, only sites close to the estimate location will be used in the interpolation.
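One possible way to turn these two ideas into interpolation weights is sketched below. Everything specific here is hypothetical and not part of the proposal: the radius of influence is taken as the distance the air travels in `travel_hours` at the given wind speed, and the upwind preference is a (1 + cos θ)/2 factor, where θ is the angle between the direction to the monitor and the direction the wind blows from. Directly upwind sites keep their full inverse distance weight, crosswind sites keep half, and directly downwind sites get none.

```python
import math


def wind_weights(est, sites, wind_from, speed_kmh, travel_hours=24.0, power=2.0):
    """Hypothetical upwind-favoring, wind-limited inverse distance weights.

    est: (x, y) of the estimation point in km.
    sites: list of (x, y) monitor coordinates in km.
    wind_from: (dx, dy) vector pointing toward where the wind blows FROM.
    speed_kmh: wind speed; radius of influence = speed_kmh * travel_hours.
    Returns normalized weights, one per site.
    """
    radius = speed_kmh * travel_hours
    norm = math.hypot(wind_from[0], wind_from[1])
    ux, uy = wind_from[0] / norm, wind_from[1] / norm
    raw = []
    for k, (sx, sy) in enumerate(sites):
        dx, dy = sx - est[0], sy - est[1]
        r = math.hypot(dx, dy)
        if r == 0.0:
            # Estimation point sits on a monitor: that monitor gets all the weight.
            return [1.0 if j == k else 0.0 for j in range(len(sites))]
        if r > radius:
            raw.append(0.0)  # outside the wind-limited radius of influence
            continue
        cos_angle = (dx * ux + dy * uy) / r  # +1 directly upwind, -1 directly downwind
        raw.append(r ** -power * (1.0 + cos_angle) / 2.0)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no upwind monitors within the radius of influence")
    return [w / total for w in raw]
```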
Emission fields help represent sources that are not captured by observation networks. A simple source receptor relationship model will be developed for determining ozone concentrations at a receptor due to NOx and VOC emissions from multiple sources. The NOx emissions drive the level of ozone while the VOC emissions determine the rate at which the NOx is transformed to ozone. The concentrations resulting from the model are considered the local contribution. At the observed ozone locations, the local ozone is subtracted from the measured, or total, ozone concentration, leaving the regional ozone. The regional ozone is then spatially interpolated, and the regional map is added to the local map to produce the final total ozone concentration map.
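The local/regional decomposition can be sketched as follows. The local ozone model is represented by a placeholder function `local_at(x, y)` (hypothetical; the proposed source receptor model would supply it), and plain 1/r^2 inverse distance weighting is assumed for interpolating the regional part. Because the local part is added back exactly where it was subtracted, the procedure reproduces the observed total at each monitor.

```python
import math


def idw(x, y, pts, power=2.0):
    """1/r^p interpolation; returns the point value when (x, y) is a data point."""
    num = den = 0.0
    for px, py, v in pts:
        r = math.hypot(x - px, y - py)
        if r == 0.0:
            return v
        w = r ** -power
        num += w * v
        den += w
    return num / den


def total_ozone_grid(sites, local_at, grid):
    """Interpolate the regional part and add back the modeled local part.

    sites: list of (x, y, observed_total_ozone).
    local_at: function (x, y) -> modeled local ozone (hypothetical model).
    grid: list of (x, y) estimation points.
    """
    # Regional ozone at each monitor = observed total minus modeled local part.
    regional = [(x, y, obs - local_at(x, y)) for x, y, obs in sites]
    return [idw(gx, gy, regional) + local_at(gx, gy) for gx, gy in grid]
```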
3.5 Measurement of Uncertainty
The U.S. air pollutant concentration estimates will be more reliable in some areas than in others. This is partly due to higher grade or larger quantities of data at some locations and partly due to the physical nature of the pollutant's spatial behavior. The relative differences in reliability need to be communicated along with the estimates for the estimates to be applied effectively. The reliability measures also provide a means of comparing estimation techniques and assessing the improvement gained by applying surrogates or physical models.
Cross validation will be applied to quantify the error reduction, or possible increase, obtained with statistical, surrogate aided, and physical model aided techniques. Procedures for using the kriging error with the surrogate aided and physically based methods will be examined.
4. Progress Report
Progress has been made in some of the areas outlined in the "Approach to the Thesis" section of this proposal. Data sets have been collected for both particulate matter and ozone. The initial estimated concentration maps have been generated, mainly as seasonal averages, using simple inverse distance weighted interpolation. Fine particulate matter maps were augmented using visibility and PM10 data as surrogates (Falke and Husar, 1998b) and by applying an elevation dependence model (Falke and Husar, 1996). A declustering scheme was developed to remove biases caused by clusters of urban sites in estimated concentrations (Falke and Husar, 1998a). Cross validation was used to quantify the improvements resulting from the new estimation methods.
This section contains summaries of the work accomplished to date. Detailed papers are placed in the appendices: Appendix A contains the extended methodology for visibility and PM10 surrogates in derived PM2.5 concentrations, Appendix B describes the elevation correction of PM10 concentrations, and Appendix C details the declustering scheme.
4.1 Visibility and PM10 as Surrogates for PM2.5 Concentrations
Fine mass is currently monitored at a limited number of locations, mainly as part of the Interagency Monitoring of Protected Visual Environments (IMPROVE) and Northeast States for Coordinated Air Use Management (NESCAUM) networks, which together consist of only about 50 stations across the coterminous U.S. Incorporating related, more densely monitored data into the spatial interpolation can compensate for the sparse fine mass network. This section describes the methodology used to derive fine particle maps for the U.S. using visibility (280 sites) and PM10 data (~1000 sites) as fine mass surrogates. The approach uses existing fine mass concentration data as the measured "anchor points" for the derived maps. The higher resolution spatial coverage of the visibility and PM10 networks, combined with their relationships to fine mass, results in more detailed fine mass maps.
The key components of the surrogate aided interpolation for quarter 3 (July, August, September), averaged over 1988-1992, are shown in Figure 3a-d. Figure 3a shows the fine mass concentration sites superimposed on a contour map derived using inverse distance weighted interpolation. The pattern is very smooth because of the sparse data network. Figure 3b shows the high resolution extinction coefficient (Bext) data derived from the visibility observation network. Bext, obtained from visual range through the Koschmieder relationship, is used as the surrogate (see Appendix A for details). The ratio of fine mass concentration to extinction is calculated at each fine mass monitoring site and is spatially interpolated to produce a ratio grid (Figure 3c). The fine mass to Bext ratio grid is multiplied by the observed Bext grid to produce higher resolution fine mass concentrations (Figure 3d).
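The ratio method can be sketched with a few functions. The Koschmieder relationship is used in its standard form, Bext = 3.912 / visual range (3.912 = -ln 0.02, the 2% contrast threshold), giving Bext in 1/km for visual range in km. Using inverse distance weighting for both the Bext field and the ratio field, and the function names, are assumptions for illustration; Appendix A describes the actual procedure. Because the ratio is reproduced exactly at each fine mass monitor, the derived map leaves the observed fine mass unchanged there.

```python
import math

KOSCHMIEDER_CONSTANT = 3.912  # -ln(0.02): 2% contrast threshold


def idw(x, y, pts, power=2.0):
    """1/r^p interpolation; returns the point value when (x, y) is a data point."""
    num = den = 0.0
    for px, py, v in pts:
        r = math.hypot(x - px, y - py)
        if r == 0.0:
            return v
        w = r ** -power
        num += w * v
        den += w
    return num / den


def bext_from_visual_range(vr_km):
    """Koschmieder relationship: extinction coefficient (1/km) from visual range (km)."""
    return KOSCHMIEDER_CONSTANT / vr_km


def surrogate_fine_mass(fm_sites, vis_sites, grid):
    """Ratio method: interpolate fine mass / Bext, then multiply by the dense Bext field.

    fm_sites: (x, y, fine_mass) at the sparse fine mass monitors.
    vis_sites: (x, y, visual_range_km) at the denser visibility stations.
    grid: (x, y) estimation points.
    """
    bext_pts = [(x, y, bext_from_visual_range(vr)) for x, y, vr in vis_sites]
    # Fine mass to Bext ratio at each fine mass monitor (Bext interpolated there).
    ratios = [(x, y, fm / idw(x, y, bext_pts)) for x, y, fm in fm_sites]
    result = []
    for gx, gy in grid:
        bext = idw(gx, gy, bext_pts)   # high resolution surrogate field
        ratio = idw(gx, gy, ratios)    # smooth ratio field
        result.append(ratio * bext)
    return result
```

In a toy case with fine mass monitors of 10 and 20 at either end of a transect and a visibility station at the midpoint reporting twice the extinction of the ends, plain interpolation of fine mass would give 15 at the midpoint, whereas the surrogate aided estimate is 30.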
The derived fine particle maps for the U.S. using visibility and PM10 data as fine mass surrogates were of higher resolution than maps generated with fine mass data alone but did not alter the fine particle concentrations at the locations of PM2.5 monitoring stations. Areas with higher fine particle concentrations, such as the San Joaquin Valley and the Industrial Midwest, are more clearly delineated in the surrogate aided maps. Some regions, such as parts of the West and the Appalachians, showed decreases in fine mass concentration. Correlations between observed fine particle concentrations and cross validation estimates improved with the application of the surrogates: the visibility surrogate increased the R^2 value from 0.86 to 0.89 in quarter 3.




Figure 3. a) Quarter 3 fine mass concentrations from the IMPROVE and NESCAUM networks, 1988-1992. b) Extinction coefficient data. c) Fine mass to Bext ratio. d) Visibility surrogate aided fine mass concentrations.
4.2 Elevation Correction of PM10 Concentration Fields
The basis for this interpolation method is that pollutant concentrations tend to be lower at elevated locations than in valleys. A relation describing the decrease of pollutant concentrations with increasing elevation is applied within the general interpolation scheme. The elevation correction is performed using a site's elevation relative to its surrounding terrain. An elevation data set and meteorological scale height data are used in conjunction with PM10 concentration data to perform the elevation corrected interpolation.
In general, locations at high altitudes have lower aerosol concentrations than neighboring valleys (Dutkiewicz et al., 1987; Husar et al., 1980). This assumption, namely that pollutant concentrations at higher elevations are lower than at surrounding lower elevations, is the basis for using elevation to correct PM10 concentrations. Applying the elevation relation to PM10 concentrations results in the map shown in Figure 4b. Compared with the strictly distance dependent interpolation map, the eastern half of the U.S. shows no elevation correction of the PM10 concentrations except in a few areas of the Appalachians and the mountains of northeastern New York. In the West, the high elevations of the mountain ranges stand out with low concentrations, while the valleys remain essentially uncorrected.
Figure 4a displays the PM10 concentrations for quarter 1 (January, February, and March) using simple distance weighted interpolation. The corrected PM10 concentrations are shown in Figure 4b. The corrected concentrations are based on the elevation (Figure 4c), the scale height (Figure 4d), and the correction factor function (Figure 4e). The PM10 concentrations at the monitoring stations are first reduced to equivalent sea level concentrations. The sea level concentrations are then interpolated to a grid. This sea level concentration grid is then multiplied by a correction factor grid to return the concentrations to their actual elevations.
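The reduce, interpolate, and restore sequence is sketched below. The correction factor used here, exp(-z/H) with elevation z and scale height H, is an assumed exponential form standing in for the relationship in Figure 4e, which is not reproduced in this section; inverse distance weighting is likewise assumed for the interpolation step, and the function names are hypothetical. The same pattern underlies the temperature scheme of Willmott et al. (1995b), with an additive lapse-rate adjustment in place of a multiplicative factor.

```python
import math


def idw(x, y, pts, power=2.0):
    """1/r^p interpolation; returns the point value when (x, y) is a data point."""
    num = den = 0.0
    for px, py, v in pts:
        r = math.hypot(x - px, y - py)
        if r == 0.0:
            return v
        w = r ** -power
        num += w * v
        den += w
    return num / den


def correction_factor(z_m, scale_height_m):
    """Assumed ratio of the concentration at elevation z to that at sea level."""
    return math.exp(-z_m / scale_height_m)


def elevation_corrected_grid(sites, grid):
    """sites: (x, y, z_m, H_m, conc); grid: (x, y, z_m, H_m). Returns corrected values."""
    # 1) Reduce each observation to an equivalent sea level concentration.
    sea_level = [(x, y, c / correction_factor(z, h)) for x, y, z, h, c in sites]
    # 2) Interpolate the sea level values, 3) restore them to each cell's elevation.
    return [idw(gx, gy, sea_level) * correction_factor(gz, gh)
            for gx, gy, gz, gh in grid]
```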





Figure 4. a) Initial quarter 1 PM10 concentrations from AIRS. b) Elevation corrected PM10 concentrations. c) Elevation data. d) Scale height. e) Relationship of PM10 concentration with elevation and scale height.
4.3 Declustering in Spatial Estimation
Air quality monitoring stations are generally located in or near urban areas, while station coverage in rural regions is sparse, causing pollutant concentration estimates from traditional interpolation to be biased. The proposed declustering methodology is illustrated in Figure 5 (refer to Appendix C for details). Two types of distance are considered in determining whether a monitoring station resides in a cluster for purposes of spatial interpolation: 1) the distances between the monitoring site and its neighboring sites and 2) the distance from the monitoring site to the estimation point. If the distances between the monitoring site and its neighboring sites are small compared to the distance between the monitoring site and the estimation point, the site is considered clustered. A declustering weight is then attached to each site in the cluster to reduce its influence on the interpolated estimate, so that the cluster as a whole does not exert excessive influence during interpolation.
In Figure 5a, the sites X1, X2, and X3 are equidistant from the estimation point i. Spatial interpolation applies equal weight to all three, so that each site has one-third influence on the estimate at i. A cluster of four sites exists in Figure 5b. Using inverse distance weighting, the cluster accounts for 2/3 of the estimated value at i, while each of the two single sites accounts for only 1/6, leading to a biased estimate. Figure 5c shows the proper allocation of the weights, with the cluster as a whole receiving one third of the weight, as does each of the two single sites.
An illustrated comparison of declustered spatial interpolation with simple inverse distance weighted interpolation is shown in Figure 6. The stations (represented as circles) are equidistant from the center of the grid. The circle on the right represents a cluster of ten co-located stations with an average value of 9; the other three stations are single and each has a value of 1. Figure 6a is the grid estimated using simple inverse distance weighted interpolation. The high values in the cluster influence the estimates in all areas of the grid except in the immediate proximity of the single stations. The estimate at the center of the grid is an average of all 13 stations, which gives the cluster about 77% of the weight (10/13) and each of the single sites about 7.7% (1/13).
Figure 6b contains the grid estimated using inverse distance weighted interpolation with declustering weights. The influence of the cluster is restricted, and the estimate at the center of the grid is an average over the four locations where stations exist rather than over the 13 individual stations. The cluster and each of the single sites receive 25% of the total weight.
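The percentages quoted for the center of the Figure 6 grid can be checked with a few lines of arithmetic. Because all station locations are equidistant from the center, the 1/r² factors are identical and cancel, so only the per-station shares matter. Assigning the value 9 to every cluster station is an assumption, but with equal weights only the cluster average enters the result; the declustering factor of 1/10 per cluster station is likewise an illustrative choice.

```python
import numpy as np

# Figure 6 at the grid centre: 10 co-located cluster stations and 3 single stations.
values = np.array([9.0] * 10 + [1.0] * 3)  # cluster value 9 assumed for each station
n_sites = values.size

# Simple IDW: equidistant stations, so every station gets the same share, 1/13.
plain_w = np.full(n_sites, 1.0 / n_sites)

# Declustered IDW (assumed factor): the 10 co-located stations share one location's weight.
decluster = np.array([1.0 / 10] * 10 + [1.0] * 3)
declustered_w = decluster / decluster.sum()

plain_estimate = plain_w @ values              # (10*9 + 3*1) / 13, about 7.15
declustered_estimate = declustered_w @ values  # mathematically (9 + 1 + 1 + 1) / 4 = 3

print(f"IDW:         cluster share {plain_w[:10].sum():.3f}, estimate {plain_estimate:.2f}")
print(f"Declustered: cluster share {declustered_w[:10].sum():.3f}, estimate {declustered_estimate:.2f}")
```

The declustered estimate at the center sits at the average of the four station locations, which is what Figure 6b shows.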



Figure 5. Examples of clustered and declustered monitor configurations, a) three single, non-clustered stations, b) a cluster of four sites that gets 4 times the weight of each of the two single sites, c) a cluster of four sites that is declustered so that it receives the same weight as the two single sites.

Figure 6. Estimate grids from 3 single, low-value sites and a group of 10 co-located, high-value sites. a) 1/r² interpolation causes the group of ten sites to "spread" over most of the grid. b) Incorporating declustering weights contains the influence of the cluster.
4.4 Spatial Structure of Eastern U.S. Ozone
Variogram analysis was conducted for daily maximum ozone concentrations during an episode of high ozone concentrations in the eastern U.S. (6/25/95-7/1/95). The ozone concentrations in three circular subregions, each with a radius of about 500 km, were evaluated: the Northeast, centered near Philadelphia; the Midwest, centered near Chicago; and the Southeast, centered near Atlanta. Each monitoring site within a region was paired with every other monitoring site in that region, and the pairs were grouped into bins according to their separation distance. Ten bins with an interval of about 85 km were selected, so that the first bin contained all pairs separated by less than 85 km, the second bin contained the pairs separated by 85 to 170 km, and so on. The correlation and covariance of the pollutant concentrations were calculated for the pairs in each bin, and the correlations and variances were plotted against distance to create the correlograms and variograms shown in Figure 7. As expected, each region shows decreasing correlation and increasing variance as the separation distance increases. However, the rate at which the spatial structure changes with distance differs among the regions. The correlograms show that the Northeast and Midwest have similar spatial correlation structures, with ozone concentrations still somewhat correlated at distances of 500 km. The Southeast correlogram declines relatively sharply, and site concentrations are no longer correlated at distances greater than 250 km. The variograms for the Midwest and the Northeast are dissimilar: the Midwest exhibits low variance out to about 600 km, while the Northeast and Southeast variances rise sharply. The daily spatial structure for the three regions was also evaluated, and the resulting variograms and correlograms showed substantial differences. The variogram analysis indicates that a single variogram model cannot be applied to spatial ozone analysis.
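The pairing and binning steps can be sketched as follows. Several details are assumptions, since the section does not specify them: separation distances are great-circle distances from the haversine formula, the correlation for each pair is computed over the daily maxima of the episode, and the variogram value is the conventional semivariance (half the mean squared difference between the two sites' daily maxima). The function names `great_circle_km` and `binned_structure` are illustrative.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat, lon):
    """Pairwise great-circle distances (km) between sites, via the haversine formula."""
    phi, lam = np.radians(lat), np.radians(lon)
    dphi = phi[:, None] - phi[None, :]
    dlam = lam[:, None] - lam[None, :]
    a = np.sin(dphi / 2) ** 2 + np.cos(phi[:, None]) * np.cos(phi[None, :]) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def binned_structure(conc, lat, lon, bin_km=85.0, n_bins=10):
    """conc: (n_days, n_sites) daily maxima. Returns bin centres (km), mean
    correlation, and mean semivariance for each non-empty distance bin."""
    n_sites = conc.shape[1]
    dist = great_circle_km(lat, lon)
    corr = np.corrcoef(conc, rowvar=False)  # site-by-site correlation over the episode days
    i, j = np.triu_indices(n_sites, k=1)    # each site paired once with every other site
    d = dist[i, j]
    r = corr[i, j]
    semivar = 0.5 * np.mean((conc[:, i] - conc[:, j]) ** 2, axis=0)
    bins = np.floor(d / bin_km).astype(int)  # bin 0: < 85 km, bin 1: 85-170 km, ...
    centres, mean_r, mean_g = [], [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            centres.append((b + 0.5) * bin_km)
            mean_r.append(r[mask].mean())
            mean_g.append(semivar[mask].mean())
    return np.array(centres), np.array(mean_r), np.array(mean_g)
```

Running this separately on the sites inside each 500 km circle, and on single days as well as the whole episode, gives the regional and daily comparisons described above.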









Figure 7. Spatial structure of daily maximum ozone. a) Map view showing the radius used in the plots, b) the correlograms indicating correlation with respect to distance, c) variograms indicating the spatial variance with respect to distance.
4.5 Integration of work and future additions
One immediate challenge will be to fuse the existing estimation methodologies. For instance, the visibility surrogate and PM10 surrogate methods were developed separately and need to be merged into a single method.
A physically based model will be developed for integration into the estimation methodology. The elevation dependence model for PM10 can be used as a starting point for developing a transfer matrix that relates the observed concentrations and source emissions to the receptor sites. Separate models will be developed for ozone and particulate matter based on existing knowledge of their transmission and kinetic properties.
A method for obtaining uncertainties at all estimated locations will be developed to provide a more comprehensive reliability measure than the cross validation method already implemented, which evaluates errors only at the monitoring sites.
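For reference, a leave-one-out cross validation of inverse distance weighted interpolation can be sketched as below. The proposal does not detail the cross validation already implemented, so this particular scheme (1/r² weights, removing one site at a time and re-estimating it from the remaining sites, summarizing with the root mean square error) is an assumption, and the function names are illustrative. Note that it yields residuals only at monitoring sites, not at arbitrary grid points.

```python
import numpy as np

def idw_estimate(xy_obs, values, xy_target, power=2.0):
    """1/r^power weighted estimate at a single target location."""
    d = np.linalg.norm(xy_obs - xy_target, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return (w @ values) / w.sum()

def loo_cross_validation(xy_obs, values, power=2.0):
    """Leave-one-out residuals: each site is re-estimated from all the other sites."""
    n = len(values)
    residuals = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k  # drop site k from the interpolation
        est = idw_estimate(xy_obs[keep], values[keep], xy_obs[k], power)
        residuals[k] = est - values[k]
    rmse = np.sqrt(np.mean(residuals ** 2))
    return residuals, rmse
```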
4.5.1 Potential Difficulties and Limitations
Incorporating physically based models into the estimation process will rely on assumptions that keep the models as simple as possible. Ozone is a secondary pollutant with complex chemistry and transport, and the simplifications imposed on the models will introduce errors that are difficult to quantify.
One of the benefits of this research also exposes one of its limitations: the proposed methods rely heavily on data, so difficulties will arise in areas of very sparse data coverage. If little is known about a region, with no pollutant concentration data at nearby locations, no meteorology or emissions data for the area, and no surrogate data, then any estimate of pollutant concentration there will be unreliable. It is hoped that attaching uncertainty values to the estimates in these areas will make them meaningful and useful.
References
Berry J.K. (1995) Spatial Reasoning for Effective GIS. GIS World, Fort Collins, CO.
Chameides W.L. and Cowling E.B. (1995) The State of the Southern Oxidants Study (SOS): Policy-Relevant Findings in Ozone Pollution Research 1988-1994. North Carolina State University, Raleigh, NC.
Chow J.C., Watson J.C., Lowenthal D.H., Solomon P.A., Magliano K., Ziman S., and Richards L.A. (1993) PM10 and PM2.5 Compositions in California's San Joaquin Valley. Aerosol Science and Technology, 18, 105-128.
Dutkiewicz V.A., Parekh P.P., and Husain L. (1987) An Evaluation of Regional Elemental Signatures Relevant to the Northeastern United States. Atmospheric Environment, 21, 1033-1044.
Efron B. and Gong G. (1983) A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation. The American Statistician, 37, 36-48.
Eynon B.P. (1987) Statistical Analysis of Precipitation Chemistry Measurements over the Eastern United States. Part II: Kriging Analysis of Regional Patterns and Trends. Journal of Applied Meteorology, 27, 1334-1343.
Falke S.R. and Husar R.B. (1996) Elevation Correction of PM10 Concentration Fields. Poster presentation at the 89th Air & Waste Management Association Annual Meeting and Exhibition, Nashville, TN, June 1996.
Falke S.R. and Husar R.B. (1998a) Declustering in the Spatial Interpolation of Air Quality Data. Paper No. 98-WPC.13P. To be presented at the 91st Air & Waste Management Association Annual Meeting and Exhibition, San Diego, CA, June 1998.
Falke S.R. and Husar R.B. (1998b) Maps of PM2.5 over the U.S. Derived from Regional PM2.5 and Surrogate Visibility and PM10 Monitoring Data. Paper No. 98-MPA.04P. To be presented at the 91st Air & Waste Management Association Annual Meeting and Exhibition, San Diego, CA, June 1998.
Falke S.R. and Husar R.B. (1998c) Correction of Particulate Matter Concentrations to Reference Temperature and Pressure Conditions. Paper No. 98-TP31B.01. To be presented at the 91st Air & Waste Management Association Annual Meeting and Exhibition, San Diego, CA, June 1998.
Flocchini R.G., Cahill T.A., Eldred R.A., and Feeney P.J. (1989) Particulate Sampling in the Northeast: A Description of the Northeast States for Coordinated Air Use Management (NESCAUM) Network. In: Visibility and Fine Particles, Mathai C.V., Ed., 197-206.
Haas T.C. (1990) Kriging and Automated Variogram Modeling within a Moving Window. Atmospheric Environment, 24A, 1759-1769.
Haas T.C. (1992) Redesigning Continental-Scale Monitoring Networks. Atmospheric Environment, 26A, 3323-3333.
Holland D.M., Baumgardner R., Haas T., and Oehlert G. (1994) Design of the Clean Air Act Deposition Monitoring Network. In: Environmental Statistics, Assessment, and Forecasting, Cothern C.R. and Ross N.P., Eds., Lewis Publishers, Boca Raton, FL.
Hogsett W.E., Weber J.E., Tingey D., Herstrom A., Lee E.H., and Laurence J.A. (1997) Environmental Auditing: An Approach for Characterizing Tropospheric Ozone Risk to Forests. Environmental Management, 21, 105-120.
Husar R.B., Patterson D.E., Blumenthal D.L., White W.H., and Smith T.B. (1980) Three-Dimensional Distribution of Air Pollutants in the Los Angeles Basin. In: The Character and Origins of Smog Aerosols, John Wiley & Sons, New York.
Husar R.B. (1998) Spatial Pattern of Ozone over the OTAG Region. Paper No. 98-MA2A.01. To be presented at the 91st Air & Waste Management Association Annual Meeting and Exhibition, San Diego, CA, June 1998.
Isaaks E.H. and Srivastava R.M. (1989) An Introduction to Applied Geostatistics. Oxford University Press, New York.
Lefohn A.S., Knudsen H.P., Logan J.A., Simpson J., and Bhumralkar C. (1987) An Evaluation of the Kriging Method to Predict 7-h Seasonal Mean Ozone Concentrations for Estimating Crop Losses. Journal of the Air Pollution Control Association, 37, 595-602.
Liu L.-J.S. and Rossini A.J. (1996) Use of Kriging Models to Predict 12-Hour Mean Ozone Concentrations in Metropolitan Toronto: A Pilot Study. Environment International, 22, 677-692.
Loibl W., Winiwarter W., Kopsca A., Zueger J., and Baumann R. (1994) Estimating the Spatial Distribution of Ozone Concentrations in Complex Terrain. Atmospheric Environment, 28, 2557-2566.
Meiring W., Guttorp P., and Sampson P.D. (1997) Space-Time Estimation of Grid-Cell Hourly Ozone Levels for Assessment of a Deterministic Model. Submitted to Environmental and Ecological Statistics.
Myers J.C. (1997) Geostatistical Error Management. Van Nostrand Reinhold, New York.
Ollinger S.V., Aber J.D., Federer C.A., Lovett G.M., and Ellis J.M. (1995) Modeling Physical and Chemical Climate of the Northeastern United States for a Geographic Information System. Gen. Tech. Rep. NE-191, U.S. Department of Agriculture, Forest Service, Northeastern Forest Experiment Station, Radnor, PA.
Palomino I. and Martin F. (1994) A Simple Method for Spatial Interpolation of the Wind in Complex Terrain. Journal of Applied Meteorology, 34, 1678-1693.
Phillips D.L., Lee E.H., Herstrom A.A., Hogsett W.E., and Tingey D.T. (1997) Use of Auxiliary Data for Spatial Interpolation of Ozone Exposure in Southeastern Forests. Environmetrics, 8, 43-61.
Schichtel B.A. (1996) The Retrieval of Pollutant Emission Fields from Ambient Concentration and Precipitation Chemistry Data. D.Sc. Thesis, Washington University, St. Louis, MO.
Seilkop S.K. and Finkelstein P.L. (1987) Acid Precipitation Patterns and Trends in Eastern North America, 1980-84. Journal of Climate and Applied Meteorology, 26, 980-994.
Sisler J.F., Huffman D., Latimer D.A., Malm W.C., and Pitchford M. (1993) Spatial and Temporal Patterns and the Chemical Composition of the Haze in the United States: An Analysis of Data from the IMPROVE Network, 1988-1991. Report ISSN 0737-5352-26, CIRA, Colorado State University, Fort Collins, CO.
State of California Air Resources Board (1995) California State and Local Air Monitoring Network Plan. Report TSD 95-001.
Tropp R.J., Sleva S.F., Ramadan W., and Harris C.J. (1996) Results of the 1994 Philadelphia PM2.5 and PM10 Saturation Study. Presented at the 89th Air & Waste Management Association Annual Meeting and Exhibition, Nashville, TN, June 1996.
U.S. National Oceanic and Atmospheric Administration, National Geophysical Data Center (1995) TerrainBase Global Digital Terrain Model, Version 1.0.
U.S. Environmental Protection Agency (1994) AIRS Users Guide, Volume AQ1: AQS Data Dictionary. EPA-454/B-94-005.
U.S. Environmental Protection Agency (1995) CASTNet National Dry Deposition Network, 1990-1992 Status Report. EPA-600/R-95-086.
U.S. Environmental Protection Agency (1996a) Air Quality Criteria for Particulate Matter. EPA/600/P-95/001aF, April 1996.
U.S. Environmental Protection Agency (1996b) Air Quality Criteria for Ozone and Related Photochemical Oxidants, Chapter 4. EPA/600/P-93/004aF, July 1996.
U.S. Environmental Protection Agency (1996c) PAMS Data Analysis Results Report. EPA-454/R-96-006.
Venkatram A. (1988) On the Use of Kriging in the Spatial Analysis of Acid Precipitation Data. Atmospheric Environment, 22, 1963-1975.
Watson D.F. (1992) Contouring: A Guide to the Analysis and Display of Spatial Data. Pergamon Press, New York.
Willmott C.J. and Robeson S.M. (1995a) Climatologically Aided Interpolation (CAI) of Terrestrial Air Temperature. International Journal of Climatology, 15, 221-229.
Willmott C.J. and Matsuura K. (1995b) Smart Interpolation of Annually Averaged Air Temperature in the United States. Journal of Applied Meteorology, 34, 2577-2586.