Uncertainty in the Spatial Interpolation of Ozone Monitoring Data
by Stefan R. Falke, stefan@mecf.wustl.edu, and Rudolf B. Husar,
rhusar@mecf.wustl.edu
Center for Air Pollution Impact and Trend Analysis (CAPITA)
May, 1996
DRAFT
Spatial interpolation is frequently applied in estimating air pollutant concentrations. A common interpolation technique is inverse distance weighted interpolation (1/r^n). Two general factors that influence the interpolation procedure are the type of data used and the "setting" of the interpolation, i.e., the number of stations used and the weighting factor. The data used in this work were hourly ozone concentrations from the EPA's Aerometric Information Retrieval System (AIRS) and Clean Air Status and Trends Network (CASTNet) for the eastern U.S. The uncertainty of the interpolation was tested from three perspectives: 1) comparison of the influence of location setting (urban, suburban, rural) on the interpolation, 2) analysis of the number of stations used in the interpolation, and 3) evaluation of the effect of adding sites to the data set.
The spatial interpolation procedure performed adequately in estimating ozone concentrations in the eastern half of the U.S., with R^2 values ranging from 0.6 to 0.85.
Using a maximum number of three or four ozone monitoring sites generally produced the best interpolation performance. The addition of CASTNet sites to the AIRS network did not show any substantial improvement in the interpolation results.
Introduction
Spatial interpolation is commonly used for
estimating air pollutant concentrations at locations between monitoring
stations. A common interpolation scheme is the inverse distance (r) weighted
interpolation (1/r^n). This work tests the performance of distance
weighted interpolation in estimating ozone concentrations in the
eastern half of the United States. Interpolation performance testing was done by removing ozone monitoring sites from the ozone data set, interpolating from the remaining sites, and comparing the measured concentrations at the removed sites with their respective interpolated values. This testing method was used to analyze the ability to estimate urban, suburban, and rural ozone concentrations. It was also used to determine the ideal number of sites to use in the interpolation and to evaluate the influence of adding monitoring sites to an existing network.
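For reference, the inverse distance weighted estimate can be written compactly as below. The symbols are introduced here for illustration and do not appear in the original text: c_i is the concentration measured at the i-th station used, r_i is its distance from the estimation point, N is the number of stations used, and n is the weighting exponent. The form is meant to agree with the Interpolate routine in the appendix.

```latex
\hat{c} \;=\; \frac{\sum_{i=1}^{N} w_i \, c_i}{\sum_{i=1}^{N} w_i},
\qquad
w_i \;=\; \frac{1}{r_i^{\,n}}
```

With n = 2 this is the inverse square weighting used in the analyses that follow.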
Methodology
Spatial Interpolation
Spatial interpolation was conducted using a program called the
Gridder. It uses a table of latitude, longitude,
and concentration values to create a concentration grid for a
specified region. The Gridder allows the user to set the radius
of influence, the minimum number of stations required for the
interpolation, the maximum number of stations used in the interpolation,
and the type of distance weighting. The user can also select whether the output is
a full X by Y grid of interpolated values or a table of interpolated
values for specific locations. In this analysis the output was
in the table format. The pseudocode in the appendix describes the details
of the Gridder algorithm.
Interpolation Testing
The AIRS network provided an adequate station density in most
parts of the United States to test interpolation performance
against the monitored data. The interpolation performance test was conducted by removing approximately
ten percent of the sites from the data set. The interpolation
was conducted using the remaining 90 percent. The concentrations interpolated at the removed locations were compared with the actual measured concentrations. The figure below illustrates the spatial distribution of the removed (10%) sites with respect to the remaining (90%) sites.
The removed sites are displayed as red squares.

Map of randomly removed (red squares) and remaining (yellow squares) station locations.
Figure 3 illustrates the entire interpolation scheme in a flow diagram.
Ozone data
for the region in question are submitted to the 10/90 operator,
which randomly selects 10% of the data and places it in a separate
table from the remaining 90%. These two tables are then passed
to the Gridder, where interpolation is conducted
at the 10% table locations based on the 90% data. The resulting
table containing the interpolated concentrations is then placed in
Excel, where a scattergram is created along with the respective
correlation statistics.

Figure 3. Data flow diagram for the spatial interpolation
testing.
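The 10/90 operator can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name split_10_90, the seed parameter, and the representation of sites as a plain list of records are all assumptions made here.

```python
import random


def split_10_90(sites, removed_fraction=0.10, seed=None):
    """Randomly split site records into (removed, remaining) lists.

    `sites` is any list of records, e.g. (lat, lon, concentration) tuples.
    The record layout is hypothetical; only the random split matters here.
    """
    rng = random.Random(seed)
    n_removed = round(len(sites) * removed_fraction)
    # Draw the indices of the sites to withhold, without replacement.
    removed_idx = set(rng.sample(range(len(sites)), n_removed))
    removed = [s for i, s in enumerate(sites) if i in removed_idx]
    remaining = [s for i, s in enumerate(sites) if i not in removed_idx]
    return removed, remaining
```

Passing a fixed seed makes a particular removed set reproducible, which helps when the same removed locations are to be estimated from different data sets.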
Location Setting Analysis
The objective of this analysis was to determine how well the ozone concentrations estimated through spatial interpolation match the observed concentrations. A second objective was to analyze the role that location setting (urban, suburban, rural) plays in the accuracy of interpolated ozone concentrations.
Data Used
The location setting analysis covered the spatial domain of the eastern U.S. Ozone concentration data were obtained from AIRS. Two distinct ozone episodes were chosen to test the interpolation based on urban, suburban, and rural site classification. The data used in this analysis were the 14:00 hour ozone concentrations from a single day in each of the two episodes. The day used was chosen by viewing "movies"
of hourly ozone concentrations for each episode, namely July
1 to July 23, 1988 and July 20 to July 31, 1993. July 7, 1988
exhibits regionally high ozone concentrations extending from South
Carolina to Wisconsin in the west and up to Vermont in the east.
Both urban and rural areas show concentrations exceeding
the ozone standard (120 ppb). July 23, 1993 is part of a more
localized episode in the Southeast. The region from Georgia in the
south to Indiana in the north and from Missouri in the west to
North Carolina in the east has high concentrations, but only local
urban areas, such as Atlanta and Memphis, show levels
near or above the standard.
            
Figure 1. Contour maps of the ozone episodes on July 7,
1988 and July 23, 1993.
These two distinct concentration patterns provide test cases to statistically measure
the performance of spatial interpolation in estimating
regional ozone concentrations as well as localized hot spots.
AIRS sites contain a Location Setting Code in the sites' location names. Three code types were used: (1) urban and center city, (2) suburban, and (3) rural. AIRS also contains some sites that were not classified. The locations of the classified sites for the 1993 episode are
shown in Figure 2. The 1993 episode was defined by 102 urban,
259 suburban, 238 rural, and 13 nonclassified sites, for a total of 612 sites. The 1988
episode had 73 urban, 218 suburban, 186 rural, and 13 nonclassified
sites, for a total of 490.

Figure 2. Ozone monitoring sites in the eastern half of the AIRS
network for July 1993.
Interpolation Settings
The settings of the Gridder in the location setting analysis
were inverse square distance weighting (1/r^2), a maximum number
of sites of 4, a minimum of 1, and a radius of influence of 0.1
radians (~500 km). The radius of influence was not a factor in the interpolation
since the maximum number of sites was always reached within a
radius of less than 0.1 radians. The dependence on the maximum
number of sites meant that the actual radius of influence varied;
locations with a high station density had a small effective radius while
areas with a low station density had a large effective radius.
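The text does not state how the Gridder measures distance. As a sketch under the assumption that distance is the great-circle angle between two latitude/longitude points, the conversion between radians and kilometers looks like the following. The function names and the mean Earth radius of 6371 km are assumptions introduced here. Under this convention 0.1 radians corresponds to roughly 637 km, so the ~500 km figure quoted above is best read as a rough approximation or as reflecting a different distance convention.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the text


def angular_distance(lat1, lon1, lat2, lon2):
    """Great-circle separation in radians between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    # Haversine formula; min() guards against rounding slightly above 1.
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def radians_to_km(angle):
    """Convert an angular separation in radians to kilometers along the surface."""
    return angle * EARTH_RADIUS_KM
```

Because the maximum number of sites was always reached well inside the radius of influence, the exact conversion does not affect the results reported here.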
Four interpolation runs were conducted for each data set (1988 and 1993). The first consisted of a random removal of 10% of the sites, independent of location setting. The other three runs consisted of an interpolation with the removal of only urban, only suburban, and only rural sites.
Results
The July 7, 1988 episode showed that when a random set of stations
was removed from the initial data set, the interpolation process
produced an R^2 of about 0.85 (Figure 4). The interpolation performed least
favorably when only urban sites were removed, with an R^2 of 0.70. Removing the suburban sites had the least effect on the fit, with an R^2 of 0.86, while the rural site run
produced an R^2 of 0.81. The slope remained fairly constant for
all four runs, while the offset (y-intercept) was lowest for the
random and suburban sets and highest for the urban and rural sets.

Figure 4. Correlation plots and statistics for random, suburban,
urban, and rural sites on 7/7/88.
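The slope, offset, and R^2 reported for each scattergram can be reproduced with an ordinary least-squares fit. The sketch below is an illustration only: the authors computed their statistics in Excel, and the choice of observed values on the x-axis and interpolated values on the y-axis is an assumption, as is the function name.

```python
def fit_statistics(observed, estimated):
    """Least-squares line estimated = slope * observed + offset, plus R^2.

    Assumes at least two points and nonzero variance in both inputs.
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, offset, r_squared
```

A slope below 1 combined with a positive offset, as seen in several of the runs, means that high observed concentrations tend to be underestimated and low ones overestimated.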
The testing for the July 23, 1993 data produced a correlation coefficient similar
to that of the 1988 episode for the random removal (R^2=0.83), but the classified location setting
sets gave different results (Figure 5). The best interpolation performance was found in the rural
classification, with an R^2 of 0.70, while the urban and suburban sets resulted in correlation coefficients of 0.66 and 0.60, respectively.
The urban set produced the lowest slope, 0.76, and the highest offset, 14.5.

Figure 5. Correlation plots and statistics for random, suburban,
urban, and rural sites on 7/23/93.
The randomly selected site test (independent of location setting) for July 23, 1993 was conducted three additional times to verify the high correlation it produced in the first run compared to the classified runs. It was thought that the outlying point in the
random run (~150 ppb, see Figure a) may have biased the correlation statistics.
The three tests produced R^2 values between 0.78 and 0.86 and supported the R^2 of 0.83 from the first run. The slopes showed a wide range, from 0.77 to 0.94.

Figure 6. Correlation plots for randomly selected
sites on July 23, 1993.
The above correlation plots indicate that the spatial interpolation of ozone concentrations in the eastern U.S. results in concentrations that match observed concentrations fairly well. The interpolation testing method produced R^2 values between 0.6 and 0.85. Due to the limited number of testing runs, it was difficult to determine whether the location setting of a site had any effect on reducing or increasing the uncertainty in the interpolation.
Number of Sites Analysis
An important factor in the spatial interpolation of pollutant concentrations is the number of sites used. In areas of high station density, it may be adequate to use the site nearest to the estimated location. Using a higher number of sites may be desirable in regions with sparse monitoring site coverage. The objective of this analysis is to determine the most suitable number of stations to include in spatially interpolating ozone concentrations.
Data Used
The daily maximum ozone concentration for the period 1991 to 1995 was calculated for the eastern U.S. using hourly AIRS ozone concentration data. The daily maximum values for the season of June, July, and August were averaged over 1991 to 1995. The resulting five-year seasonal average daily maximum ozone concentrations were used in the following analysis.
The settings used in the evaluation of the number of sites were
inverse square distance weighting (1/r^2), a radius of influence of 0.1
radians, and a minimum number of sites set to 1. The maximum number of sites was varied over the values 1, 2, 3, 4, 5, and 10.
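Varying the maximum number of sites amounts to repeating the same removed/remaining test once per setting. The harness below is a hypothetical sketch of that loop: the names sweep_max_sites, estimate, and score are introduced here, and both callbacks are supplied by the caller (for example, an inverse square distance estimator and an R^2 calculation).

```python
def sweep_max_sites(removed, remaining, estimate, score,
                    max_site_values=(1, 2, 3, 4, 5, 10)):
    """Score the interpolation once per maximum-number-of-sites setting.

    removed, remaining: lists of (x, y, concentration) records.
    estimate(x, y, remaining, max_sites): hypothetical interpolator callback.
    score(observed, predicted): hypothetical skill metric callback, e.g. R^2.
    Returns a dict mapping each max_sites value to its score.
    """
    observed = [conc for _, _, conc in removed]
    results = {}
    for max_sites in max_site_values:
        # Estimate every withheld location from the remaining sites only.
        predicted = [estimate(x, y, remaining, max_sites) for x, y, _ in removed]
        results[max_sites] = score(observed, predicted)
    return results
```

Keeping the removed set fixed across all settings, as done in this analysis, ensures that differences in the score reflect the setting rather than the particular sites withheld.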
Results
Two separate randomly removed sets of 10% of the AIRS data were interpolated. Each set was interpolated six times using a maximum number of sites of 1, 2, 3, 4, 5, and 10 (see Figures x and x).

Correlation plots from 10% removed AIRS set 1.

Correlation plots from 10% removed AIRS set 2.
The above sets indicate that the most accurate interpolation is achieved using a maximum number of sites of three. In both of the test sets, increasing the number of sites from one to two increased the correlation coefficient by the largest margin, from an average R^2=0.297 to an average R^2=0.349. The increase from two to three sites boosted the average R^2 to 0.366. Using four sites slightly reduced the average R^2 to 0.364. Including five sites produced an R^2 only slightly lower, at 0.360. Finally, increasing the number of sites to 10 resulted in a further decrease of the average R^2 (0.354), although in the first set using ten sites actually produced a higher R^2 (0.415) than using five sites (R^2=0.412).
The low R^2 values were caused by large ozone concentration variation over a small spatial scale. The main outlier in Figure x has an observed O3 concentration of about 70 ppb while the interpolated value is about 35 ppb. The 70 ppb site, which was removed from the initial AIRS data set, was in an area of high station density. The remaining sites (those used in the interpolation calculation) in the immediate area surrounding the 70 ppb site had concentrations considerably lower than 70 ppb, and as a result the interpolated concentration at that location was about half of the observed concentration.
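The behavior at this outlier follows directly from the form of the estimate. With positive weights, an inverse distance weighted value is a weighted average of the neighboring concentrations and so can never exceed the largest of them. The symbols below are introduced here for illustration: c_i are the concentrations at the N sites used, r_i their distances from the estimation point, and \hat{c} the interpolated value.

```latex
\min_i c_i \;\le\; \hat{c} = \frac{\sum_{i=1}^{N} w_i \, c_i}{\sum_{i=1}^{N} w_i} \;\le\; \max_i c_i,
\qquad
w_i = \frac{1}{r_i^{2}} > 0
```

If every neighboring site reads well below 70 ppb, no choice of the number of sites or the weighting exponent can bring the estimate up to the observed value.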
It would be beneficial to do further interpolation test runs to solidify the results just described. The next section contains additional interpolation runs using a combined AIRS/CASTNet data set, which are compared to the above results.
Additional Sites Analysis
The objective of this section was to evaluate the influence on interpolation performance of supplementing a data set with additional ozone monitoring stations. The Clean Air Status and Trends Network (CASTNet) was used to supplement the AIRS network. Spatial interpolation was conducted with the combined AIRS and CASTNet data set and compared to interpolation performance using only AIRS data.
Data Used
Seasonally (June, July, and August) averaged daily maximum ozone concentrations from the CASTNet data were combined with the AIRS data described in the previous section. CASTNet added 75 regional monitoring stations to the 857 AIRS stations. The figure below shows the distribution of the AIRS (yellow boxes) and CASTNet (red boxes) ozone monitoring stations.

Ozone monitoring sites for the AIRS (yellow boxes) and CASTNet (red boxes) networks.
The settings used in the evaluation of additional sites were
inverse square distance weighting (1/r^2), a radius of influence of 0.1
radians, and a minimum number of sites set to 1. The maximum number of sites was varied over the values 1, 2, 3, 4, 5, and 10 for some of the runs in order to verify the results in the previous section. Subsequent runs used only three as the maximum number of sites.
A single 10% removed set was obtained from the AIRS/CASTNet combined data. The remaining data was used in estimating concentrations at the removed locations (Figure ).

Correlation plots from 10% removed AIRS/CASTNet set.
The interpolation runs on the AIRS/CASTNet data followed the same trend regarding the number of sites used. The R^2 peaked (0.724) when using three stations.
Ozone concentrations at the removed AIRS locations from the two sets used in the previous number of station analysis were estimated using the AIRS/CASTNet data set. The resulting correlation plots are shown below.

Correlation plots from 10% removed AIRS set 1.

Correlation plots from 10% removed AIRS set 2.
Figure x shows that using three sites from the AIRS/CASTNet data set improves the R^2 to 0.422, slightly above the R^2=0.419 obtained with AIRS data only. Unlike the interpolation results with AIRS data only, the combined AIRS/CASTNet data displayed an R^2 increase (R^2=0.441) when four sites were used in the interpolation. In fact, after an R^2 decrease when using five sites, including ten sites gave the highest R^2 of 0.446.
The second removed AIRS set used with the AIRS/CASTNet data in Figure x also exhibits a slight improvement over just the AIRS data, from R^2=0.314 to R^2=0.324. Again, using four sites increased the R^2, to 0.335.
The above correlation plots along with those in the number of sites analysis indicate that using either three or four sites during interpolation produces the "best" ozone concentration estimates. Since there was very little overall difference between three or four sites, the following interpolation evaluation used only three sites.
One notable point in the above figures is the high R^2 (0.7) produced in Figure x compared to the lower R^2 values (~0.4) in Figures x and x. The removed set in Figure x contained eight CASTNet stations and 85 AIRS stations. The removed sets in Figures x and x contained 85 AIRS stations. The removed AIRS/CASTNet set did not exhibit the effects of large ozone concentration variation over small spatial scales. The 85 AIRS sites in the AIRS/CASTNet removed station set were removed from both the initial AIRS ozone concentration data set and the initial AIRS/CASTNet data set and were estimated using the same interpolation method as before. The resulting scatterplots are shown in Figure xx.

Correlation plots from 10% removed AIRS/CASTNet location set. Interpolated with AIRS (a) and AIRS/CASTNet (b) data.
An additional 10% removed set was created from the AIRS data. Interpolation was conducted for both the AIRS and AIRS/CASTNet ozone concentration data sets. The R^2 using the AIRS data interpolation was about 0.6, while for the combined AIRS/CASTNet data it dropped to 0.51. The cause of this drop was that a CASTNet site was located at nearly the same location as one of the removed sites. The CASTNet site had a concentration about 20 ppb higher than the removed AIRS station and caused the estimated concentration to be substantially higher than the observed concentration. The R^2 for the AIRS/CASTNet run without that point improved to 0.572, while for the AIRS run it remained essentially unchanged (R^2=0.594). It is interesting to note that the addition of the CASTNet data in this case decreased the accuracy of the ozone estimation, unlike in the previous runs.

Correlation plots from 10% removed AIRS set 3. Interpolated with AIRS (a) and AIRS/CASTNet (b) data.
It was thought that 10% might be too small a subset. A larger set may not be influenced as heavily in its R^2 by a few outliers. A removed set was generated from 20%, rather than 10%, of the AIRS data. The results in Figure x show virtually identical R^2 values for the AIRS and AIRS/CASTNet runs of 0.549, which is not an improvement over the R^2 produced by the 10% removed sets.

Correlation plots from 20% removed AIRS set. Interpolated with AIRS and AIRS/CASTNet data.
Conclusions
Three approaches were used in evaluating the spatial interpolation of ozone concentrations in the eastern U.S.
- The results from the location setting analysis indicate that the spatial interpolation procedure used performs adequately in estimating ozone concentrations in the eastern half of the U.S. The regional ozone episode of 1988 was better estimated overall (R^2 between 0.71 and 0.86) than the more textured pattern of 1993 (R^2 between 0.60 and 0.83), where the interpolation underestimated the concentrations more often. It was difficult to determine whether the location setting of a site had any effect on reducing or increasing the uncertainty in the interpolation due to the limited number of test runs.
- The "best" interpolation performance in estimating ozone concentrations occurred when using three or four sites.
- The addition of the CASTNet ozone monitoring network to the AIRS network did not substantially improve the estimation of ozone concentrations. Depending on the interpolation run, it either slightly improved the R^2, slightly lowered it, or left it unchanged.
While this work provides a foundation for quantifying the uncertainty in distance weighted spatial interpolation of ozone concentrations, additional test runs (tens or hundreds of runs, rather than the 5-10 used in this work) are required to obtain solid statistical measures. The AIRS ozone monitoring network has high spatial station density in the eastern U.S. which hampered some of the interpolation testing. For instance, highly varying ozone concentrations in an urban area (a single site measuring a concentration of 70 ppb while other stations in its immediate area measured concentrations of approximately 20 ppb) biased the results. The spatially dense network also caused the addition of sites to have no distinguishable effect on the ozone interpolation performance.
Appendix
Contour
Functionality: transform data from table format into grid format
Input:  Table - a set of data points
        WeightFunc - distance weight function
        Radius - distance constraint
        MinPoints - minimum required number of data points within Radius
        MaxPoints - maximum number of data points used in calculation of a grid cell
Output: Grid - uniformly distributed set of data points
function Contour(out Grid, in Table, in WeightFunc, in Radius, in MinPoints, in MaxPoints)
{
    for each Cell in Grid
        CalcCell
}
CalcCell
Functionality: calculate value of a grid cell given a table of data points around it
function CalcCell(in out Cell, in Table, in WeightFunc, in Radius, in MinPoints, in MaxPoints)
{
    PointList = an empty list of table points
    for each point in Table that is not null
        if the distance between the point and the cell is less than Radius then
            add point to the PointList, sorted by distance
    Cell value = Interpolate PointList
}
Interpolate
Functionality: interpolate over a sorted list of data points
function Interpolate(in PointList, in WeightFunc, in MinPoints, in MaxPoints)
{
    if number of points in PointList < MinPoints return null value
    TotalWeight = 0
    Sum = 0
    WeightExp = 0 if WeightFunc = 1
                1 if WeightFunc = 1/r
                2 if WeightFunc = 1/r^2, etc
    for first MaxPoints in PointList
        Dist = point distance from cell
        Weight = 1 / (Dist^WeightExp)
        TotalWeight = TotalWeight + Weight
        Sum = Sum + Weight * point value
    if TotalWeight = 0 return null value
    Sum = Sum / TotalWeight
    return Sum
}
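For readers who prefer something executable, the pseudocode above can be rendered in Python roughly as follows. This is a sketch, not the Gridder source. Several things are assumptions made here: the table is a list of (x, y, value) tuples, the weight function is passed as its exponent (weight_exp), distance defaults to planar Euclidean distance via math.dist, and a station at zero distance returns its own value, since the pseudocode as written would divide by zero in that case.

```python
import math


def interpolate(point_list, weight_exp, min_points, max_points):
    """Weighted mean over (distance, value) pairs sorted by increasing distance."""
    if len(point_list) < min_points:
        return None
    total_weight = 0.0
    weighted_sum = 0.0
    for dist, value in point_list[:max_points]:
        if dist == 0.0 and weight_exp > 0:
            # Assumption: a co-located station supplies the value directly.
            return value
        weight = 1.0 / dist ** weight_exp
        total_weight += weight
        weighted_sum += weight * value
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


def calc_cell(cell, table, weight_exp, radius, min_points, max_points,
              distance=math.dist):
    """Estimate the value at `cell` (an (x, y) tuple) from nearby table points."""
    point_list = []
    for x, y, value in table:
        if value is None:  # skip null data points
            continue
        d = distance(cell, (x, y))
        if d < radius:
            point_list.append((d, value))
    point_list.sort(key=lambda pair: pair[0])
    return interpolate(point_list, weight_exp, min_points, max_points)


def contour(cells, table, weight_exp, radius, min_points, max_points):
    """Apply calc_cell to each requested location, in grid or table form."""
    return [calc_cell(c, table, weight_exp, radius, min_points, max_points)
            for c in cells]
```

For latitude/longitude input, a great-circle distance function could be passed as the distance argument in place of the planar default.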