Uncertainty in the Spatial Interpolation of Ozone Monitoring Data

by Stefan R. Falke, stefan@mecf.wustl.edu, and Rudolf B. Husar, rhusar@mecf.wustl.edu

Center for Air Pollution Impact and Trend Analysis (CAPITA)

May, 1996

DRAFT


Spatial interpolation is frequently applied in estimating air pollutant concentrations. A common interpolation technique is inverse distance weighted interpolation (1/r^n). Two general factors influence the interpolation procedure: the type of data used and the "settings" of the interpolation, i.e. the number of stations used and the distance weighting factor. The data used in this work were hourly ozone concentrations for the eastern U.S. from the EPA's Aerometric Information Retrieval System (AIRS) and Clean Air Status and Trends Network (CASTNet). The uncertainty of the interpolation was tested from three perspectives: 1) the influence of location setting (urban, suburban, rural) on the interpolation, 2) the number of stations used in the interpolation, and 3) the effect of adding sites to the data set. The spatial interpolation procedure performed adequately in estimating ozone concentrations in the eastern half of the U.S., with R2 values ranging from 0.6 to 0.85. Using a maximum of three or four ozone monitoring sites generally produced the best interpolation performance. Adding the CASTNet sites to the AIRS network did not substantially improve the interpolation results.


Introduction

Spatial interpolation is commonly used to estimate air pollutant concentrations at locations between monitoring stations. A common interpolation scheme is inverse distance (r) weighted interpolation (1/r^n). This work tests the performance of distance weighted interpolation in estimating ozone concentrations in the eastern half of the United States. Performance was tested by removing ozone monitoring sites from the ozone data set, interpolating from the remaining sites, and comparing the measured concentrations at the removed sites with their interpolated values. This testing method was used to assess how well urban, suburban, and rural ozone concentrations can be estimated, to determine the best number of sites to use in the interpolation, and to evaluate the influence of adding monitoring sites to an existing network.



Methodology

Spatial Interpolation

Spatial interpolation was conducted with a program called the Gridder, which uses a table of latitude, longitude, and concentration values to create a concentration grid for a specified region. The Gridder lets the user set the radius of influence, the minimum and maximum number of sites used in the interpolation, and the type of distance weighting. The output can be either a full X by Y grid of interpolated values or a table of interpolated values at specific locations; this analysis used the table format. The pseudocode in the appendix describes the details of the Gridder algorithm.
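The Gridder itself is not reproduced here. The following Python sketch shows the same kind of calculation in table-output mode: non-missing stations within the radius of influence are sorted by distance, at most max_points of the nearest are kept, and their values are combined with 1/r^n weights. It assumes great-circle (haversine) distances in radians, which fits a radius of influence given in radians, but the Gridder's actual distance metric is not stated. The function names and the rule that a station at zero distance returns its own value are hypothetical choices for illustration.

```python
import math
from typing import List, Optional, Tuple

# (latitude in degrees, longitude in degrees, concentration); None marks missing data
Station = Tuple[float, float, Optional[float]]


def angular_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle separation in radians (haversine formula); an assumed metric."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def idw_at(lat: float, lon: float, stations: List[Station], power: float = 2.0,
           radius: float = 0.1, min_points: int = 1, max_points: int = 4) -> Optional[float]:
    """Inverse distance weighted estimate at one location, or None if too few stations."""
    # Keep non-missing stations inside the radius of influence, nearest first.
    near = sorted(
        (angular_distance(lat, lon, s_lat, s_lon), value)
        for s_lat, s_lon, value in stations
        if value is not None
    )
    near = [(d, v) for d, v in near if d < radius]
    if len(near) < min_points:
        return None
    total_weight = 0.0
    weighted_sum = 0.0
    for dist, value in near[:max_points]:
        if dist == 0.0:
            return value  # hypothetical rule: a co-located station is used as-is
        weight = 1.0 / dist ** power
        total_weight += weight
        weighted_sum += weight * value
    return weighted_sum / total_weight if total_weight > 0.0 else None


# Table-output mode: estimate at a list of target locations.
stations = [(40.0, -80.0, 60.0), (40.0, -79.0, 80.0), (30.0, -80.0, 20.0)]
targets = [(40.0, -79.5), (40.0, -80.0)]
estimates = [idw_at(lat, lon, stations) for lat, lon in targets]
print(estimates)
```

In this example the station at 30° N lies more than 0.1 radians from both targets and is ignored, the first target sits midway between two stations and receives their equal-weight average, and the second target coincides with a station.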

Interpolation Testing

The AIRS network provides adequate station density over most of the United States to test interpolation performance against monitored data. The test was conducted by removing approximately ten percent of the sites from the data set and interpolating from the remaining 90 percent. The concentrations interpolated at the removed locations were then compared with the measured concentrations. The figure below shows the spatial distribution of the removed (10%) sites relative to the remaining (90%) sites; the removed sites are displayed as red squares.
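A minimal sketch of the random removal step, assuming each site is simply an item in a list. The function name, the seeded generator, and rounding the removed fraction to the nearest whole site are illustrative choices, not details taken from the report.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_removed(sites: Sequence[T], fraction: float = 0.10,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly set aside `fraction` of the sites; return (removed, remaining)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    n_removed = round(fraction * len(sites))
    removed_idx = set(rng.sample(range(len(sites)), n_removed))
    removed = [s for i, s in enumerate(sites) if i in removed_idx]
    remaining = [s for i, s in enumerate(sites) if i not in removed_idx]
    return removed, remaining


removed, remaining = split_removed([f"site{i}" for i in range(50)])
print(len(removed), len(remaining))  # 5 45
```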


Map of randomly removed (red squares) and remaining (yellow squares) station locations.

Figure 3 shows the entire interpolation testing scheme as a flow diagram. Ozone data for the region in question are passed to the 10/90 operator, which randomly selects 10% of the data and places it in a table separate from the remaining 90%. Both tables are then passed to the Gridder, which interpolates concentrations at the locations in the 10% table from the data in the 90% table. The resulting table of interpolated concentrations is loaded into Excel, where a scattergram is created along with the corresponding correlation statistics.
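The report computed its scattergram statistics in Excel. As a sketch, the slope, offset (y-intercept), and R2 of an ordinary least-squares line can be computed as below. Plotting interpolated values against observed values (observed on the x axis) is an assumption about how the scattergrams were set up, and the function name is hypothetical. For a straight-line fit, R2 equals the square of the Pearson correlation coefficient.

```python
from typing import Sequence, Tuple


def scatter_stats(observed: Sequence[float],
                  interpolated: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit interpolated = slope * observed + offset; returns (slope, offset, R2)."""
    n = len(observed)
    if n != len(interpolated) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(observed) / n
    mean_y = sum(interpolated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in interpolated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, interpolated))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, offset, r2


print(scatter_stats([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # (0.5, 1.0, 0.25)
```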


Figure 3. Data flow diagram for the spatial interpolation testing.


Location Setting Analysis

The objective of this analysis was to determine how well ozone concentrations estimated through spatial interpolation match observed concentrations. A second objective was to analyze the role that location setting (urban, suburban, rural) plays in the accuracy of interpolated ozone concentrations.

Data Used

The location setting analysis covered the eastern U.S. Ozone concentration data were obtained from AIRS. Two distinct ozone episodes were chosen to test the interpolation based on urban, suburban, and rural site classification: July 1 to July 23, 1988 and July 20 to July 31, 1993. The data used in this analysis were the 14:00 hourly ozone concentrations from a single day in each episode, chosen by examining "movies" of the hourly ozone concentrations of each episode. On July 7, 1988, regionally high ozone concentrations extended from South Carolina to Wisconsin in the west and up to Vermont in the east, and both urban and rural areas showed concentrations exceeding the ozone standard (120 ppb). July 23, 1993 is part of a more localized episode in the Southeast. The region from Georgia in the south to Indiana in the north and from Missouri in the west to North Carolina in the east had high concentrations, but only local urban areas, such as Atlanta and Memphis, showed levels near or above the standard.

            
Figure 1. Contour maps of the ozone episodes on July 7, 1988 and July 23, 1993.

These two distinct concentration patterns provide test cases for statistically measuring how well spatial interpolation estimates regional ozone concentrations as well as localized hot spots.

AIRS sites carry a Location Setting Code in their location names. Three codes were used: (1) urban and center city, (2) suburban, and (3) rural; some AIRS sites are not classified. The locations of the classified sites for the 1993 episode are shown in Figure 2. The 1993 episode data set contained 102 urban, 259 suburban, 238 rural, and 13 unclassified sites, for a total of 612 sites. The 1988 episode data set contained 73 urban, 218 suburban, 186 rural, and 13 unclassified sites, for a total of 490.


Figure 2. Ozone monitoring sites in the eastern half of the AIRS network for July 1993.

Interpolation Settings

The Gridder settings in the location setting analysis were inverse square distance weighting (1/r^2), a maximum of 4 sites, a minimum of 1 site, and a radius of influence of 0.1 radians (~500 km). The radius of influence was not a factor in the interpolation, since the maximum number of sites was always reached within a radius of less than 0.1 radians. Because the interpolation was limited by the maximum number of sites, the effective radius of influence varied: locations with high station density had a small effective radius, while areas with low station density had a large effective radius.
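The varying effective radius can be illustrated with a short sketch that returns the distance to the farthest station actually used, i.e. the max_points-th nearest station inside the radius of influence. For brevity it uses planar coordinates in radians and synthetic station layouts; both are assumptions made only for illustration, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def effective_radius(target: Point, stations: Sequence[Point],
                     max_points: int = 4, radius: float = 0.1) -> Optional[float]:
    """Distance to the farthest of the stations used (planar approximation)."""
    within = sorted(d for d in (math.dist(target, s) for s in stations) if d < radius)
    if not within:
        return None
    return within[min(max_points, len(within)) - 1]


dense = [(0.01 * i, 0.0) for i in range(1, 11)]   # stations every 0.01 rad
sparse = [(0.02 * i, 0.0) for i in range(1, 11)]  # stations every 0.02 rad
print(effective_radius((0.0, 0.0), dense), effective_radius((0.0, 0.0), sparse))
```

With four stations used in both layouts, the effective radius in the sparse layout comes out twice as large as in the dense one, even though the nominal radius of influence is the same.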

Four interpolation runs were conducted for each data set (1988 and 1993). The first removed a random 10% of the sites, independent of location setting. The other three removed only urban, only suburban, and only rural sites, respectively.
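A sketch of how the four removal runs could be set up. The report does not state how many sites were removed in the class-specific runs, so the fraction drawn from the chosen class is left as a parameter. The Site record and its setting labels are hypothetical stand-ins for the AIRS Location Setting Code. Only the removed set is restricted to one class; the interpolation still uses all remaining sites.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Site:
    site_id: str
    setting: str  # hypothetical labels: "urban", "suburban", "rural", "unclassified"
    ozone_ppb: float


def removal_run(sites: Sequence[Site], setting: Optional[str] = None,
                fraction: float = 0.10, seed: int = 0) -> Tuple[List[Site], List[Site]]:
    """Remove a random `fraction` of the sites of one setting (or of all sites if None)."""
    pool = [s for s in sites if setting is None or s.setting == setting]
    rng = random.Random(seed)
    removed = rng.sample(pool, round(fraction * len(pool)))
    removed_ids = {s.site_id for s in removed}
    # Interpolation uses every site that was not removed, whatever its setting.
    remaining = [s for s in sites if s.site_id not in removed_ids]
    return removed, remaining


sites = ([Site(f"u{i}", "urban", 90.0) for i in range(20)]
         + [Site(f"r{i}", "rural", 70.0) for i in range(30)])
for setting in (None, "urban", "rural"):
    removed, remaining = removal_run(sites, setting)
    print(setting, len(removed), len(remaining))
```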

Results

For the July 7, 1988 episode, removing a random set of stations from the initial data set produced an interpolation R2 of about 0.85 (Figure 4). The interpolation performed least favorably when only urban sites were removed, with an R2 of 0.70. Removing the suburban sites had the least effect on the fit, with an R2 of 0.86, while the rural run produced an R2 of 0.81. The slope remained fairly constant for all four runs, while the offset (y-intercept) was lowest for the random and suburban sets and highest for the urban and rural sets.


Figure 4. Correlation plots and statistics for random, suburban, urban, and rural sites on 7/7/88.

Testing with the July 23, 1993 data produced an R2 similar to that of the 1988 episode for the random removal (R2=0.83), but the location setting runs gave different results (Figure 5). The best interpolation performance was found for the rural sites, with an R2 of 0.70, while the urban and suburban runs produced R2 values of 0.66 and 0.60, respectively. The urban set produced the lowest slope (0.76) and the highest offset (14.5 ppb).


Figure 5. Correlation plots and statistics for random, suburban, urban, and rural sites on 7/23/93.

The random removal test (independent of location setting) for July 23, 1993 was repeated three more times to verify the high R2 of the first run relative to the classified runs. It was suspected that the outlying point in the random run (~150 ppb, see Figure 5) might have biased the correlation statistics. The three additional tests produced R2 values between 0.78 and 0.86, supporting the R2 of 0.83 from the first run, although the slopes varied widely, from 0.77 to 0.94.


Figure 6. Correlation plots for randomly selected sites on July 23, 1993.

The above correlation plots indicate that spatially interpolated ozone concentrations in the eastern U.S. match observed concentrations fairly well, with R2 values between 0.6 and 0.85 across the test runs. Due to the limited number of test runs, however, it was difficult to determine whether the location setting of a site reduces or increases the uncertainty of the interpolation.



Number of Sites Analysis

An important factor in the spatial interpolation of pollutant concentrations is the number of sites used. In areas of high station density, it may be adequate to use only the site nearest to the estimated location, whereas a higher number of sites may be desirable in regions with sparse monitoring coverage. The objective of this analysis was to determine the most appropriate number of stations to include when spatially interpolating ozone concentrations.

Data Used

The daily maximum ozone concentration for 1991 to 1995 was calculated for the eastern U.S. from hourly AIRS ozone concentration data. The daily maximum values for the summer season (June, July, and August) were then averaged over 1991 to 1995, and the resulting five-year seasonal average daily maximum ozone concentrations were used in the following analysis.
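A minimal sketch of this aggregation for a single site. It assumes hourly records arrive as (timestamp, ppb) pairs with None for missing hours, applies no data-completeness rule (the report does not describe one), and pools all June to August days of 1991 to 1995 into a single average. These are assumptions, and the function name is hypothetical.

```python
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def summer_mean_daily_max(hourly: Iterable[Tuple[datetime, Optional[float]]],
                          years: range = range(1991, 1996),
                          months: Tuple[int, ...] = (6, 7, 8)) -> Optional[float]:
    """Average of the daily maximum concentrations over the selected months and years."""
    daily_max: Dict[date, float] = {}
    for ts, ppb in hourly:
        if ppb is None or ts.year not in years or ts.month not in months:
            continue
        day = ts.date()
        daily_max[day] = max(ppb, daily_max.get(day, ppb))
    return mean(daily_max.values()) if daily_max else None


hourly = [
    (datetime(1993, 7, 1, 13), 80.0),
    (datetime(1993, 7, 1, 14), 95.0),   # daily max for 1993-07-01
    (datetime(1994, 6, 2, 15), 65.0),   # only reading on 1994-06-02
    (datetime(1994, 9, 1, 14), 120.0),  # September: outside the season
    (datetime(1990, 7, 1, 14), 150.0),  # 1990: outside the period
]
print(summer_mean_daily_max(hourly))  # 80.0
```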

Interpolation Settings

The settings used in evaluating the number of sites were inverse square distance weighting (1/r^2), a radius of influence of 0.1 radians, and a minimum of 1 site. The maximum number of sites was varied over the values 1, 2, 3, 4, 5, and 10.
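To see how the maximum number of sites enters the calculation, the sketch below recomputes one 1/r^2 estimate from a hypothetical list of (distance, value) neighbors while increasing the maximum from 1 to 10. When fewer neighbors are available than the maximum, all of them are used, as in the appendix pseudocode. The function name and data are illustrative only.

```python
from typing import Sequence, Tuple


def idw_nearest(neighbors: Sequence[Tuple[float, float]], max_points: int,
                power: float = 2.0) -> float:
    """1/r^power estimate from the max_points nearest (distance, value) pairs."""
    used = sorted(neighbors)[:max_points]  # nearest first
    weights = [1.0 / dist ** power for dist, _ in used]
    return sum(w * value for w, (_, value) in zip(weights, used)) / sum(weights)


# Hypothetical neighbors of one removed site: (distance in radians, ozone in ppb)
neighbors = [(0.02, 40.0), (0.01, 60.0), (0.04, 80.0), (0.05, 50.0), (0.03, 55.0)]
for k in (1, 2, 3, 4, 5, 10):
    print(k, round(idw_nearest(neighbors, k), 1))
```

Each added site has a smaller weight than the ones before it, so the estimate changes less and less as the maximum grows.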

Results

Two separate random 10% subsets of the AIRS data were removed and interpolated. Each set was interpolated six times, using a maximum of 1, 2, 3, 4, 5, and 10 sites (see Figures x and x).


Correlation plots from 10% removed AIRS set 1.


Correlation plots from 10% removed AIRS set 2.

These two sets indicate that the most accurate interpolation is achieved with a maximum of three sites. In both test sets, increasing the number of sites from one to two increased R2 by the largest margin, from an average R2=0.297 to an average R2=0.349. Increasing from two to three sites raised the average R2 to 0.366. Using four sites slightly reduced the average R2 to 0.364, and using five sites lowered it slightly further, to 0.360. Finally, increasing the number of sites to 10 decreased the average R2 to 0.354, although in the first set ten sites actually produced a higher R2 (0.415) than five sites (R2=0.412).

The low R2 values were caused by large variations in ozone concentration over small spatial scales. The main outlier in Figure x has an observed O3 concentration of about 70 ppb, while its interpolated value is about 35 ppb. This 70 ppb site, which was removed from the initial AIRS data set, is in an area of high station density. The remaining sites (those used in the interpolation) immediately surrounding it had concentrations considerably lower than 70 ppb. Because a distance weighted estimate is a weighted average of the neighboring values, it cannot exceed the highest of them, so the interpolated concentration at that location was only about half of the observed concentration.
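An inverse distance weighted estimate is a weighted average with positive weights, so it always lies between the smallest and largest neighbor value. In the sketch below, the neighbor distances and concentrations are hypothetical values chosen to resemble the situation described (neighbors in the 30 to 40 ppb range around a site that measured about 70 ppb); whatever the distances, the estimate cannot reach the observed value.

```python
from typing import Sequence, Tuple


def idw(neighbors: Sequence[Tuple[float, float]], power: float = 2.0) -> float:
    """1/r^power weighted average of (distance, value) pairs."""
    weights = [1.0 / dist ** power for dist, _ in neighbors]
    return sum(w * value for w, (_, value) in zip(weights, neighbors)) / sum(weights)


# Hypothetical neighbors (distance in radians, ozone in ppb) of a site that measured 70 ppb.
neighbors = [(0.005, 32.0), (0.008, 38.0), (0.010, 35.0), (0.015, 30.0)]
estimate = idw(neighbors)
values = [value for _, value in neighbors]
print(round(estimate, 1))  # 33.6
assert min(values) <= estimate <= max(values)  # a weighted average cannot leave this range
```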

Further interpolation test runs would help confirm these results. The next section presents additional interpolation runs using a combined AIRS/CASTNet data set and compares them with the results above.



Additional Sites Analysis

The objective of this section was to evaluate how supplementing a data set with additional ozone monitoring stations affects interpolation performance. The Clean Air Status and Trends Network (CASTNet) was used to supplement the AIRS network. Spatial interpolation was conducted with the combined AIRS/CASTNet data set, and its performance was compared with that of interpolation using only AIRS data.

Data Used

Seasonal (June, July, and August) average daily maximum ozone concentrations from CASTNet were combined with the AIRS data described in the previous section. CASTNet added 75 regional monitoring stations to the 857 AIRS stations. The figure below shows the distribution of the AIRS (yellow boxes) and CASTNet (red boxes) ozone monitoring stations.
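One way to represent the combined data set is a single list of station records tagged with their network and keyed by (network, site ID), so that identifiers from the two networks cannot collide. The record layout, the function, and the coordinates below are hypothetical; the report does not describe how the tables were merged.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Station:
    network: str      # "AIRS" or "CASTNet"
    site_id: str
    lat: float
    lon: float
    ozone_ppb: float  # seasonal average daily maximum


def combine_networks(*networks: Iterable[Station]) -> List[Station]:
    """Concatenate station lists, rejecting duplicate (network, site_id) keys."""
    combined = [s for net in networks for s in net]
    keys = Counter((s.network, s.site_id) for s in combined)
    duplicates = [k for k, n in keys.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate stations: {duplicates}")
    return combined


airs = [Station("AIRS", "a1", 39.0, -90.0, 62.0), Station("AIRS", "a2", 40.0, -88.0, 58.0)]
castnet = [Station("CASTNet", "c1", 38.5, -89.5, 66.0)]
combined = combine_networks(airs, castnet)
print(Counter(s.network for s in combined))
```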


Ozone monitoring sites for AIRS (yellow boxes) and CASTNet (red boxes) network.

Interpolation Settings

The settings used in evaluating additional sites were inverse square distance weighting (1/r^2), a radius of influence of 0.1 radians, and a minimum of 1 site. In some runs the maximum number of sites was varied over 1, 2, 3, 4, 5, and 10 to verify the results of the previous section; subsequent runs used a maximum of three sites.

Results

A single 10% subset was removed from the combined AIRS/CASTNet data, and the remaining data were used to estimate concentrations at the removed locations (see the correlation plots below).


Correlation plots from 10% removed AIRS/CASTNet set.

The interpolation runs on the AIRS/CASTNet data followed the same trend regarding the number of sites used. The R2 peaked (0.724) when using three stations.

Ozone concentrations at the removed AIRS locations from the two sets used in the number of sites analysis were then estimated using the AIRS/CASTNet data set. The resulting correlation plots are shown below.


Correlation plots from 10% removed AIRS set 1, interpolated with AIRS/CASTNet data.


Correlation plots from 10% removed AIRS set 2, interpolated with AIRS/CASTNet data.

Figure x shows that using three sites from the AIRS/CASTNet data set raises R2 slightly, to 0.422 from the AIRS-only value of 0.419. Unlike the interpolation with AIRS data only, the combined AIRS/CASTNet data showed an R2 increase (R2=0.441) when four sites were used. In fact, after a decrease with five sites, using ten sites gave the highest R2, 0.446.

The second removed AIRS set, interpolated with the AIRS/CASTNet data in Figure x, also shows a slight improvement over the AIRS data alone, from R2=0.314 to R2=0.324. Again, using four sites increased R2, to 0.335.

These correlation plots, together with those of the number of sites analysis, indicate that using either three or four sites produces the "best" ozone concentration estimates. Since there was very little overall difference between three and four sites, the subsequent interpolation evaluations used only three sites.

One notable feature of the above figures is the high R2 (0.7) in Figure x compared with the lower R2 values (~0.4) in Figures x and x. The removed set in Figure x contained eight CASTNet stations and 85 AIRS stations, while the removed sets in Figures x and x each contained 85 AIRS stations. Unlike those sets, the removed AIRS/CASTNet set did not exhibit the effects of large ozone concentration variation over small spatial scales. To examine this, the 85 AIRS sites in the AIRS/CASTNet removed set were removed from both the initial AIRS data set and the initial AIRS/CASTNet data set and were estimated using the same interpolation method as before. The resulting scatterplots are shown in Figure xx.
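Holding out the same locations from both data sets keeps the comparison paired: both interpolations are scored at identical sites, so differences in R2 reflect the training data rather than the choice of test sites. A minimal sketch, assuming stations are dictionaries with a hypothetical "id" field:

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, object]


def paired_training_sets(airs: Iterable[Record], combined: Iterable[Record],
                         removed_ids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Drop the same removed sites from the AIRS-only and AIRS/CASTNet data sets."""
    airs_train = [s for s in airs if s["id"] not in removed_ids]
    combined_train = [s for s in combined if s["id"] not in removed_ids]
    return airs_train, combined_train


airs = [{"id": "a1"}, {"id": "a2"}, {"id": "a3"}]
combined = airs + [{"id": "c1"}, {"id": "c2"}]
airs_train, combined_train = paired_training_sets(airs, combined, {"a2"})
print([s["id"] for s in airs_train], [s["id"] for s in combined_train])
```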


Correlation plots from 10% removed AIRS/CASTNet location set, interpolated with AIRS (a) and AIRS/CASTNet (b) data.

An additional 10% subset was removed from the AIRS data and interpolated with both the AIRS and the AIRS/CASTNet ozone data sets. The R2 was about 0.6 for the AIRS interpolation but dropped to 0.51 for the combined AIRS/CASTNet data. The drop was caused by a CASTNet site located at nearly the same location as one of the removed sites. The CASTNet site had a concentration about 20 ppb higher than the removed AIRS station and, because inverse square weighting gives such a close site a dominant weight, caused the estimated concentration to be substantially higher than the observed concentration. Without that point, the R2 for the AIRS/CASTNet run improved to 0.572, while the R2 for the AIRS run remained essentially unchanged (0.594). Notably, in this case adding the CASTNet data decreased the accuracy of the ozone estimates, unlike in the previous runs.
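With inverse square weighting, a station very close to the estimation point receives almost all of the weight. The sketch below uses hypothetical distances and concentrations (a nearly co-located station reading 80 ppb plus two more distant stations at 60 ppb) to show how such a site pulls the estimate toward its own value, regardless of the other neighbors.

```python
from typing import List, Sequence


def weight_shares(distances: Sequence[float], power: float = 2.0) -> List[float]:
    """Fraction of the total 1/r^power weight carried by each station."""
    weights = [1.0 / d ** power for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


distances = [0.001, 0.010, 0.012]  # radians; the first station is nearly co-located
values = [80.0, 60.0, 60.0]        # ppb
shares = weight_shares(distances)
estimate = sum(s * v for s, v in zip(shares, values))
print([round(s, 3) for s in shares], round(estimate, 1))
```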


Correlation plots from 10% removed AIRS set 3, interpolated with AIRS (a) and AIRS/CASTNet (b) data.

It was thought that a 10% subset might be too small, and that the R2 of a larger set would be less affected by a few outliers. A removed set was therefore generated from 20%, rather than 10%, of the AIRS data. The results in Figure x show virtually identical R2 values of 0.549 for the AIRS and AIRS/CASTNet runs, which are no improvement over the R2 values produced by the 10% removed sets.


Correlation plots from 20% removed AIRS set. Interpolated with AIRS and AIRS/CASTNet data.



Conclusions

Three approaches were used in evaluating the spatial interpolation of ozone concentrations in the eastern U.S.
  1. The results from the location setting analysis indicate that the spatial interpolation procedure performs adequately in estimating ozone concentrations in the eastern half of the U.S. The regional ozone pattern of 1988 was better estimated overall (R2 between 0.71 and 0.86) than the more textured pattern of 1993 (R2 between 0.60 and 0.83), where the interpolation more often underestimated the concentrations. Due to the limited number of test runs, it was difficult to determine whether the location setting of a site reduces or increases the uncertainty of the interpolation.

  2. The best interpolation performance in estimating ozone concentrations was obtained using a maximum of three or four sites.

  3. Adding the CASTNet ozone monitoring network to the AIRS network did not substantially improve the estimation of ozone concentrations. Depending on the interpolation run, it slightly raised R2, slightly lowered it, or left it unchanged.

While this work provides a foundation for quantifying the uncertainty of distance weighted spatial interpolation of ozone concentrations, additional test runs (tens or hundreds of runs, rather than the 5-10 used in this work) are required to obtain solid statistical measures. The high station density of the AIRS network in the eastern U.S. hampered some of the interpolation testing. For instance, highly variable ozone concentrations in an urban area (a single site measuring 70 ppb while other stations in its immediate area measured approximately 20 ppb) biased the results. Because the network is already so dense, adding sites also had no distinguishable effect on interpolation performance.



Appendix

Contour
	Functionality:	transform data from table format into grid format
	Input:	Table - a set of data points
		WeightFunc - distance weight function
		Radius - distance constraint
		MinPoints - minimum required number of data points within Radius
		MaxPoints - maximum number of data points used in calculation of a grid cell
	Output:	Grid - uniformly distributed set of data points
function Contour(out Grid, in Table, in WeightFunc, in Radius, in MinPoints, in MaxPoints)
{
	for each Cell in Grid
		CalcCell(Cell, Table, WeightFunc, Radius, MinPoints, MaxPoints)
}

CalcCell
	Functionality:	calculate value of a grid cell given a table of data points around it
function CalcCell(in out Cell, in Table, in WeightFunc, in Radius, in MinPoints, in MaxPoints)
{
	PointList = an empty list of table points
	for each point in Table that is not null
		if the distance between the point and the cell is less than Radius then
			add point to the PointList, sorted by distance
	Cell value = Interpolate(PointList, WeightFunc, MinPoints, MaxPoints)
}

Interpolate
	Functionality:	interpolate over a sorted list of data points
function Interpolate(in PointList, in WeightFunc, in MinPoints, in MaxPoints)
{
	if number of points in PointList < MinPoints return null value

	TotalWeight = 0
	Sum = 0
	WeightExp = 0 if WeightFunc = 1
		    1 if WeightFunc = 1/r
		    2 if WeightFunc = 1/r^2, etc.
	
	for first MaxPoints in PointList
		Dist = point distance from cell
		Weight = 1 / (Dist^WeightExp)
		TotalWeight = TotalWeight + Weight
		Sum = Sum + Weight * point value
		
	if TotalWeight = 0 return null value
	Sum = Sum / TotalWeight
	return Sum
}



