NORTH AMERICAN INTEGRATED FINE PARTICLE DATA SET
Paper 99-398
Bret Schichtel
Center for Air Pollution Impact and Trend Analysis, Washington University, One Brookings Drive, Campus Box 1124, St Louis, Missouri 63130 bret@mecf.wustl.edu
Stefan R. Falke and Rudolf B. Husar
Center for Air Pollution Impact and Trend Analysis, Washington University, One Brookings Drive, Campus Box 1124, St Louis, Missouri 63130 bret@mecf.wustl.edu
ABSTRACT
Two long term North American fine particle (<2.5 micrometers) data sets were created by integrating data from 18 historical and active monitoring networks supplied by eight different organizations. One data set consists of PM2.5 mass, and the other consists of PM2.5 mass and its elemental composition, organic and elemental carbon, ions, and light absorption for ~600 urban and rural monitoring sites in the US and Canada from 1979 through February 1997. Data processing involved reformatting the data into a common format and units with uniform geographic and temporal coding, and creating a consistent set of data flags. A consistent set of metadata describing the networks' variables, sampling sites, samplers and analysis methods was also added. Subsequently, the data were merged into a single database. No modifications were made to the data values beyond unit conversions. Data used in the integrated data sets came from the following networks: IMPROVE, NESCAUM, GAViM, SCENES, CARB, NAPS, AIRS, MOHAVE, PREVENT, WHITEX, National Park Service SFU, CASTNet, National PM Research Monitoring Network, Tennessee Valley Authority, and two specialty studies in Philadelphia. The integrated data sets are publicly available via the World Wide Web.
INTRODUCTION
The promulgation of the new PM2.5 air quality standard requires the measurement of ambient PM2.5 mass and its constituents for compliance testing, determining source attribution, model evaluation, and air quality tracking and evaluation. The national PM2.5 network started to collect data in January 1999 and will require another three years of monitoring before compliance testing of the annual standard can begin. However, there are a number of historical and currently active PM monitoring programs with multiple years of data that can be drawn upon to obtain a better understanding of the PM2.5 concentrations and their causes and trends. Many of these networks have been drawn together and integrated into a single North American Fine Particle Database.
The creation of the integrated database involved gathering the data from various data suppliers, reformatting the data to a standard format, and adding location, variable and data sampling metadata. The data sets grouped together differ in the samplers, sampling durations and sample analysis techniques used to collect the particulate samples. In addition, some of the networks corrected the data to a reference temperature and pressure while others did not. No modifications were made to the data to account for these differing sampling and reporting techniques.
DATABASE DESCRIPTION AND DATA AVAILABILITY
The North American Integrated PM fine database contains approximately 600 locations, with some data spanning nearly 20 years (1979-1998) (Figure 1). Sixty of the stations are in Canada and the remaining stations are in the US. The database was created by integrating 18 data sets supplied by eight different organizations (Table 1). Each network contained one or more of the following variables: particulate mass, elemental composition, organic and elemental carbon, ions, and light absorption. Two data sets were created from the database and are available on the World Wide Web:
North American Integrated PM2.5 Data Set, located at: http://capita.wustl.edu/datawarehouse/Datasets/CAPITA/NAMPM_25/Data/NAMPM25.html. This data set contains only PM2.5 mass and data quality flags from all monitoring networks.
North American Integrated PM2.5 Speciated Data Set, located at: http://capita.wustl.edu/datawarehouse/Datasets/CAPITA/NAMPM_fine/Data/NAMPM_f.html. This data set contains the PM2.5 mass and its elemental composition, organic and elemental carbon, and ion concentrations and associated quality flags from all monitoring networks.
The data formats for these data sets are described in the Appendix.
DATA SUPPLIERS AND NETWORKS
This section describes the content of each network and the network's custodians or data supplier. Any special data processing that was required to integrate a network's data with the other data sets is also described.
IMPROVE
The Interagency Monitoring of Protected Visual Environments (IMPROVE)1 network was established to protect visibility at Class I areas. The IMPROVE steering committee is composed of representatives from the National Park Service (NPS), the Forest Service (USFS), the Bureau of Land Management, the Fish and Wildlife Service (FWS), the Environmental Protection Agency, and regional-state organizations.
The IMPROVE fine particle network collects PM2.5 and PM10 samples over a 24 hour period every Monday and Friday using IMPROVE samplers. The network consists of 93 monitoring sites, located in rural areas (Figure 2), operating from 3/88 to the present. The PM samples are analyzed for PM2.5 mass and its elemental constituents, organics, ions, light absorption and PM10 mass. The data set contains the concentrations, minimum detection limit, error, and data quality flag. The data were downloaded from the University of California Davis ftp site: caesar.ucdavis.edu
NESCAUM
The NESCAUM (Northeast States for Coordinated Air Use Management) fine mass network was an extension of the IMPROVE network supported by NESCAUM and the National Park Service. The network consisted of 10 IMPROVE type samplers located in rural areas of the northeastern US (Figure 2), and collected three 24 hour samples per week.2 The PM samples were analyzed for PM2.5 mass and its elemental constituents, organics, ions, and light absorption. The data set contains the concentrations, minimum detection limit, error, and data quality flags. The data were downloaded from the University of California Davis ftp site: caesar.ucdavis.edu
GAViM
Guelph Aerosol and Visibility Monitoring program (GAViM) is run by the Guelph Scanning Proton Microprobe (GSPM) laboratory. The network consists of four Canadian monitoring sites (Figure 2) using IMPROVE samplers and analytical protocols. The network collects 24 hour samples every Wednesday and Saturday, and analyzes them for PM2.5 mass and its elemental constituents, and light absorption. The network has been operational since 6/94. The data set contains the concentrations, minimum detection limit, error, and data quality flags. The data were downloaded from the GSPM laboratory web site: http://www.physics.uoguelph.ca/PIXE/airq/airq.html
NPS-SFU
The National Park Service Stack Filter Units (NPS-SFU) network consisted of 80 monitoring sites which collected particulate samples in rural regions throughout the United States (Figure 3). The network operated from 7/79 to 11/93, with monitoring sites coming on and off line throughout this time period. The network used two stage stacked filter samplers collecting fine (< 2.5 µm) and coarse (> 2.5 µm) particulate samples over a 72 hour sampling period from 7/79 - 5/86 and a 24 hour sampling period from 6/86 - 11/93. The samples were analyzed for PM2.5 mass and its elemental constituents and light absorption, and for PM coarse mass and its elemental constituents. The data set contains the concentrations, minimum detection limit, error, and data quality flag. The data were obtained from the National Park Service.
MOHAVE
Project MOHAVE (Measurement of Haze and Visual Effects) was established to determine what contributions the Mohave Power Plant and other sources make to haze at the Grand Canyon National Park and other mandatory Class I areas. The MOHAVE network employed 43 IMPROVE type samplers in the Southwest (Figure 3) collecting daily particulate samples over a 24 hour sampling period. Several sites collected two 12 hour samples a day. The network collected data over a winter and a summer period, from 1/10 - 2/15/92 and 7/11 - 9/2/92 respectively. The particulate samples were analyzed for PM2.5 and its elemental constituents, organics, ions, light absorption and PM10. The data set contains the concentrations, minimum detection limit, error, and quality flags. The data were obtained from the National Park Service.
All sites which collected two 12 hour samples per day were aggregated to 24 hour samples prior to integration with other data.
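The aggregation of sub-daily samples into a 24 hour value can be sketched as follows. The paper does not state which averaging method was used, so this minimal Python sketch assumes a duration-weighted mean and only reports a daily value when the sub-samples cover the full 24 hours. The function name and the record layout are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def aggregate_to_24h(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Combine sub-daily samples into one 24 hour concentration.

    Each sample is (duration_hours, concentration). Assumption (not from
    the paper): the daily value is the duration-weighted mean, and a day
    is reported only when the durations add up to exactly 24 hours.
    """
    total_hours = sum(hours for hours, _ in samples)
    if abs(total_hours - 24.0) > 1e-9:
        return None  # incomplete or overlapping coverage: no daily value
    weighted_sum = sum(hours * conc for hours, conc in samples)
    return weighted_sum / total_hours


# Two 12 hour samples, as collected at several MOHAVE sites.
daily = aggregate_to_24h([(12.0, 8.0), (12.0, 12.0)])
```

For equal-duration samples the weighted mean reduces to a simple average; the weighting only matters when samples of different lengths are combined within one day.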
PREVENT
The Pacific Northwest Regional Visibility Experiment Using Natural Tracers (PREVENT) network was established to study visibility causes and effects in Washington state, west of the Cascades. The network consisted of 34 monitors located in Washington and Oregon (Figure 3). Daily particulate samples were collected from 6/90 - 9/90 and analyzed for PM2.5 mass and its elemental constituents and light absorption. The data set contains the concentrations and error. The data were obtained from the National Park Service.
WHITEX
The Winter Haze Intensive Tracer Experiment (WHITEX) was established to study the visibility impacts of emissions from the Navajo Generating Station. The database contained data from 13 locations which sampled from 1/1/87 - 2/18/87 (Figure 3). A number of different samplers were employed at each location, including IMPROVE samplers, stack filter units, dichotomous samplers, and SCISAS. Samples were collected every 6 hours, 12 hours, or 24 hours depending on the site and sampler. The particulate samples were analyzed for PM2.5 mass and its elemental constituents, organics, ions, and light absorption. The data were obtained from the National Park Service.
Only data from one sampler per monitoring site were extracted from the database and integrated with data from the other data sets. Nine sites used IMPROVE samplers, three sites used stack filter units, and one site used the SCISAS sampler. All data were aggregated to 24 hour samples.
NAPS
The National Air Pollution Surveillance (NAPS) Network was established to monitor and assess the air quality in Canadian urban regions. Fine (< 2.5 µm) and coarse (> 2.5 µm and < 10 µm) particulate data from 29 sites operating for some time between 1/90 and 12/96 were available (Figure 4). The data were collected over 24 hour periods every 6th day. The samples were analyzed for fine and coarse mass, their elemental constituents and ions. The data set contains the concentrations and data quality flag. The data were obtained from Environment Canada, http://www.etcentre.org/NAPS/NAPS_main_page.html
CARB
The California Air Resources Board (CARB) has collected fine (< 2.5 µm) and coarse (> 2.5 µm and < 10 µm) particulate samples at 26 monitoring sites throughout California from 1/89 to the present (Figure 4). The particulate samples are collected over 24 hour periods every 6th day using dichotomous samplers. The samples are analyzed for fine and coarse mass and their elemental composition. Only concentration values are available. The data were obtained from CARB: http://www.arb.ca.gov/aqd/aqd.htm.
AIRS
The Aerometric Information Retrieval System (AIRS) network consists of 119 PM2.5 monitoring sites which collected 24 hour samples every 6th day from 1/85 - 12/97 (Figure 4). The monitoring sites are located throughout the US mostly in and around urban and industrial regions. The AIRS PM2.5 data were obtained from the AIRS database at EPA.
CASTNet - Visibility Chemistry
The purpose of the Clean Air Status and Trends Network (CASTNet) Visibility Chemistry network is to measure visibility and related parameters defining status and trends. The network consists of 12 monitoring sites located in rural areas of the Eastern US, which collected data for some time between 10/93 and the present (Figure 5). Three stage filter packs were used to collect 24 hour particulate samples every 6th day. The particulate samples are analyzed for PM2.5 and its elemental constituents, organics, ions, and light absorption. The data set contains the concentrations, and data quality flag. The data were obtained from the USEPA.
CASTNet - Dry Deposition
The Clean Air Status and Trends Network (CASTNet) Dry Deposition Network has measured fine (< 2.5 µm) ions at 96 sites between 1/87 and the present (Figure 5). Three stage filter packs are used to collect weekly particulate samples. The data set contains the concentrations and data quality flag. The data were obtained from the USEPA.
National PM Research Monitoring Network
The National PM Research Monitoring Network was established with the primary objective of providing ambient air quality data for relating health effects to chemical and/or physical properties of PM and to support emerging regulatory implementation and development issues. This network began collecting fine and coarse speciated PM data and meteorological data in Phoenix, AZ in February of 1995. Monitoring platforms at Baltimore, MD and Fresno, CA were added in 1997. Each monitoring platform had a dichotomous sampler collecting fine (< 2.5 µm) and coarse (> 2.5 µm and < 10 µm) 24 hour integrated particulate samples every 3 days, and a dual fine particle sequential sampler (DFPSS) collecting fine 24 hour integrated particulate samples every day. The fine particulate samples are analyzed for PM2.5 and its elemental constituents, and organics. The coarse particulate samples are analyzed for mass and elemental constituents. The data set contains the concentrations and error. The data were obtained from the USEPA.
Only the Baltimore and Phoenix data through 1997 (Figure 5) were available for integration with the other data networks.
Philadelphia 1992-1995 Study
The Philadelphia 1992-1995 Study measured PM2.5 and PM10 at one monitoring site in the Philadelphia metropolitan area from 5/92 - 4/95 (Figure 5). The network collected 24 hour samples every day. The data were obtained from the USEPA.
Philadelphia Saturation Study
The Philadelphia Saturation Study measured PM2.5 and PM10 at sixteen monitoring sites in Philadelphia from 9/11/94 - 10/9/94 (Figure 5)3. The network collected 24 hour samples every other day. The data were obtained from the USEPA.
SCENES
The Subregional Cooperative Electric Utility, Department of Defense, National Park Service, and EPA study (SCENES) was a long-term observational study conducted by several industry and government groups to understand the factors influencing atmospheric visibility in the southwestern United States.
The SCENES network collected fine (< 2.5 µm) and total (< 15 µm) particulate samples at seven sites from 11/84 - 10/89 (Figure 6). Particulate samples were collected every third day using WRAQS-2 and SCISAS samplers running over 8, 12, 16, and 24 hour periods, depending on the location and year. The particulate samples were analyzed for PM2.5 and its elemental constituents, organics, ions and light absorption. The data set contains the concentrations, minimum detection limit, error, and data quality flag. The data can be obtained from the Electric Power Research Institute (EPRI) at: http://src.com/~epriasdc/index.htm.
The SCENES data incorporated into the integrated data sets came from Vasconcelos4. If a 24 hour sample was not available for a given location and time, it was created by aggregating the two twelve or four eight hour samples together. Only the concentration values were available for inclusion in the integrated data sets.
EMEFS
The purpose of the Eulerian Model Evaluation and Field Study (EMEFS) network was to evaluate comprehensive regional Eulerian acid deposition models for the US and Canada. The EMEFS network is a composite of the following networks: APIOS (OME); CAPMon (AES); FADMP (FCG); MODES (TVA); MODES-GRAD (EPA); MODES-VAR (EPA); NDDN (EPA); OEN (EPRI). The EMEFS data set consists of data from 129 stations over the eastern US and Ontario, Canada from 6/88 - 5/90 (Figure 6). The particulate data were collected over 24 hour periods using filter pack techniques, and analyzed for ions. The data set contains the concentrations, minimum detection limit, and data quality flag. The data were obtained from the Electric Power Research Institute (EPRI) at: http://src.com/~epriasdc/index.htm.
TVA
The Tennessee Valley Authority (TVA) network consists of 9 monitoring sites in Tennessee and surrounding states (Figure 6). PM2.5 and PM10 samples were collected every 6th day using dichotomous samplers from 5/80 - 9/87. Only concentration values were available. The data were obtained from the Tennessee Valley Authority.
DATA PROCESSING AND QUALITY CONTROL
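In order to merge the data from the different networks, the data were passed through a set of standardization routines that homogenized the data formats and metadata. The standardization process included reformatting the data into a common format and units with uniform geographic and temporal coding, creating a consistent set of data flags, and adding a consistent set of metadata describing each network's variables, sampling sites, samplers and analysis methods.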
No quality control of the data beyond that done by the supplying organization has been performed on the data sets. However, as the data are used and problems are identified, appropriate procedures to remedy the problems in the data sets will be applied.
DISCUSSION
The integration process has completed the first steps of homogenization, description and finally integration of the data from the multiple networks. This integration process has grouped together data collected using different samplers, sampling durations, and sample analysis techniques. In addition, some of the networks corrected the data to a reference temperature and pressure while others did not. The next step in the integration process will be to assess the impact of these network variations on the PM concentrations and possibly adjust some of the data to account for them. Even without such adjustments, the North American Fine Particle Database is a rich resource for studying and addressing fine particulate issues. These data have already become the foundation of several analyses that are available on EPA's PM2.5 Analysis Workbook - Virtual Workgroup Web site at http://capita.wustl.edu/databases/userdomains/pmfine/.
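To illustrate what a temperature and pressure adjustment of the kind mentioned above involves, the following minimal Python sketch converts a mass concentration reported at ambient conditions to reference conditions using the ideal gas law. It is not a procedure applied to the integrated data sets. The function name is hypothetical, and the default reference conditions (298.15 K and 101.325 kPa) are an assumption; the paper does not state which reference conditions the individual networks used.

```python
def to_reference_conditions(conc_ambient: float,
                            temp_k: float,
                            pressure_kpa: float,
                            ref_temp_k: float = 298.15,
                            ref_pressure_kpa: float = 101.325) -> float:
    """Convert a mass concentration from ambient to reference conditions.

    The sampled air volume scales as T / P (ideal gas law), so the same
    particle mass in a volume restated at reference conditions gives
    C_ref = C_amb * (P_ref / P_amb) * (T_amb / T_ref).
    The default reference values are an assumption, not from the paper.
    """
    if temp_k <= 0 or pressure_kpa <= 0:
        raise ValueError("temperature and pressure must be positive")
    return conc_ambient * (ref_pressure_kpa / pressure_kpa) * (temp_k / ref_temp_k)


# A 10 ug/m3 sample collected at 0 degrees C and sea-level pressure.
adjusted = to_reference_conditions(10.0, temp_k=273.15, pressure_kpa=101.325)
```

Under these assumptions a cold-weather sample is adjusted downward by roughly 8 percent, which gives a sense of the size of the difference between networks that applied such a correction and those that did not.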
ACKNOWLEDGMENTS
This project is supported by EPA's Office of Air Quality Planning and Standards (OAQPS). The authors would like to thank all of the data suppliers who helped us obtain the data and assisted us in translating and describing the data sets.
REFERENCES
1. Sisler, J.F.; Huffman, D.; Latimer, D.A.; Malm, W.C.; Pitchford, M. Report #ISSN No. 0737-5352-26 CIRA, CSU, Fort Collins, CO., 1993.
5. Husar, R.B., Frank, N.H. Interactive Exploration and Analysis of EPA's Aerometric Information and Retrieval System (AIRS) Data Sets. Air & Waste Management Association 84th Annual Meeting, June 16-21, Vancouver, BC., 1991.
6. NAtChem. The National Atmospheric Chemistry Database For Particles and Related Trace Gases / Toxics website: http://airquality.tor.ec.gc.ca/natchem/particles/
APPENDIX
Data File Format
The data set consists of a main data table containing the fine mass concentration and flag values, and location and variable tables which describe the monitoring sites and variables. The data are available in three file formats: Fixed Length ASCII, Voyager5, and Microsoft Access.
Data Table
A sample Data table is presented in Table A-1. The first two columns, Loc_Code and Date, are the key or dimensional fields identifying the monitor and the sample date and time for each data record, respectively. Additional information about the monitoring site is contained in the Location table. Each of the remaining columns contains the data for a single variable. The variable code (Var_Code) is used for the column name; additional variable metadata is located in the Variable table.
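As a small illustration of how the key fields can be used, the Python sketch below takes the rows of the sample Data table (Table A-1), parses the dates, and checks the spacing between consecutive samples. The m/d/yy date layout is inferred from the sample rows and may not hold for every file format; the variable and function names are hypothetical.

```python
from datetime import date, datetime

# Rows copied from the sample Data table (Table A-1):
# (Loc_Code, Date, MF_cf, MF_ff)
rows = [
    ("NAPSCANS1HAL", "1/5/90", 14.8, 0),
    ("NAPSCANS1HAL", "1/11/90", 13.1, 0),
    ("NAPSCANS1HAL", "1/17/90", 21.3, 0),
    ("NAPSCANS1HAL", "1/23/90", 17.5, 2),
    ("NAPSCANS1HAL", "1/29/90", 9.1, 0),
]


def parse_date(text: str) -> date:
    """Parse an m/d/yy date as it appears in the sample table.

    Python maps two-digit years 69-99 to 1969-1999, which covers the
    1979-1998 span of the data set.
    """
    return datetime.strptime(text, "%m/%d/%y").date()


dates = [parse_date(day) for _, day, _, _ in rows]

# Days between consecutive samples at this monitor.
intervals = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

# Mean fine mass concentration over the sample rows (ug/m3).
mean_mass = sum(mf_cf for _, _, mf_cf, _ in rows) / len(rows)
```

The intervals come out as six days throughout, consistent with the every 6th day schedule described for the NAPS network.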
Location Table
The Location table is made up of the location code (Loc_Code) followed by the location name, longitude and latitude (Table A-2). Each station is assigned a unique Loc_Code based upon the "Station ID". The Station ID format is based on the format used in the Canadian NAtChem Particle database6. Encoded in the location code is the network as well as location information (see Table A-3). The Loc_Code for monitoring sites with AIRS site codes uses characters 1-4 for the network abbreviation and 5-13 for the AIRS site code. The Location table's StationID field contains the original location identifier that comes with the database. The Location Name is made up of the location code, the station name, elevation and sampling starting date.
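The Loc_Code layout in Table A-3 can be decoded with simple string slicing, as in the minimal Python sketch below. The class and function names are hypothetical. The sketch handles only the 12 character layout; the 13 character codes built from AIRS site codes are rejected rather than guessed at. The conversion helper rests on an inference: the sample Loc_Lon and Loc_Lat values in Table A-2 have magnitudes that look like radians, although the paper does not state the units.

```python
import math
from typing import NamedTuple, Tuple


class LocCode(NamedTuple):
    network: str    # characters 1-4
    country: str    # characters 5-6
    state: str      # characters 7-8 (state or province)
    monitor: str    # character 9
    site_abbr: str  # characters 10-12


def parse_loc_code(code: str) -> LocCode:
    """Split a 12 character Loc_Code into the fields of Table A-3."""
    if len(code) != 12:
        raise ValueError(f"expected a 12 character Loc_Code, got {code!r}")
    return LocCode(code[0:4], code[4:6], code[6:8], code[8], code[9:12])


def to_degrees(lon_rad: float, lat_rad: float) -> Tuple[float, float]:
    """Convert coordinates to degrees, ASSUMING they are stored in radians."""
    return math.degrees(lon_rad), math.degrees(lat_rad)


arches = parse_loc_code("IMPRUSUT1ARC")
lon_deg, lat_deg = to_degrees(-1.9127980, 0.67681439)
```

For the Arches National Park row, the converted coordinates fall at roughly 109.6 degrees west and 38.8 degrees north, which is plausible for a site in Utah and supports the radians reading.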
Variable Table
The Variable table is made up of the variable abbreviation (Var_Abbr) and descriptive information about the variable (Table A-4). Each variable is assigned a unique Var_Abbr based upon the species, attribute (concentration or flag) and sampling cut point. Table A-5 lists the meaning of each character in the Var_Abbr.
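The Var_Abbr encoding of Table A-5 can be decoded in the same way. In the minimal Python sketch below, the lookup dictionaries contain only the codes that appear in the sample Variable table (Table A-4); treating the underscore as padding of the species abbreviation is an assumption drawn from the "MF_cf" and "MF_ff" examples, and the function name is hypothetical.

```python
from typing import Tuple

# Codes seen in the sample Variable table (Table A-4); the full set of
# codes is defined in the Variable table distributed with the data.
ATTRIBUTES = {"c": "Concentration", "f": "Flag"}
CUT_POINTS = {"f": "<2.5 um"}


def parse_var_abbr(abbr: str) -> Tuple[str, str, str]:
    """Split a 5 character Var_Abbr into species, attribute and cut point.

    Characters 1-3 hold the species abbreviation (assumed to be padded
    with "_"), character 4 the attribute and character 5 the cut point.
    Unknown codes are returned unchanged.
    """
    if len(abbr) != 5:
        raise ValueError(f"expected a 5 character Var_Abbr, got {abbr!r}")
    species = abbr[0:3].rstrip("_")
    attribute = ATTRIBUTES.get(abbr[3], abbr[3])
    cut_point = CUT_POINTS.get(abbr[4], abbr[4])
    return species, attribute, cut_point


fine_mass = parse_var_abbr("MF_cf")
fine_flag = parse_var_abbr("MF_ff")
```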
TABLES
Table 1. Data sets processed.
| Network | Network Abbr. | Data Source | # Sites | Time Span | PM Variables |
| Interagency Monitoring of Protected Visual Environments (IMPROVE) | IMPR | University of California Davis | 93 | 3/88 - 2/98 | PM2.5, elemental composition, organics, ions, bab; PM10 |
| Northeast States for Coordinated Air Use Management (NESCAUM) | NESC | University of California Davis | 11 | 9/88 - 11/93 | PM2.5, elemental composition, organics, ions, bab |
| Guelph Aerosol and Visibility Monitoring program (GAViM) | GAVM | University of Guelph, Ontario | 4 | 6/94 - 12/97 | PM2.5, elemental composition, bab |
| National Park Service Stack Filter Units (SFU) | SFU | National Park Service | 80 | 7/79 - 11/93 | PM2.5, elemental composition, bab; and Coarse PM mass and elemental composition |
| Measurement of Haze and Visual Effects (MOHAVE) | MOHA | National Park Service | 43 | 1/10 - 2/15/92 | PM2.5, elemental composition, organics, ions, bab; PM10 |
| Pacific Northwest Regional Visibility Experiment Using Natural Tracers (PREVENT) | PREV | National Park Service | 34 | 6/90 - 9/90 | PM2.5, elemental composition, bab |
| Winter Haze Intensive Tracer Experiment (WHITEX) | WHIT | National Park Service | 13 | 1/1/87 - 2/18/87 | PM2.5, elemental composition, organics, ions, bab |
| National Air Pollution Surveillance Network (NAPS) | NAPS | Environment Canada | 29 | 1/90 - 12/96 | PM2.5, elemental composition, organics, and coarse PM mass and elemental composition |
| California Air Resources Board (CARB) | CARB | California Air Resources Board | 26 | 1/89 - 8/97 | PM2.5, elemental composition, and coarse PM mass and elemental composition |
| Aerometric Information Retrieval System (AIRS) | AIRS | EPA | 119 | 1/85 - 12/97 | PM2.5 |
| Clean Air Status and Trends Network (CASTNet) Visibility Chemistry | CAST | EPA - National Exposure Research Lab (NERL) | 12 | 10/93 - 12/97 | PM2.5, elemental composition, organics, ions, bab |
| Clean Air Status and Trends Network (CASTNet) Dry Deposition | CAST | EPA - National Exposure Research Lab (NERL) | 96 | 1/87 - 12/97 | Ions |
| National PM Research Monitoring Network | NPMR | EPA - National Exposure Research Lab (NERL) | 2 | 3/95 - 12/97 | PM2.5, elemental composition, organics and coarse PM mass and elemental composition |
| Philadelphia Saturation Study | PHLS | EPA | 16 | 9/11/94 - 10/9/94 | PM2.5, PM10 |
| Philadelphia 1992-1995 Study | PHCO | EPA | 1 | 12/5/92 - 12/4/95 | PM2.5, coarse PM mass, total PM mass |
| Subregional Cooperative Electric Utility, Department of Defense, National Park Service, and EPA study (SCENES) | SCEN | EPRI | 7 | 11/84 - 10/89 | PM2.5, elemental composition, organics, bab |
| Eulerian Model Evaluation and Field Study (EMEFS) | EMEF | EPRI | 129 | 6/88 - 5/90 | Ions |
| Tennessee Valley Authority (TVA) | TVA1 | Tennessee Valley Authority | 9 | 5/80 - 9/87 | PM2.5; PM10 |
Table A-1. A sample Data Table.
| Loc_Code | Date | MF_cf | MF_ff |
| NAPSCANS1HAL | 1/5/90 | 14.8 | 0 |
| NAPSCANS1HAL | 1/11/90 | 13.1 | 0 |
| NAPSCANS1HAL | 1/17/90 | 21.3 | 0 |
| NAPSCANS1HAL | 1/23/90 | 17.5 | 2 |
| NAPSCANS1HAL | 1/29/90 | 9.1 | 0 |
Table A-2. A sample Location Table.
| Loc_Code | Loc_Name | Loc_Lon | Loc_Lat |
| IMPRUSUT1ARC | IMPRUSUT1ARC__Arches National Park;_Devils Garden Campgr_1722_03/02/1988 | -1.9127980 | 0.67681439 |
| IMPRUSSD1BAD | IMPRUSSD1BAD__Badlands National Park;_Park Headquarters___760_03/02/1988 | -1.7791979 | 0.76346988 |
| IMPRUSNM1BAN | IMPRUSNM1BAN__Bandelier National Monument;_Fire tower____2000_03/02/1988 | -1.8546591 | 0.62464351 |
| IMPRUSTX1BIB | IMPRUSTX1BIB__Big Bend National Park;_3 miles SE of Pant_1067_03/02/1988 | -1.8009951 | 0.51186132 |
| IMPRUSCA1BLI | IMPRUSCA1BLI__Bliss State Park(TRPA);_1/4 mile beyond he_2043_11/17/1990 | -2.0961399 | 0.68038739 |
Table A-3. The Loc_Code encoding definitions.
| Characters | Definition |
| 1-4 | Network Abbreviation |
| 5-6 | Country |
| 7-8 | State/Province |
| 9 | Monitor number |
| 10-12 | Location Name Abbreviation |
Table A-4. A sample Variable Table.
| Var_Abbr | Var_Desc | Units | Species_Abbr | Species_Name | Attribute | Cut Point |
| MF_cf | Fine Mass Concentration | ug/m3 | MF | Fine Mass | Concentration | <2.5 um |
| MF_ff | Fine Mass Flag | ug/m3 | MF | Fine Mass | Flag | <2.5 um |
Table A-5. The Var_Abbr encoding definitions. The meaning of the abbreviations and codes are listed in the Variable Table.
| Characters | Definition |
| 1-3 | Species Abbreviation |
| 4 | Attribute |
| 5 | Cut Point |
FIGURES
Figure 1. Monitoring site locations and time trends for the North American Fine Particle Database.

Figure 2. Monitoring site locations and time trends for the IMPROVE, NESCAUM, and GAViM particulate networks.

Figure 3. Monitoring site locations and time trends for the NPS-SFU, MOHAVE, PREVENT, and WHITEX particulate networks.
Figure 4. Monitoring site locations and time trends for the NAPS, CARB, and AIRS particulate networks.

Figure 5. Monitoring site locations and time series for the CASTNet Dry Deposition, CASTNet Visibility Chemistry, Philadelphia Saturation Study, Philadelphia 1992-1995 Study and National PM Research Monitoring networks.

Figure 6. Monitoring site locations and time series for the SCENES, EMEFS, and TVA particulate networks.
