North American Integrated Fine Particle Data Sets

Bret Schichtel, Stefan Falke, and Rudolf Husar, Center for Air Pollution Impact and Trend Analysis (CAPITA), 7/23/98

Long-term North American PM data sets representative of urban and rural air quality are being created by integrating data from research and routine fine particle monitoring networks that are or have been in operation. The data sets will be composed of fine, coarse, and PM10 mass, elemental composition, organics, and ions. In addition, visibility and extinction and scattering coefficients will be included.

The process for creating the data sets involves gathering the data from various data suppliers, reformatting the data to a standard format and adding location, variable and data sampling metadata. The data and metadata will be available for each individual network including all locations and variables, as well as integrated datasets where select variables are grouped into one data set from select networks.

This is an ongoing project, with intermediate data sets created periodically. This report describes the currently available data sets, the networks and providers, the processing of the data for integration, and the format of the final data sets.

This project is being supported by EPA’s Office of Air Quality Planning and Standards (OAQPS).

Currently Available Integrated Data Sets

Data Set: NAMPM_F
Data URL: http://capita.wustl.edu/datawarehouse/Datasets/CAPITA/NAMPM_fine/Data/NAMPM_f.html
Networks: IMPROVE, NESCAUM, GAViM, SCENES, CARB, NPMR, and NAPS
Variables: Fine mass, elemental composition including organics, and ions
#Stations: 172
Time Span: 3/88 – 12/97

Note: The elements included in this data set are only those that are included in the IMPROVE data. However, the organic variables from all data sets are included.

Data Set: NAMPM25
Data URL: http://capita.wustl.edu/datawarehouse/Datasets/CAPITA/NAMPM_fine/Data/NAMPM25.html
Networks: IMPROVE, NESCAUM, GAViM, SCENES, CARB, NPMR, NAPS, PHLS, PHCO, and AIRS
Variables: Fine mass (PM2.5)
#Stations: 318
Time Span: 3/88 – 12/97

These data are provided as received from their sources. No quality control beyond that done by the supplying organizations has been performed on the data. Some networks corrected their data to a reference temperature and pressure; others did not. No attempt has yet been made to reconcile these differences.
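To illustrate what reconciling the temperature and pressure differences could involve, the sketch below converts a concentration measured at ambient conditions to reference conditions using the ideal gas law. The function name, the units, and the default reference values (298.15 K and 1013.25 hPa) are assumptions made for this example; the report does not state which reference conditions each network used.

```python
def to_reference_conditions(conc_ambient, temp_k, pressure_hpa,
                            ref_temp_k=298.15, ref_pressure_hpa=1013.25):
    """Convert an ambient-condition concentration to reference conditions.

    Concentration is mass per sampled air volume. By the ideal gas law the
    sampled volume scales by (P / P_ref) * (T_ref / T) when brought to
    reference conditions, so the concentration scales by the inverse factor.
    The default reference values are assumptions, not taken from the report.
    """
    return conc_ambient * (ref_pressure_hpa / pressure_hpa) * (temp_k / ref_temp_k)


# Example: a fine mass of 10000 ng/m3 sampled at 273.15 K and 850 hPa.
corrected = to_reference_conditions(10000.0, 273.15, 850.0)
```

A value already reported at the reference conditions passes through unchanged, which makes the function safe to apply only to the networks known to report ambient-condition data.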

Data Networks and Suppliers

The data networks and their suppliers to be included in the North American integrated data sets are listed in Tables 1-3:

 

Table 1. Data sets processed.

| Network | Network Abbr. | Supplying Organization | # Sites | Time Span | PM Variables |
|---|---|---|---|---|---|
| Interagency Monitoring of Protected Visual Environments (IMPROVE) | IMPR | National Park Service | 93 | 3/88 – 2/98 | fine PM mass, elements, organics, ions, bab; PM10 |
| Measurement of Haze and Visual Effects (MOHAVE) | MOHA | National Park Service | 43 | 1/10 – 2/15/92; 7/11 – 9/2/92 | fine PM mass, elements, organics, ions, bab; PM10 |
| Pacific Northwest Regional Visibility Experiment Using Natural Tracers (PREVENT) | PREV | National Park Service | 34 | 6/90 – 9/90 | fine PM mass, elements, organics, ions, bab; PM10 |
| Stack Filter Units (SFU) | SFU | National Park Service | 80 | 7/79 – 11/93 | fine PM mass, elements, organics, ions, bab |
| Winter Haze Intensive Tracer Experiment (WHITEX) | WHIT | National Park Service | 14 | 1/1/87 – 2/18/87 | fine PM mass, elements, organics, ions, bab |
| Northeast States for Coordinated Air Use Management (NESCAUM) | NESC | NESCAUM | 11 | 9/88 – 11/93 | fine PM mass, elements, organics, ions, bab |
| Guelph Aerosol and Visibility Monitoring program (GAViM) | GAVM | University of Guelph, Ontario | 4 | 6/94 – 12/97 | fine PM mass, elements, bab |
| National PM Research Monitoring Network | NPMR | EPA | 2 | 3/95 – 12/97 | fine and coarse PM mass, elements, and organics |
| Aerometric Information Retrieval System (AIRS) | AIRS | EPA | ~1500 | 1/85 – 12/97 | PM2.5, PM10, TSP |
| Philadelphia Saturation Study | PHLS | EPA | 16 | 9/11/94 – 9/10/94 | PM2.5, PM10 |
| Philadelphia 1979-1983 Study | PHCO | EPA | 9 | 4/24/79 – 12/26/83 | PM1.5 fine, coarse, total, PM10, coarse > 10 um, sulfate, nitrate, lead |
| Philadelphia 1992-1995 Study | PHCO | EPA | 1 | 12/5/92 – 12/4/95 | PM2.5 fine, coarse, total |
| Clean Air Status and Trends Network (CASTNet) Visibility Chemistry | CAST | EPA | 12 | 10/93 – 12/97 | fine PM mass, elements, organics, ions, bab |
| Clean Air Status and Trends Network (CASTNet) Dry Deposition Chemistry | CAST | EPA | 94 | 1/87 – 12/97 | fine ions |
| Subregional Cooperative Electric Utility, Department of Defense, National Park Service, and EPA study (SCENES) | SCEN | EPRI | 7 | 11/84 – 10/89 | fine PM mass, elements, organics, and bab |
| Eulerian Model Evaluation Field Study (US and Canada) (EMEFS) | EMEF | EPRI | 129 | 6/88 – 6/90 | fine ions |
| California Air Resources Board (CARB) | CARB | California Air Resources Board | 26 | 1/89 – 8/97 | fine and coarse mass, and elements |
| National Air Pollution Surveillance Network (NAPS) | NAPS | Environment Canada | 29 | 1/90 – 12/96 | fine and coarse PM mass, elements, and ions |
| Tennessee Valley Authority (TVA) | TVA1 | Tennessee Valley Authority | 9 | 5/80 – 9/87 | fine mass; PM10 |

 

Table 2. Data sets to be processed.

| Network | Network Abbr. | Supplying Organization | # Sites | Time Span | Variables |
|---|---|---|---|---|---|
| Eastern Regional Air Quality Study (ERAQS) | ERAQ | EPRI | 9 | 11/78 – 3/80 | total mass, ions, organics, and bab |

 

Table 3. Data sets on order.

| Network | Network Abbr. | Supplying Organization | # Sites | Time Span | Variables |
|---|---|---|---|---|---|
| SEAVS | SEAV | National Park Service | ?? | | fine mass + |
| Mexican – Texas Border Study | MTBS | National Park Service | 18 | 9/9/96 – 10/13/96 | fine mass, elements, ++ |
| Sulfate Regional Experiment (SURE) | SURE | EPRI | 56 | 1977 – 1978 | fine mass, ions, organic carbon |
| Canadian Air and Precipitation Monitoring Network (CAPMoN) | CAPM | NAtChem Particle | 10 | 1983 – 1997 | fine mass + |
| Canadian Acid Aerosols Monitoring Program (CAAMP) | CAAM | NAtChem Particle | ?? | 5/92 – 3/96 | fine mass, PM10 |
| New Brunswick Precipitation (and Air) Monitoring Network (NBPMN) | NBPM | NAtChem Particle | 11 | 1980 – 1997 | TSP, PM2.5, PM10 trace metals |
| Desert Research Institute | DRI | DRI | | | |

 

Data Processing and Format

A relational database is created for each network’s data, consisting of a main Data table containing all of the data, a Location table containing the location metadata, and a Variable table containing the variable metadata. The data are then passed through a set of standardization routines that create a data set ready for integration with all other processed data.
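The three-table layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the report does not say which database software was used, the function name create_network_db is hypothetical, and the column lists are trimmed to a few fields named elsewhere in this report.

```python
import sqlite3


def create_network_db(conn, var_codes):
    """Create Location, Variable, and wide Data tables for one network.

    Hypothetical helper: the schema is a trimmed illustration, not the
    project's actual database definition.
    """
    if not var_codes:
        raise ValueError("at least one variable code is required")
    for code in var_codes:
        # Column names are interpolated into SQL, so restrict them to
        # letters, digits, and underscores.
        if not code.replace("_", "").isalnum():
            raise ValueError(f"unexpected variable code: {code!r}")
    conn.execute(
        "CREATE TABLE Location ("
        "Loc_Code TEXT PRIMARY KEY, StationID TEXT, Station_Name TEXT, "
        "Loc_Lon REAL, Loc_Lat REAL, Elevation REAL)"
    )
    conn.execute(
        "CREATE TABLE Variable (Var_Abbr TEXT PRIMARY KEY, Var_Desc TEXT, Units TEXT)"
    )
    # One REAL column per variable code, keyed by monitor and sample date.
    columns = ", ".join(f'"{code}" REAL' for code in var_codes)
    conn.execute(
        "CREATE TABLE Data ("
        "Loc_Code TEXT REFERENCES Location(Loc_Code), Date TEXT, "
        f"{columns}, PRIMARY KEY (Loc_Code, Date))"
    )


conn = sqlite3.connect(":memory:")
create_network_db(conn, ["MF_cf111", "S__cf112"])
conn.execute("INSERT INTO Location (Loc_Code) VALUES (?)", ("NAPSCANS1HAL",))
conn.execute(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    ("NAPSCANS1HAL", "1990-01-05", 14800, 1830),
)
row = conn.execute(
    'SELECT d."MF_cf111", d."S__cf112" FROM Data d '
    "JOIN Location l ON l.Loc_Code = d.Loc_Code"
).fetchone()
```

Keeping Loc_Code as the shared key is what lets the Data table stay compact while the station details live in one place.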

The standardization process includes:

0 - Valid values

1 - Data below the instrument’s minimum detection limit

2 - Questionable data, i.e., data values flagged due to nonstandard sampling, potential contamination, etc.

3 - Invalid sample

NULL - Flag did not exist
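The flag values listed above can be represented and used for screening as in the following sketch. The constant names and the keep_value helper are hypothetical, and treating a NULL (missing) flag as acceptable by default is an assumption of this example rather than a rule stated in the report.

```python
# Flag values as listed in this report; None stands in for a NULL flag.
FLAG_VALID = 0
FLAG_BELOW_MDL = 1
FLAG_QUESTIONABLE = 2
FLAG_INVALID = 3

FLAG_MEANINGS = {
    FLAG_VALID: "Valid value",
    FLAG_BELOW_MDL: "Below minimum detection limit",
    FLAG_QUESTIONABLE: "Questionable data",
    FLAG_INVALID: "Invalid sample",
    None: "Flag did not exist",
}


def keep_value(value, flag, accept=(FLAG_VALID, None)):
    """Return the value if its flag is in the accepted set, else None.

    The default accepted set (valid or unflagged) is an assumption.
    """
    return value if flag in accept else None


# Screen a few (value, flag) pairs with the default rule.
screened = [keep_value(v, f) for v, f in [(15.3, 0), (1.0, 1), (19.2, None), (7.7, 3)]]
```

Passing a different accept tuple, for example (FLAG_VALID, FLAG_QUESTIONABLE), lets an analysis decide for itself how strict the screening should be.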

Data Table

A sample Data table is presented in Table 4. The first two columns, Loc_Code and Date, are the key (dimensional) fields, identifying the monitor and the sample date and time, respectively, for each data record. Additional information about the monitoring site is contained in the Location table. Each of the remaining columns contains the data for a single variable. The variable code (Var_Code) is used as the column name; additional variable metadata is located in the Variable table.

Table 4. A sample Data Table.

Key fields: Loc_Code and Date. Variable fields: all remaining columns.

| Loc_Code | Date | MF_cf111 | Al_cf112 | Si_cf112 | P__cf112 | S__cf112 |
|---|---|---|---|---|---|---|
| NAPSCANS1HAL | 1/5/90 | 14800 | 15.3 | 0.45 | 28.6 | 1830 |
| NAPSCANS1HAL | 1/11/90 | 13100 | 19.2 | 25.4 | 45.3 | 2230 |
| NAPSCANS1HAL | 1/17/90 | 21300 | 1 | 37.2 | 66.3 | 3660 |
| NAPSCANS1HAL | 1/23/90 | 17500 | 1 | 19.3 | 58.8 | 3420 |
| NAPSCANS1HAL | 1/29/90 | 9100 | 12.8 | 35.9 | 18.7 | 1400 |
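As an illustration of how the key and variable fields relate, the sketch below unpivots a wide Data table into one record per monitor, date, and variable. The function name wide_to_long is hypothetical, and the sample values are the first row of Table 4 restricted to two variables.

```python
def wide_to_long(header, rows):
    """Yield (Loc_Code, Date, Var_Code, value) for every variable cell.

    Assumes the first two columns are the key fields Loc_Code and Date and
    every remaining column is named by its variable code.
    """
    var_codes = header[2:]
    for row in rows:
        loc_code, date = row[0], row[1]
        for code, value in zip(var_codes, row[2:]):
            yield (loc_code, date, code, value)


header = ["Loc_Code", "Date", "MF_cf111", "S__cf112"]
rows = [["NAPSCANS1HAL", "1/5/90", 14800, 1830]]
records = list(wide_to_long(header, rows))
```

The long form is convenient when joining each value to its entry in the Variable table, since the variable code becomes an ordinary field instead of a column name.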

Location Table

The Location table is made up of the location code (Loc_Code) followed by station name and location information (Table 5). Each station is assigned a unique Loc_Code based upon the "Station ID" format used in the Canadian NAtChem – Particle database. The location code encodes the network as well as location information (see Table 6). For monitoring sites with AIRS site codes, the Loc_Code uses characters 1-4 for the network abbreviation and characters 5-13 for the AIRS site code. The Location table’s StationID field contains the original location identifier supplied with the source database.

Table 5. A sample Location Table.

| Loc_Code | StationID | Land Use | Network | Country | State/Province | City | Address | Station Name | Loc_Lon | Loc_Lat | Elevation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NESCUSNY1WHM | WHMO1 | | NESC | United States | New York | | | Whiteface Mt., NY | -1.28893 | 0.774635 | 639.94 |
| NESCUSCT1MOM | MOMO1 | | NESC | United States | Connecticut | | | Mohawk Mt., CT | -1.27933 | 0.730129 | 459.84 |
| NESCUSVT1PMR | PMRF1 | | NESC | United States | Vermont | | | Proctor Maple R. F. Underhill 1 | -1.27176 | 0.777253 | 396.16 |
| NESCUSVT2PMR | PMRF2 | | NESC | United States | Vermont | | | Proctor Maple R. F. Underhill 2 | -1.27176 | 0.777253 | 396.16 |
| NESCUSMA1QUR | QURE1 | | NESC | United States | Massachusetts | | | Quabbin Summit, MA | -1.26245 | 0.738274 | 310.83 |

 

Table 6. The Loc_Code encoding definitions.

| Characters | Definition |
|---|---|
| 1-4 | Network Abbreviation |
| 5-6 | Country |
| 7-8 | State/Province |
| 9 | Monitor number |
| 10-12 | Location Name Abbreviation |
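The encoding in Table 6 can be decoded with simple string slicing, as in this sketch. The LocCode type and parse_loc_code function are hypothetical names. Telling the two layouts apart by length (12 characters for the Table 6 layout, 13 for network abbreviation plus a 9-character AIRS site code) is an assumption inferred from the character positions given in this report, and the AIRS-style code in the example is made up.

```python
from typing import NamedTuple, Optional


class LocCode(NamedTuple):
    network: str
    country: Optional[str]
    state_province: Optional[str]
    monitor_number: Optional[str]
    name_abbr: Optional[str]
    airs_site_code: Optional[str]


def parse_loc_code(code):
    """Split a Loc_Code into its parts (character positions are 1-based in Table 6)."""
    if len(code) == 12:
        # Standard layout: network, country, state/province, monitor, name.
        return LocCode(code[0:4], code[4:6], code[6:8], code[8], code[9:12], None)
    if len(code) == 13:
        # AIRS layout: characters 1-4 network, characters 5-13 AIRS site code.
        return LocCode(code[0:4], None, None, None, None, code[4:13])
    raise ValueError(f"unexpected Loc_Code length: {code!r}")


station = parse_loc_code("NESCUSNY1WHM")
airs_station = parse_loc_code("AIRS123456789")  # made-up site code for illustration
```

For the Whiteface Mountain code this yields network NESC, country US, state NY, monitor 1, and name abbreviation WHM.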

 

Variable Table

The Variable table is made up of the variable abbreviation (Var_Abbr) followed by the variable name and sampler information (Table 7). Each variable is assigned a unique Var_Abbr, which encodes the species type and the sampler and analysis methods. Table 8 lists the meaning of each character in the Var_Abbr.

Table 7. A sample Variable Table.

| Var_Abbr | Var_Desc | Units | Species_Abbr | Species_Name | Attribute | Cut Point | Temp/Pressure | Sampler | Filter | Analysis Method |
|---|---|---|---|---|---|---|---|---|---|---|
| MF_cf111 | Fine Mass | ng/m3 | MF | Fine Mass | Concentration | <2.5 um | | dichotomous | Teflon | Gravimetric |
| MF_ef111 | Fine Mass | ng/m3 | MF | Fine Mass | Error | <2.5 um | | dichotomous | Teflon | Gravimetric |
| MF_mf111 | Fine Mass | ng/m3 | MF | Fine Mass | Minimum Detection Limit | <2.5 um | | dichotomous | Teflon | Gravimetric |
| MF_ff111 | Fine Mass | ng/m3 | MF | Fine Mass | Flag | <2.5 um | | dichotomous | Teflon | Gravimetric |
| NA_cf112 | Sodium | ng/m3 | NA | Sodium | Concentration | <2.5 um | | dichotomous | Teflon | PIXE or XRF |
| NA_ef112 | Sodium | ng/m3 | NA | Sodium | Error | <2.5 um | | dichotomous | Teflon | PIXE or XRF |
| NA_mf112 | Sodium | ng/m3 | NA | Sodium | Minimum Detection Limit | <2.5 um | | dichotomous | Teflon | PIXE or XRF |
| NA_ff112 | Sodium | ng/m3 | NA | Sodium | Flag | <2.5 um | | dichotomous | Teflon | PIXE or XRF |

 

Table 8. The Var_Abbr encoding definitions. The meanings of the abbreviations and codes are listed in the Variable table.

| Characters | Definition |
|---|---|
| 1-3 | Species Abbreviation |
| 4 | Attribute |
| 5 | Cut Point |
| 6 | Sampler Type |
| 7 | Filter |
| 8 | Analysis Method |
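The positions in Table 8 can be decoded as in the sketch below. The function name parse_var_abbr is hypothetical. The attribute letters (c, e, m, f) are inferred from the sample rows in Table 7, and stripping the trailing underscore padding from the species abbreviation is likewise an inference from those rows; the remaining characters are returned as raw codes because their meanings are defined only in the Variable table.

```python
# Attribute letters inferred from the sample Variable table rows.
ATTRIBUTES = {
    "c": "Concentration",
    "e": "Error",
    "m": "Minimum Detection Limit",
    "f": "Flag",
}


def parse_var_abbr(abbr):
    """Split an 8-character Var_Abbr into the fields listed in Table 8."""
    if len(abbr) != 8:
        raise ValueError(f"expected 8 characters, got {abbr!r}")
    attribute = abbr[3]
    return {
        "species": abbr[0:3].rstrip("_"),  # drop underscore padding, e.g. "S__" -> "S"
        "attribute": ATTRIBUTES.get(attribute, attribute),
        "cut_point": abbr[4],
        "sampler": abbr[5],
        "filter": abbr[6],
        "analysis_method": abbr[7],
    }


fine_mass = parse_var_abbr("MF_cf111")
sulfur = parse_var_abbr("S__cf112")
```

Grouping variables by everything except the attribute character is one way to pair each concentration column with its matching error, detection limit, and flag columns.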

 

Data Quality Control

No quality control of the data beyond that done by the supplying organization has been performed on the data sets. However, as the data are used and problems are identified, appropriate procedures will be applied to remedy them.

Data File Formats

The data are provided in three file formats:

Associated with each fixed-length ASCII file is a data dictionary that gives the field names, the number of bytes in each field, and the NULL value. The order of the fields in the dictionary, from top to bottom, is the order of the fields in the ASCII file, from left to right.
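A data dictionary of this kind can drive a simple reader for the fixed-length files, as sketched below. The dictionary contents here (field widths and the "-999" NULL marker) are invented for illustration, as is the function name; the actual dictionaries distributed with the files should be used instead.

```python
def parse_fixed_width_line(line, dictionary):
    """Split one fixed-length record using (field_name, n_bytes, null_value) entries.

    Entries must appear in file order, left to right. Values are returned as
    stripped strings, with the field's NULL marker mapped to None.
    """
    record = {}
    position = 0
    for name, n_bytes, null_value in dictionary:
        raw = line[position:position + n_bytes].strip()
        record[name] = None if raw == null_value else raw
        position += n_bytes
    return record


# Hypothetical dictionary: widths and NULL markers are assumptions.
DICTIONARY = [
    ("Loc_Code", 12, ""),
    ("Date", 10, ""),
    ("MF_cf111", 8, "-999"),
]

line = "NAPSCANS1HAL" + "1/5/90".ljust(10) + "14800".rjust(8)
record = parse_fixed_width_line(line, DICTIONARY)
```

Because the reader walks the dictionary in order, the same function works for any of the fixed-length files once its own dictionary is loaded.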