Chapter 4
Inversion of the Source Receptor Relationship
This chapter presents the underlying theory for the inversion of integral equations and their discrete counter parts. It will then be shown how singular value decomposition of a matrix can be used to invert a system of linear equations. This solution technique will then be applied to the inversion of the source receptor relationship. The sensitivity of the retrieval of emission to changes in the spatial and temporal dimensions of the input data will be explored. Finally, the inversion algorithm will be used to retrieve SO2, NH4+, and NOX nitrogen emission fields.
A common problem arising in science is the need to determine the true input data given the observed responses of a system. This is an inversion process, whereby indirect measurements are used to understand and reconstruct unknown features of a system. For a linear system of continuous functions this can be described mathematically as (Backus and Gilbert, 1968; Parker, 1977; Allison, 1979; Tarantola, 1987)
g(t) = òX K(X,t) f(X,t) dX (4-1)
where g (X,t) A known response function
K(X,t) The known kernel of the system.
f(X,t) The unknown function to be solved for
X, t Dimensions of the system
Equations of this form are known as inhomogeneous Fredholm equations of the first kind. The analogous matrix equation is,
g = K · f (4-2)
where g Input data vector space
K The design matrix.
f The solution vector space
The solution of this equation, f = K-1 g, exists and is unique whenever g is non-zero and K is invertible. In real world situations, however, K usually is not invertible and solutions to this equation are not unique or may not even exist. In such situation approximate solutions are sought. Even when an approximate solution can be determined, inhomogeneous Fredholm equations of the first kind are often extremely ill-conditioned, making the solution unstable. Two common causes of an ill-conditioned system are due to the design matrix K acting as a smoothing operator, and the discretisation of continuous functions.
Smoothing. An ill-conditioned system occurs when the design matrix, K, acts as a smoothing operator. When K is a smoothing operator, the forward solution of equation 4-2, dampens the variability in the vector space f when transforming it to the vector space g. To recover the solution vector f, the operator must be inverted. The inversion effectively creates an amplifier that is used to amplify the signal in the elements of g to reconstruct f. The amplification works on the errors associated with g and K as well as on the input data of g. This can cause the solution to be extremely sensitive to small errors in the inputs. The smoothing can also dampen the signals in the input data below the noise levels, effectively leading to a loss of information, with no means of getting it back in an inversion operation.
Discretisation. Ill-conditioning of the problem is also a result of the discretisation of the continuous functions. The matrix Equation 4-2 is a finite-dimensional problem, created from the continuous infinite-dimensional problem, Equation 4-1. In the discretisation process information about each continuous function is unavoidably lost. Also, as the resolution of the discretised function f(X,t) increases, each component of the solution vector f will have less and less influence on the data g. In the inversion problem, the influence of f can be viewed as the information content of f in each data point gi. Therefore, as the influence of f decreases, each data point gi will contain less information about the higher resolution elements of f (Allison, 1979, Enting and Newsam, 1990). It has been shown (Allison, 1979) that as the information content decreases, it is subjected to higher and higher amplification to reconstruct the unknowns. This amplification works on the error in the observations and kernel as well, creating situations where the error of the reconstructed values are very much larger than the values themselves. Thus, a trade-off between the solution resolution and error exists. In a series of papers, Backus and Gilbert (1967, 1968, 1970) characterized this trade-off by examining the relationship between solution resolution and variance. They then developed an inversion technique to find the optimum solution based on this trade-off.
To handle the loss of information, and instabilities inherent in many inversion problems, a number of techniques have been developed. These inversion methods augment the inversion operator with a-priori information, or implement damping procedures to reduce the instabilities. In these procedures, is important to understand how the added information or damping procedures affect the results, and what trade-offs have been made in the process.
4.1.1 The General Inverse Problem
In this section, the mathematical description and implications of the difficulties encountered in solving an ill-conditioned inversion problem will be given. Also, some solution techniques for analyzing and overcoming these problems will be discussed. (Penrose, 1955; Jackson, 1972; Lawson and Hanson, 1974)
In the linear inversion problem defined by Equation 4-2, the m X n design matrix K linearly maps vectors from the n dimensional solution vector space f to the m dimensional data vector space g. Each column of the design matrix corresponds to one of the unknown components of the solution vector f, and is known as a basis function. The solution of Equation 4-2 for f requires the calculation of a pseudoinverse matrix H such that
= HKf = Hg. The pseudoinverse is also known as the generalized inverse and the natural inverse. If the matrix K is square and non-singular, then H = K-1, and the solution will exist and be unique. The errors induced in the estimate of
due to errors in data,
= g + dg, is
d
f =If K is created from the discretisation of an ill-posed problem, then K will be ill-conditioned, and components of K-1 will be large. As previously discussed, this can lead to error df being much larger than
. Similarly, errors due to uncertainties in the matrix K are:
(K + dK)
= g (4-4)
d
f =which are subjected to the same error amplification due to an ill-conditioned matrix K.
More commonly, the matrix K will be rectangular containing more or less unknowns than equations and the inverse will not exist. In the case of more knowns than unknowns, a least squares solution technique is often employed. This solution technique seeks the best estimate of
by minimizing the merit function (Press et al., 1992; Berthouex and Brown, 1994):
(4-6)
where c2 is the chi square function
sgi is the observation error
This is accomplished by solving the normal equations (assuming, sgi = constant):
(KTK)
= KT g (4-7)
= (KTK)-1 KT g (4-8)
where KT is the transpose of the matrix K
In this case the pseudoinverse H = (KTK)-1 KT. This inverse suffers from the same error amplification due to ill-conditioning as the inverse of the square matrix K-1.
In the previous section, smoothing and increased resolution were identified as the causes for ill-conditioned problems. Smoothing tends to obscure the distinction between the unknown variables. The result of this, is that the columns of the design matrix become closer to being linear combinations of other columns (multicollinearity). The greater the smoothing, the more ill-conditioned the matrix becomes, and the closer the matrix becomes to having column degeneracies.
Increasing the resolution of the solution vector has a similar effect on the design matrix as smoothing. As the resolution is increased, there will generally be a smaller distinction between the impact of neighboring elements in the solution vector on the observations. Therefore, these elements will have higher correlation resulting in the columns of the design matrix being more similar, and thus more ill-conditioned.
The inversion of a highly ill-conditioned matrix has positive and negative components with large magnitudes. In a system with no errors, these components are delicately balanced to nearly cancel each other out to obtain the solution vector f. However, errors exist in the data and design matrix which result in the data being unable to clearly distinguish between two or more of the unknowns, tripping the balance. Instead of the large magnitudes of H canceling out, they add together creating instabilities in the solution with large oscillating magnitudes. The solution vector will be unstable with correspondingly large magnitudes that are delicately balanced to cancel out almost exactly when the fitted function (
= K
) is evaluated. In such cases, least square solutions can have two different combinations of unknowns that can satisfy the least squares criteria equally well. Therefore, the solution is not unique. Least squares problems can thus be simultaneously both overdetermined with more equations than unknowns, and underdetermined, with ambiguous combinations of unknown parameters (Press et al., 1992).
In order to control the error amplification, the pseudoinverse H is formed by including a-priori information used to reconstruct lost information and/or dampening factors to reduce instabilities. The criteria for a good inverse are (Jackson, 1972):
(a) S = KH ~ Im: m X m identify matrix. The solution should give a good fit to the data, i.e. model fit. S is known as the information matrix.
(b) R =HK ~ In: n X n identify matrix. The solution is as close to unique as possible. R is known as the resolution matrix
(c) The uncertainties in the estimated parameters
, expressed as var(
) are not to large. For statistically independent data this leads to:
(4-9)
The meaning of the matrix R, as related to the uniqueness of the solution for an underdetermined system, was put forth by Backus and Gilbert (1968). R is a transformation matrix that maps the entire solution set f into a single vector
. Thus, each element of
can be viewed as a linear combination of any solution f satisfying Equation 4-2. A data set can resolve all of the features of f only if R = In, then
= f and the solution will be unique. If R <> In then the data vector will be projected onto a lower dimensional subspace. The degree to which the inversion resolves the features of f is dependent upon how closely R is to a diagonal matrix. Thus, the matrix R is an indication of the uniqueness, as well as a measure of how well or poorly each element of the solution vector is determined
Similarly, the matrix S is a measure of the independence of the data (Wiggins, 1972). The matrix S can be viewed as a mapping of an arbitrary data vector g into a theoretical data vector
. If S <> Im then some of the data is redundant. The structure of S gives insights into the marginality of the data. An important issue is whether the addition of particular data will enlarge the dimensionally of the solution subspace onto which R projects the solution vectors. If the additional data does not enlarge these subspaces, then the data will not improve the resolution achievable in the inversion (Enting and Newsam, 1990).
The criteria for a good inverse to the matrix K can rarely be met in an ill-conditioned problem. Normally, a compromise between the model fit and solution uniqueness must be made with the variance and stability of the solution, where the model fit is the agreement between the input data and the forward solution to Equation 4-2. This often leads to a trade-off between two optimization functions for the calculation of the approximate inverse H (Backus and Gilbert, 1968). One function, A(f), optimizes the agreement between the data and solution, i.e. the model fit, and the other, B(f), optimizes for the least solution variance or the smoothness and stability of the solution. The solution to the inversion problem becomes (Parker, 1977; Press et al., 1992):
minimize A(f) + lB(f) = 1/l A(f) + B(f) (4-10)
l
= 0 -> ¥This equation can be interpreted as the minimization of A(f) under the constraint of B(f) or vice versa. The parameter l weights the function in the optimization. As l varies from 0 to ¥
, the solution
varies along a trade off curve between minimizing A and minimizing B. (Figure 4-1). When l=0, the agreement between the model and the data is optimized usually resulting in an unstable solution with drastic oscillation. When l = ¥
, the solution is optimized for the smoothness, or stability, or possibly a function reflecting a-priori information about the solution. While A(f) is dependent on the measured data, B(f) is not.
The addition of the smoothing function B(f) has the added benefit that it allows for the generation of unique solutions from degenerate design matrices, i.e. underdetermined inversion problems or problems with singular matrices. For example, the least squares solution of an underdetermined system does not have a unique solution. However, if a non degenerate quadratic function is added to the quadratic least squares criteria, then the result will be a non degenerate quadratic constraint that is well conditioned, thus, can be inverted. (Penrose, 1955; Jackson 1972, Lawson and Hanson, 1974)
Figure 4-1. A schematic diagram of the tradeoff between model agreement and the solution stability or smoothness when Equation 4-10 is solved for values of
l ranging between 0 and ¥ .Method of Regularization
One powerful class of inversion techniques is the method of regularization. Also known as Linear Regularization, the Phillips-Twomey method , constrained linear inversion, among others (Phillips, 1962; Twomey, 1977; Allison, 1979; Press et al., 1992). This class of techniques uses the least squares criteria for the measure of model fit, A(f), while the function B(f) is a measure of the smoothness, dependent on the order of the regularization.
The zeroth-order regularization method defines the function B(f) as the magnitude of the solution vector
, that is, the norm of the vector
. Minimizing the solution vector magnitude introduces stability by reducing the variation in the solution vector. Thus, reducing any wild oscillations that can result from ill-conditioning. The unique solution to the inversion problem is found by minimizing:
c
2+ l |A benefit of this technique is that it requires no a-priori information about the solution. However, in the process of dampening out variations in the solution vector due to instabilities, actual variations can be removed.
Higher order regularization methods employ a-priori information into the solution scheme by assuming a functional relationship of the solution vector. For example, the first order method is based upon the assumption that the result is approximately constant. Under this constraint, the function B(f) can be defined as:
B(f) a
(4-12)
which is equal to zero when
k =
k+1 and non-negative. This scheme can also be used if the unknown vector is thought to be close to a known vector. Under this condition,
i+1 = f'i where f' is equal to a known vector that approximates the solution vector. In this case, the solution
can be viewed as an adjustment to the vector f'.
If it is believed that the solution vector varies linearly then a possible function B(f) would be:
B(f) a
(4-13)
and so on for assumed higher order functions of the solution.
This is only one class of linear inversion techniques. Other classes involve using a different model fit criteria such as minimizing the sum of the absolute values of the residuals. Still other classes of inversion methods use the Backus and Gilbert techniques that optimizes between solution resolution and variance, or the entropy techniques that use information theory, along with a number of non-liner techniques. For more information on these inversion methods and others see Dimri (1992), Tarantola (1987), and Press et al., (1992).
Singular Value Decomposition (SVD)
Singular value decomposition (SVD) is the factorization of an m X n arbitrary matrix K with rank p into two orthonormal matrices, U(m X p), V(n X p) and a diagonal matrix L(p X p), such that:
K = ULVT (4-14)
H = K-1 = VL-1UT (4-15)
The diagonal elements L are non-negative and are called the singular values. The degree of ill-conditioning of K is identified by the condition number, the ratio of the largest singular value to the smallest. The larger the condition number the greater the ill-conditioning. As will be shown, the pseudoinvers inverse matrix H produced by SVD is equivalent to zeroth order regularization. However, in addition to creating H, SVD provides information concerning the dimensionality of the solution space, and the results are in a more convenient form for the analysis of the inversion problem.
SVD is due to the work of Penrose (1955) and Lanczos (1956). The decomposition is accomplished by finding two sets of eigenvectors ui and vj and a unique set of eigenvalues, li. This is accomplished by solving the equations (Jackson 1972):
Kvj = lj uj (4-16a)
KT ui = li vi (4-16b)
or
KTK vj = lj2 vj (4-17a)
KKT ui = li2 ui (4-17b)
It can be shown that:
l
i = lj if i=j, i< = pl
i = 0 , i<pl
j = 0 , j<pThe eigenvectors ui, corresponding to non zero eigenvalues lj, comprise the columns of the matrix U. These eigenvectors span the space of the data vector g that can be reached by the matrix K. If p = m, then the eigenvectors ui span the entire vector space of the data vector g. In this case, the existence of a solution is guaranteed. The eigenvectors vi, corresponding to non zero eigenvalues, comprise the columns of the matrix V. These eigenvectors span the solution vector space. The condition p = n guarantees that if a solution exists then it is unique. The matrix L is a diagonal matrix containing the eigenvalues lj.
There are four different sets of linear equations that can be encountered in the inversion of Equation 4-2. These are a set of equations where the number of equations:
1) equal the number of unknowns.
2) is less than the number unknown.
3) is greater then the number of unknowns.
4) is greater than or equal to the number of unknowns, but the unknowns are not linearly independent.
The last set of equations is equivalent to column degeneracies in the design matrix. The solutions that SVD "picks" under these condition is given below.
Under the first condition, the transformation matrix K will be square and full rank so that p = m = n. SVD creates a pseudoinverse matrix H equal to K-1, and a unique solution will exist. In the second set of equations, the system is underdetermined and p = m < n. Under these condition, SVD guarantees a solution, but it will not be unique. Among the infinite number of solution, SVD finds the solution that minimizes the magnitude of the solution vector, |
|. The third set of equations produces an overconstrained system, m > n = p, where a solution does not exist. SVD finds a unique solution
closest to the actual f in a least squares sense. Therefore, the SVD solution is equivalent to the minimization of the c2 criteria, Equation 4-6.
In the fourth set of equations, there are more equations than unknowns, but the design matrix K contains column degeneracies, therefore m ³
n > p. This occurs when the system of equations do not distinguish between two or more of the unknowns. It is an implication that the system of equations needs fewer unknowns to describe the vector of knowns. Under this condition, the system of equations is both overconstrained and underdetermined. A solution will not exist, and an infinite number of solutions will satisfy the least squares criteria equally well. For this system, SVD picks the solution vector with the minimum magnitude, |
|, that satisfies the least squares criteria. As the rank decreases from p=n, the least squares criteria is weighted less while the minimization of the |
| criteria is weighted more.
The Resolution and Information Matrix. If the design matrix K has been decomposed via SVD, then the resolution matrix R = HK = VL-1UT . ULVT = VVT. When the rank p is equal to the number of unknowns n then R equals the identity matrix, and the unknowns are perfectly resolved. However, when p < n the R is the optimum resolution matrix in the sense that it minimizes
r
k = Sj (Hki Kij - dkj)2 = Sj (Rkj - dkj)2 (4-18)for each value of k. Each row of R can be viewed as the closest fit to the identity matrix in a least square sense.
Similarly, the information density matrix becomes S = TH = ULVT . VL-1UT = UUT. S will equal the identity matrix when the rank p equals the number of equations. When p < m, the SVD technique uses the data to it fullest extent in the sense that it minimizes:
s
k = Si (Kij Hki - dkj)2 = Si (Skj - dkj)2 (4-19)Using only the variance of the observations ignores any errors associated with the design matrix. One means of including the errors in the design matrix is to base the weights on an effective variance (Britt and Luecke, 1973; Henry et al., 1984):
Veff = Var(g) + Var(model) (4-20)
Var(modeli) =
(4-21)
Unfortunately, the error in the model is difficult to quantify and an effective variance cannot always be calculated. In these cases the model error is often ignored. However, in cases where the model error is known to be significantly larger than the observation error and vary from one observation to another, some means of including it in the solution is desirable. This may be accomplished by calculating effective variances based upon some a-priori knowledge
Solution Variance. The variance of the elements in the solution vector for statistically independent observations with normally distributed errors can be calculated by Equation 4-9. If the system has been weighted by the variance of the observations, then the solution variance can be calculated from the SVD inversion via:
var(
k) =
(4-22)
The influence of the ill-conditioning on a system is reflected in Equation 4-22. As a system becomes more and more ill-conditioned, the eigenvalues become smaller. Therefore, the precision of the data and model must increase accordingly or the variance of the solution will become large, and the solution will be unstable. A means of reducing the variance is to construct the inverse matrix H from the q largest eigenvalues (Jackson, 1972; Wiggins, 1972). The effective result of this is to add singularities to the design matrix reducing its rank; thereby, reducing the number of degrees of freedom in the system. This degrades the resolution and information content of the system, so that a trade-off occurs between the resolution and variance of the solution as the number of eigenvalues included in the inversion changes.
As the number of eigenvalues is reduced, the system has fewer degrees of freedom to minimize the chi square criteria, c2, resulting in a larger c2 and poorer model fit. However, the solution is more constrained by the minimization of the magnitude of the solution vector criteria. Therefore a trade-off between the model fit, c2, and solution stability and smoothness, |
|, exists such that the SVD solution to Equation 4-2 becomes:
which minimizes |K
- g| + l|
| (4-23)
where l is a function of the number of eigenvalues in the system. Note, that this equation is equivalent to the zeroth order regularization. So the effect of increasing the smoothness criteria in the zeroth order regularization is equivalent to reducing the number of degrees of freedom of the solution vector.
The proper number of eigenvalues to be used in the inversion process depends on the uncertainties in the design matrix K and data vector g against the need of the model to fit the data, i.e. how closely K
reproduces g. Care must be taken in choosing the proper number of eigenvalues. In a fundamentally underdetermined system, an exact solution may exist but will not be unique. The inversion operator may be relying on poorly determined features of the data and design matrix to construct this solution, so that the solution deviates from reality. In an over-constrained system, the data may not be able to distinguish between certain unknowns in the model creating large variances in the solutions.
Inversion methods are statistical inferences that are only partly based upon the observations. Prior assumptions and information about the underlying situation accounts for an equally important part of the problem. For example, in an inversion employing the least squares approach, without any a-priori information or stability function, the errors of the system are assumed to be independent and normally distributed. While these conditions are rarely met, the application of the procedure is usually justified by the assumption/hope that minor errors in the mathematical model and assumptions should cause only a small error in the final results. This concept is embodied in the continuity principle of robust estimation (Wilkinson, 1979). Unfortunately, this is not always true. It has become apparent that solutions can be excessively sensitive to seemingly minor deviation from the underlying assumptions even when the majority of the data errors correspond to the underlying assumptions (Huber, 1981).
Least squares approaches are especially susceptible to these problems. Even when the majority of the data errors are approximated by a normal distribution, there may be contamination by a small number of outlying points. An outlying point, or outlier, is an observation that deviates excessively from the "true" model, where the true model consists of the design matrix and the actual values of the unknown parameters, which can never be known. The outliers can be the result of blunders in the observations, such as recording the wrong measurements or an unknown failure in the measuring device, or can result from large errors in the design matrix. The least squares approach to inversion minimizes the chi square merit function, which is the sum of the squares of the residuals between the model fit and input observations. This leads to outliers having an excessive influence on the merit function, since the square of their residual is added to the merit function. Figure 4-2 presents a classical example of how a single outlying data point can highly influence a least squares linear fit to data.
Figure 4-2. An example demonstrating how the least squares solution is overly sensitive to outlying data points.
The presence of outliers has lead to the development of numerous robust methods for performing regression estimates. The term robust generally refers to statistical estimators that are insensitive to small deviations from the idealized assumptions for which the estimator is optimized (Huber, 1981, Tarantola, 1987). Small deviations refer to either fractionally small departures for all data points, or else fractionally large departures for a small number of data points. The second interpretation embodies the concept of outliers.
Most robust estimators can be grouped into one of three categories: M-estimates, L-estimates, and R-estimates (Huber 1972; Huber, 1981; Dimri, 1992; Launer and Wilkinson 1979). The M-estimates are classified as maximum likelihood type estimates. These are similar to least square estimation in that they involve the minimization of a merit function. However, for outliers the merit function invokes a smaller penalty then the sum of the squares. The M-estimate is dependent upon the form of the merit function and the distribution of the residuals. The L-estimates are based upon linear combinations of order statistics, such as the median of the residual distribution. These robust procedures are most applicable to estimations of central value and central tendency, but can be applied to inversion problems for parameter estimation (Dimri, 1992). The R-estimates is a procedure based on the ranks of the residuals. M-estimates are probably the most wildly used in parameter estimation. This results from their relative simplicity and flexibility. This type of robust inversion will be further discussed and applied to the inversion of the source receptor relationship.
4.2.1 Robust Inversion by M-Estimates
Inversion by M-estimations is based upon maximum likelihood type parameter estimations. The maximum likelihood argument is based upon intuition and has no formal mathematical basis (Press et al., 1992). The development of this method can be seen from examining a situation where there exists a set of M data points (xi, yi) that are to be fitted to a model with N parameters aj: y(x) = f(x,aj). The model is assumed to be error free. In determining the parameters aj, intuition tells us that there will exist sets of parameters aj that are very unlikely, in that they produce sets of results
that do not resemble the data yi, and other sets of parameters that are more likely, producing results that resemble the data. This intuition can be quantified by rating the sets of parameters based upon their probability of being the one true set of parameters of the model. If the probability that the parameters are correct is small, then these parameters would be unlikely to be correct and vise versa. The goal then becomes finding the set of parameters that maximizes this probability. The probability function used is dependent upon the probability distribution of the errors associated with the data yi about the true model f{xi; a}. This is described by the equation:
P =
{exp[-r(yi, f{xi; a})]Dy} (4-24)
where r is the negative logarithm of the error probability density
Taking the logarithm of Equation 4-24 the problem becomes finding the set of parameters that minimizes:
r
Usually Equation 4-25 depends on the residual values, ri = yi - f{xi; a}. Differentiating with respect to a leads to:
j
= 0 k = 1 ... N (4-26)
where 
The probability density function r is usually not homogeneous as it is in the least squares case. Hence, to make the solution scale invariant the residual is normalized by a scale s, a measure of the dispersion of the residuals. The scale can be different for each observation, or groups of observations. For example, data combined together with different units requires different scales. Also, knowledge may exist that one group of data has a different characteristic dispersion than another group of data used in the same parameter estimation scheme. Introducing the scale into Equation 4-26:
(4-27)
In this equation, the function j is normalized by the scale. Huber (1981) noted that this can cause convergence problems in the solution of the equations, and proposed minimizing rsi in Equation 4-25 which removes si as a normalizer of j in Equation 4-27. There is also another benefit. The term j/si can be view as a weighting function. If the scales arises due to changes in data units, then one set of data will falsely have a larger influence on the minimization process than the other data. This biased weighting is removed by minimizing rsi instead of r.
If the equation yi = f{xi; a} describes a system of linear equations, and rsi is to be minimized, then Equation 4-27 becomes:
(4-28)
where ![]()
There are a number of different methods for estimating the scale parameter of a group of data for M-estimates. Among the most popular is the median of the absolute deviations or MAD (Agee and Turner, 1979; Hogg ,1979; Huber, 1981):
s = (median |ri|)/0.6745 (4-29)
The value .6745 is derived from the standard normal cumulative function, and is used to make the scale consistent with the normal distribution.
Note that the M-estimation assumes that the design matrix xij is without error. According to Huber (1972) little is known about how to robustize regression procedures with respect to errors in xij. Consequently, a common approach is to lump all errors in xij into the observations. Often xij results from a model that does not fully and accurately account for all processes governing the system. This will most likely result in the errors of the observed values yi not being distributed as a Gaussian distribution around the model values f{xi; a}, furthering the need for more robust procedures than least square techniques.
The error structure of the estimated parameters
is not exactly known, however, asymptotic theory suggest that
(
- a) has an approximate p-variate normal distribution with a mean vector 0 and covariance matrix (Huber 1972, 1981):
(4-30)
Variance of Estimated Parameters
An important aspect of the estimation of parameters is to provide meaningful estimates of their variance. One means of accomplishing this is to use the diagonal elements of the approximate covariance matrix in Equation 4-30. However, better estimate of variance exist. These include the estimation by Monte Carlo techniques(Efron and Tibshirani, 1986; Huber, 1973) and the jackknife method (Huber, 1972; Hogg, 1979). These techniques suffer from the fact that as the number of equations and unknowns increase their calculations become prohibitively expensive. For example, the jackknife technique recalculates the unknown parameters with one of the observations removed. The variance is then calculated from statistics of the resulting estimated parameters. If the system contain 1000 observations, then the jackknife requires the recalculation of the robust technique 1000 times.
To determine the suitability of the covariance matrix to estimate the variance of the parameters, some authors have compared the variance from Equation 4-30 to estimations from Monte Carlo methods. Huber (1973) found that there was excellent agreement for observational error distributed as normal and Cauchy probability density functions as long as the ratio of observations to unknowns was greater that 8. Constable (1988) found that Equation 4-30 compared well to a bootstrap method of variance calculation when a least square approach was used for parameter estimation in the analysis of geomagnetic data. However, when he used a maximum likelihood technique based upon heavy tailed probability density distributions, Equation 4-30 underestimated the variance by 20 to 50% as estimated by the bootstrap approach.
Another important aspect of parameter estimation is the confidence interval. Without them, the variance is only an estimation of the relative error. With non-Gaussian observation errors, meaningful confidence intervals can only be determined from Monte Carlo techniques or detailed analytic calculations.
M-type Estimators
Three classic probability density function used in M-estimation are:
1) Normal: r(x) = x2/2, j(x) = x
2) Double exponential: r(x) = |x|+c, j(x) = sign(x)
3) Cauchy: r(x) = ln(1+x2)+c, j(x) = 2x/(1+x2)
Using the normal pdf, Equation 4-28 reduces to the least squares estimate. In the equation, the function j acts as a weight and says that the larger the residual the greater its influence on the minimization. In the double exponential pdf, the tails of the distribution are larger than for the normal pdf. This is reflected in the maximum likelihood estimation where the mean absolute deviation is minimized, rather than the mean square deviation for the normal distribution. Consequently, all observations have the same weight j. This leads to a robust solution since outliers cannot dominate the solution. The Cauchy distribution has even more extensive tails than the double exponential distribution and j first begins to increase with larger residuals but then decreases. Therefore, the true outliers will have a weight close to zero.
A commonly used distribution function developed by Huber (1964) is
r
(x) = x2/2, j(x) = x |x| £ c (4-31)r
(x) = sign(x), j(x) = c |x| > cWhere 1 < c < 2 is a tuning constant, with a typical value of 1.5.
These functions define a pdf that has a normal center and double-exponential tails. Distributions of this type are known as finite mixture distributions, and are popular since if the data actually have normally distributed errors, then most of them will fall in the normal center of the distribution function.
Solution Techniques
The solution of Equation 4-28 is generally a nonlinear problem, and an exact solution may not be obtainable. Iterative techniques, such as Newton's methods can be employed to find an approximate solution. Alternatively, an approximate solution can be found by iterating on a weighted least square solution, known as the W-estimate (Hogg, 1979, Huber, 1981, Constable, 1988). Equation 4-28 can be equivalently written as:
(4-32)
where W is a M x M diagonal matrix of weight with Wii = 
This equation is now equivalent to the weighted least squares normal equations. It is solved iteratively by using successive estimates of the parameters a to recalculate the residuals and scales. The initial values can be obtained from a least squares solution with constant weights. This method can be directly employed into any inversion method that uses the least square approach, such as singular value decomposition.
A disadvantage of W- estimation is that if a unique solution exists, the W-estimator can only be guarantee to converge to this solution if j(r/s) is non-decreasing,
such as is found for the double exponential and Huber's pdf. Heavy tailed distributions that have redescending j(r/s) functions, i.e. Cauchy, will have infinitely many solutions generated by the W-estimator. However, if the initial solution is sufficiently close to the true M-estimator solution, then the W-estimator will converge to this solution (Constable, 1988). This places great importance on the initial solution, and a more robust technique than least square maybe necessary to generate it.
4.3 Ill-conditioning of the Source Receptor Relationship
The discrete form of the source receptor relationship defined in section 2.1.3 and Equation 2-17 relates a multi-dimensional concentration field to a multi-dimensional emission field using the transfer matrix. In the solution of the SRR for emission rates, the equation takes the form of an inhomogeneous Fredholm equation of the first kind, and any of the solution methods discussed in section 4.1.2 can be used to retrieve the emission fields. The primary obstacle to inversions is overcoming error amplification due to the ill-conditioned nature of the problem. The following section investigates the causes of the ill-conditioning of the SRR, and how this adversely affects the retrieval of emission fields.
Inversion of the source receptor relationship exploits the differences between the source contributions upon the receptor concentrations to infer the source emission rates. Differences in source impacts are due to variation between the transfer matrix's basis functions, i.e. columns of the transfer matrix, as well as the variability of the emission field. The basis functions are the columns of the transfer matrix, and represent the relative impact of a source on each observation. Difficulties in the inversion of the system occur when there is not enough variation between the basis functions, and/or the source emission rates, resulting in instabilities in the solution. An unstable reconstructed emission field has much higher viability between the source emission rates than exists in the "true" emission field. The actual variability of the emission field is unknown, and determining the proper amount is one of the many subtleties of the inversion problem. While the variability of the emission field is unknown, the variability between the transfer matrix basis functions can be quantified and investigated.
The lack of variation in the transfer matrix occurs when some of the basis functions are highly correlated with each other, or form near linear combinations. This is responsible for the ill-conditioning of the system, where the degree of ill-conditioning signifies the potential error amplification of the inversion process. A formal technique for quantifying the ill-conditioning of a system, and its error amplification, uses the condition number, cond#, the ratio of the largest eigenvalue to the smallest. The condition number varies between one and infinity. If cond# ~ 1, then the system is well conditioned, if cond# >> 1, then the system is ill-conditioned, and if cond# = ¥ , the system contains singularities. The condition number gives an indication of the solution’s relative error, and for simple systems with only a few unknown parameters, the condition number times the observation error approximates the relative error of the solution. The relation of the eigenvalues to the solution error is given in Equation 4-22. Another means of identifying the ill-conditioning of the system is through the correlation of the basis functions of the transfer matrix. If the correlations are near 1 then the system is ill-conditioned, and if the correlations are ~0 then the system is well-conditioned.
In section 4.1, it was shown that the degree of ill-conditioning is dependent upon the smoothing characteristics of the kernel and the resolution sought in the solution. It will be shown in this section that the transfer matrix is a smoothing operator, and the inversion of the SRR can be a highly ill-conditioned problem. The affects of the spatial and temporal resolution of the source and receptor space upon the degree of ill-conditioning will also be explored.
In the source receptor relationship, the transfer probability density matrix K acts as a smoothing operator due to the physical and chemical processes defining it. Advection and diffusion cause emissions from sources to be diluted throughout space and mixed together. Dilution can dampen signals below the noise levels of the measured data causing a loss of information of the source strength and location. The inversion process amplifies these dampened signals causing small errors in the observation and transfer matrix to dominate the reconstructed emission field. Similarly, removal kinetics can also dampen a source's signal. These processes can create highly ill-conditioned systems. In fact, the ill-conditioning of the SRR was first identified by Bolin and Kellin (1963). In there study, error amplification overwhelmed their results for the identification of large scale atmospheric mixing from CO2 measurements.
The diffusion process also causes the mixing of emissions from multiple sources. This can cause the differences between the relative and absolute source impacts to be obscured resulting in loss of information and an ill-conditioned system. For example, in an idealized situation where the diffusion is infinite and the kinetics are spatially uniform, the infinite diffusion will immediately mix new emissions uniformly throughout space. The concentrations at every receptor will be equal, as well as the source contributions, leading to perfect correlation between all the basis functions of the transfer matrix. In such a situation, the system will have only one degree of freedom, and only a spatially constant emission field can be retrieved. All information about the variability of the source emission rates will have been lost.
4.3.2 Resolution of Source Space
The degree of ill-conditioning is dependent upon the spatial and temporal resolution of the emission grid to be retrieved. The relationship of ill-conditioning and resolution of retrieved parameters is well documented in the literature (Allison, 1979; Backus and Gilbert 1968; Parker 1977; Tarantola, 1987). The ill-posed nature of the SRR has also been explored by Newsam and Enting (1988), who modeled the SRR using a simplified form of the diffusion equation which had an analytic solution. They showed, that as the spatial and temporal resolution becomes continuous, the kernel of the inverse operator approaches infinity. This creates infinite errors in the reconstructed emission fields due to the infinite error amplification of small errors in the concentration field. The error amplification in their model was on the order of |m| and |k|1/2 where m and k are the spatial and temporal wave numbers respectively.
This relationship can also be seen from the standpoint of the physical processes governing the SRR. As the discrete resolution of the emission field increases, the distance between the sources emission rates in space and time decrease. Hence, the likelihood of the source's plumes mixing increases, obscuring the differences between their relative source impacts upon the receptor concentrations. If two sources are separated by infinitesimal distance, then the individual source contributions can not be distinguished in a discrete problem, and the transfer matrix will be singular.
Figure 4-3. The correlation coefficient between the basis functions for the source in southern Illinois and the basis function in all other source grid cell. The transfer matrix was created for a conservative species from the third quarter, 1992 Lagrangian airmass history database. Each grid cell represent a source and a receptor, and 90 daily observations were used.
To demonstrate the influence of the spatial resolution of the source space on the ill-conditioning of the SRR, the correlation of the basis function for a source in Southern Illinois with surrounding sources is presented in Figure 4-3. This figure was created using the transfer matrix for a conservative species during the third quarter if 1992. A receptor was placed in every emission grid cell to remove the effects of the receptor resolution (see next section). As shown, the correlation of the basis functions decreases with distance from the Illinois source. Consequently, as the distance between sources increase, the similarity between their basis functions decrease, resulting in a less ill-conditioned system.
4.3.3 Resolution of Receptor Space
In the inversion problem, the source emission rates are inferred from the "information" contained in the receptor observations and the transfer matrix. The more observations that are available, the more information about the emission field that exists in the system, and the higher the quality of the retrieved emission fields, as measured by the uncertainty and resolution of the emission field. Therefore, the ideal situation would contain a continuous receptor space. In actuality, a continuous receptor space is unobtainable because measurements must be made at discrete locations and time. This is not necessarily a hindrance, since there is a diminishing return on the information content with the addition of observations. The information content of the observation data is an important aspect of the inversion problem and has been explored by Wiggins (1972) and Tarantola (1987).
The discrete receptor space is defined by the spatial locations of the receptors, the frequency at which measurements are available, and the sampling period. Changes in the resolution of any of these three dimensions of the receptor space change the information content of the system, and thus, the ill-conditioning of the transfer matrix. The effects of the three dimensions of the receptor space on the ill-conditioning will be discussed next. However, the changes in the information content of the receptor space will not be directly addressed.
Spatial Resolution
The spatial resolution of the receptor space can be examined from the stand point of the distance between the sources and receptors. As a receptor is moved further away from a group of sources, the transfer matrix relating the source impact of a conservative or primary species to the receptor's concentrations becomes more ill-conditioned. This is because, as the distance between the sources and receptors increase, so does the transport time of the emissions to the receptors. The longer times allow for greater mixing of the source plumes, and dilution and removal of the emissions, making each source's relative impacts on the receptors to be smaller and more similar to each other.
The increase of ill-conditioning with source / receptor distance is presented in Figure 4-4. In this figure, the correlation between the basis functions for a source located approximately at Nashville TN, and all other source basis functions is presented for a single receptor located in three different locations. The transfer matrix is for a conservative species during quarter 3, 1992, and calculated from the Lagrangian data bases used in Section 3.41 for the tuning of the Monte Carlo model. The first plot presents the correlation coefficients when the receptor is located at the Nashville source. As can be seen, the Nashville basis function has the highest correlation with sources located near the Nashville source and drops off sharply with distance. The condition number of the transfer matrix for this configuration is 1700. In the next frame, the receptor was placed in S. Illinois, approximately 400km away. The correlations between the basis functions increased between Nashville and the surrounding sources, with correlations greater than 0.6 two grid cells, ~ 300 km, to the east of Nashville. The condition number increased to over 1012, indicating significantly greater error amplification capabilities over the transfer matrix in the first frame. The exact condition number is not known since five of the eigenvalues were less than the machine precision. The last frame has the receptor in N. Illinois, approximately 800km away from Nashville. As can be seen, the correlations between the basis functions increased substantially, with r > 0.9 as far from Nashville as the Atlantic coast and Florida. The transfer matrix corresponding to the last frame is extremely ill-conditioned for the retrieval of source emissions in the mid Atlantic states, and instabilities can be expected when inverting the SRR. In this configuration, there are 34 eigenvalues with values less than the machine precision.
A B C
Condition # = 1700 Condition # > 1012 Condition # > 10^12
0 Eigen Vls less than machine precision 5 Eigen Vls less than machine precision 34 Eigen Vls less than machine precision
Figure 4-4. The correlation of the source basis function at Nashville, TN with all other basis functions in the transfer matrix. The transfer matrices are for a conservative species, created from the third quarter, 1992 Lagrangian airmass history database. The transfer matrix consisted of 81 source grid cells surrounding the Nashville, TN source, and one receptor site located in the source grid cell at A) Nashville TN, B) Southern Illinois, and C) Northern Illinois. The X locates the source grid cell containing the receptor site in each frame. Eighty one observations were used creating a square transfer matrix.
The exact relationship between the source/receptor distance and the ill-conditioning of the system is dependent upon the advection and mixing properties of the system under study. The example presented is an extreme case. Transport of material from the Southeastern U.S. in a northwest direction is not a typical occurrence. This generally occurs during high pressure system when the air is relatively stagnant and slowing moves in a clockwise direction. These conditions are conducive to the mixing of sources over a large domain. In a well defined flow, the source mixing will take place over a smaller domain, leading to less ill-conditioning.
The discussion to this point was concerned with conservative or primary species. When considering secondary species, the degree of ill-conditioning will first decrease and then increase with distance. This results from the amplification of the source signal due to the transformation of primary species to the secondary species, offsetting the damping affects of dilution and removal processes. This was demonstrated by Schichtel and Husar (1995) where the SO42- kinetic probabilities, Pk, for a receptor oriented transfer matrix at a Massachusetts receptor is larger in the Ohio valley than at the Massachusetts receptor site, but the transit probability was largest at the Massachusetts receptor site. Consequently, for a secondary species there will be an optimum source/receptor distance where the ill-conditioning will be at a minimum.
The ill-conditioning of the system is only an indication of the error amplification. Also of importance is the error of the input observation data and transfer matrix that will be amplified. It has been shown from trajectory analyse that error in transport increases with increasing distances from the source (Clarke et al., 1983; Draxler, 1987). Consequently, as the source/receptor distance increases, the error of the transfer matrix increases along with the error amplification abilities of the inversion process, leading to increased possibilities of instabilities in the solution.
Temporal Resolution
The sampling time resolution is dependent upon the frequency and duration of the sampling of the observation data. The maximum value for both of these dimensions is defined by the temporal resolution of the emission field. For example, if the temporal resolution of the emission field is daily, then the maximum sampling frequency given a sampling period of 24 hours is one, or the maximum sampling frequency given a sampling period of one hour is 24, etc.
Over a given period of time, a receptor site's concentration is only dependent upon the sources that impact it. Hence, each receptor contains information about a subset of the sources. If the frequency of the measurements is the maximum, then the series of observations over the time period will contain all of the available information about the sources. However, if the measurement frequency is high compared to the time period of interest, i.e. using daily data to retrieve seasonal emissions, then each observation will contain redundant information, and some of the observations can be discarded with only small adverse affects to the inversion process.
To demonstrate redundancy of the data, third quarter SO2 emission fields over the Eastern US were retrieved using perfect sulfur data with a relative Gaussian error of 40% added to each observation to simulate error in the observation and model. All emission grid cells in which IMPROVE/NESCAUM receptor sites resided were used as receptor sites in the transfer matrices. The system consisted of 356 sources and 20 receptor sites. The perfect data were created from the transfer matrices and the 1985 NAPAP emission inventory. The next section gives a full explanation of how this was conducted. The SRR was inverted for various numbers of observations at each receptor site, using SVD and 150 eigenvalues. The number of observations at each receptor site began at 8 for a total of 160 observations for the first inversion, and was increased in increments of 2 for a maximum of 88 observations at each receptor site and a total of 1720 observations in the final inversion.
The changes in the solution between successive inversions are present in Figure 4-5. This figure contains the average absolute change,
in each source's emission rate as the number of receptor observations is increased:
= 
where n is the number of source, i.e. 356
S is the source emission rate using m observations
S’ is the source emission rate using m+2 observations
i is source index
and the correlation of the solution with the 1985 NAPAP emission field. Since the observation data were created from the NAPAP emission field a high correlation is expected.
As shown, when there are fewer than 10 observations per receptor, 200 total observations, the addition of observations to each receptor have a pronounced affect on the retrieved emissions, with the correlation increasing from r = 0 at 8 observations to r = 0.4 at 10 observations. The benefits of addition observations drops off when more than 18 observations are used, with the correlation increasing from r = 0.64 to r ~ 0.75 at 88 observations. This trend is also seen in the average absolute change in emission rates. The average change in the emission field is most dramatic at low numbers of observations, with an approximate average absolute change of 5*107 kg/day when observations were increased from 8 to 10. This change in the average absolute emission decreases with increasing number of observations and when more than 40 observations are added the average absolute change is relatively constant at ~105 kg/day. The average absolute change in emission rates of ~105 kg/day is an indication of the average variance of the solution. Consequently, the addition of observations above 40 does not reduce the solution variance. Thus these data provide little additional information about the system and can be disregarded.
Figure 4-5. A) The average absolute change in reconstructed third quarter, 1992 SO2 emission rates as the number of observations at each receptor site was increased. B) The correlation of the retrieved SO2 emissions and the 1985 NAPAP SO2 emission field. The inversion was conducted using perfect sulfur observation data with a relative Gaussian error of 40%, generated from the 1985 NAPAP emission field and transfer matrices. The transfer matrices contained 356 sources over the Eastern US, and 20 receptor sites located in source grid cell corresponding the location of the IMPROVE / NESCAUM sites. The inversion was conducted using SVD and 150 eigenvalues.
The second dimension of the temporal resolution is the sampling period. As the sampling period increases, it is more likely that the wind will blow from different directions, increasing the number of sources impacting an observation. Consequently, the differences between relative source impacts will decrease, leading to a more ill-conditioned system. Figure 4-6, presents the transfer matrix indicating the relative impact of SO2 emission as sulfate on the Mammoth Cave KT receptor site, where the sampling period was a day and a week. When the sampling period was a day, only source emission from north impacted the receptor site with several sources having a probable impact greater than 0.25%. When the sampling period was increased to a week, the number of sources impacting the receptor increased substantially in a region stretching from Canada to Texas and Florida. The distinction between the source impacts has also decreased, with only the source located at the receptor having a probable impact greater than .2%. The increased number of sources impacting the receptor during a week as opposed to a day, coupled with smaller distinctions between the source impacts, leads to a more ill-conditioned system.
Figure 4-6. Transfer matrix indicating the relative impact of SO2 emissions on sulfate concentrations at Mammoth Cave, KT, using a sampling time of A) 1 day and B) 1 week.
Often, as the sampling period increases the error in the observation and model decrease. The decreasing in the error can compensate for any increase in ill-conditioning. Therefore, an optimal sampling period will exist balancing the error in the model and observation with the ill-conditioning of the system. This optimal time will be dependent upon the atmospheric advective processes and the relationship of observations and model error to sampling period.
4.3.4 Examples of Ill-Conditioning
To examine the effects of the ill-conditioning on the retrieval of the emissions, the SRR was inverted using perfect "observed" sulfur data. The perfect sulfur concentrations were calculated by solving the SRR forward using the 1985 NAPAP emission inventory at the IMPROVE and NESCAUM monitoring locations for 24 hour averages, twice a week. The pseudo inverse of the transfer matrix was calculated via singular value decomposition. The system was over determined with 582 total observations (knowns), and 356 sources (unknowns). All possible 356 eigenvalues were used in the reconstruction. Hence, the solution was equivalent to a least square inversion.
Using the perfect observation data, the inversion process almost exactly reproduced the NAPAP emission field, with errors resulting from computer round off (Figure 4-7). Under normal circumstances, the observation data contains error. In the case of the IMPROVE/NESCAUM data, the analytical error is approximately five percent. When a five percent normally distributed error was added to the perfect data, the reconstructed emission field degraded significantly, Figure 4-8. The scatter diagram shows significant scatter with an r2 of approximately 0. The reconstructed emissions oscillate between large negative and positive values west of the Mississippi, indicating large instabilities in the solution throughout this region. The instabilities do not affect all regions equally. In New England and the Ohio river basin the reconstructed emissions were well retrieved. The standard error calculated from Equation 4-22, is also presented in Figure 4-8. As shown, the largest errors are located west of the Mississippi, where the greatest solution instabilities are found. In Texas, the standard errors can exceed 1.5 kTons/day, more than a factor of 10 larger than the actual emission rates. Therefore, the ill-conditioning amplified the 5% error of the observation by a factor of 200.
Figure 4-7. The third quarter reconstructed SO2 emission generated from perfect sulfur data at the IMPROVE / NESCAUM locations and sampling times.
Figure 4-8. The third quarter reconstructed SO2 emission field generated from perfect sulfur at the IMPROVE / NESCAUM locations and sampling times. A random relative error of 5% was to each sulfur observation chosen from a Gaussian distribution.
The effects of the ill-conditioning were shown assuming an error free model. This is generally not the case, and as seen in section 3.4.4, the simulated concentrations can deviate substantially from the measured data. To characterize the total error, a histogram of the residuals from simulated sulfur, and the measured sulfur at the IMPROVE/NESCAUM eastern location during quarter 3, 1992 was created, Figure 4-9.
Figure 4-9. The histogram of residuals between the fine particle sulfur from the IMPROVE / NESCAUM networks during the third quarter, 1992 in the Easter US, and the corresponding simulated sulfur using the transfer matrices and 1985 NAPAP SO2 emission inventory.
Normal distributions are overlaid on top of the histogram created from the mean and standard deviation of the histogram using 100% of the data and the middle 90% of the data. As shown, the histogram was best simulated by the normal distribution created from statistics of the middle 90% of the residuals. A relative total error was calculated by normalizing s90% = 0.6 mg/m3 by the mean of the observations, resulting in a total error of approximately 40%. This is 8 times larger than the error due to the observations alone. The SO2 emissions were reconstructed using perfect observation data with a 40% error added (Figure 4-10), resulting in even greater instabilities. The New England and Ohio River basin regions now display some of the instabilities of the other regions. The standard error swamps nearly all of the reconstructed emission values where it is greater than 1.5 kTons/day at most grid cells west of the Mississippi.
The causes for the solution instabilities and the error amplification are seen in Figures 4-8 & 4-10. When a 5% normally distributed error was added to the perfect observation data, the solution became highly unstable west of the Mississippi. This region contained only three observation sites, with little variability in 1985 NAPAP emissions. The lack of receptor sites in this region resulted in large source/receptor distances. The few receptor sites combined with the low variability in the emissions resulted in the instabilities, seen by the large oscillations between negative and positive emissions in the emission grid cells. In New England and the Ohio river basin, the highest density of receptor sites and the largest SO2 emission rates are found. These conditions result in reconstructed emissions that compared well with the 1985 NAPAP emissions. The effect of the ability to resolve sources with emission rates that are significantly different from the surrounding sources can be seen in the Northwestern part of the emission grid in Manitoba, Canada. In this region, there are two sources with emission rates of 0.82 and 0.57 kTons SO2/day compared to ~0.001 kTon SO2/day for the surrounding sites. The retrieved emissions for these two sources were 0.82 and 0.53 kTons SO2/day respectively, while the surrounding sources had emission rates oscillating between -0.7 and 0.7 kTons SO2/day. However, these source emission rates become 2.7 and 1.1 kTons SO2/day when the 40% error was added to the observations (Figure 4-10).
Figure 4-10. The third quarter reconstructed SO2 emission field generated from perfect sulfur at the IMPROVE / NESCAUM locations and sampling times. A random relative error of 40% was to each sulfur observation chosen from a Gaussian distribution.
4.4 Implementation of the Inversion Algorithm
4.4.1 Influence of Eigenvalues on the solution
In SVD, the solution instabilities are damped out by eliminating a set of the smallest eigenvalues in the creation of the pseudo-inverse of the transfer matrix. Each eigenvalue is dependent upon a subset of the basis functions and represents the degree of correlation between these basis functions (Henry and Hidy, 1979) . The smaller the eigenvalue, the larger the correlations. The inversion of the source receptor relationship, Equation 2-18 using singular value decomposition can be written as:
ej = Si VijUij/Li * DXi * ci (4-33)
where Si VijUij/Li is the inverse of the transfer matrix via SVD
e is the source mass emissions [g]
ci is the receptor concentration [g/m3]
DXi is the receptor volume [m3]
As can be seen, when the smallest eigenvalue, Li, is removed from the solution, the degree of freedom with the greatest amplification potential is eliminated for reconstructing the emissions, resulting in a more stable solution. The removal of one eigenvalue essentially removes one of the unknown sources from the solution. This causes a reduction in the resolution of the reconstructed emissions. This loss of resolution causes a redistribution of the mass between all of the sources associated with the removed eigenvalue. However, the redistribution is not mass conserving.
The redistribution of mass is represented by the resolution matrix R which maps the unique solution to the new solution. If the number of eigenvalues used in the solution equals the number of unknown sources, then R is the identity matrix. As the number of eigenvalues is decreased R deviates further from the identity matrix and the resolution of the solution is degraded. Equation 4-33 shows that a pseudo-inverse created from the singular value decomposition produces the optimum resolving matrix R in the sense that is minimizes rk, the sum of squares of each row of R with the chronecker delta, dkj. rk provides a convenient means of measuring how well each source is resolved. If rk = 0 then the source is perfectly resolved, and the closer rk is to 1, the worse the source is resolved. Figure 4-11 presents rk for each source when the transfer matrix was inverted using different numbers of eigenvalues. The transfer matrix is the same one used to create Figure 4-8. It is evident that as the number of eigenvalues increase, the number of sources well resolved also increase. As is expected, New England, with its high receptor station density, is better resolved then the sources west of the Mississippi where there are few stations, and ill-conditioning of the system is high.
Figure 4-11. The value of the diagonal elements of the resolution matrix for each reconstructed source. The resolution matrix was calculated from the third quarter, 1992 sulfur aerosol transfer matrices. The receptor sites were located in source grid cells containing IMPROVE/NESCAUM receptor sites, and are identified by n .
In removing the eigenvalue it is implied that the sources dependent upon the eigenvalue have less variability then the solution suggests. As previously noted, little variability in the true emission can cause a mildly ill-conditioned situation to have large instabilities. Therefore, the removal of the small eigenvalues has the beneficial effect of removing fake variability induced by the conditioning of the system. However, in reducing the variability, it is possible to get rid of features of the data that truly exist. In Figure 4-8, two sources in Canada were resolved while neighboring sources exhibited large instabilities. In removing the eigenvalues to reduce these instabilities we will cause a redistribution of the mass in the sources that are resolved. If two much stability is added, then the true features of the data can be lost. Care must be taken in choosing the proper number of eigenvalues used in the solution. A balance must be met so that as many of the true features of the solution are maintained as possible, while features of the solution produced by the instabilities are removed. It can be difficult to determine what features of the data are true or not, and a-priori information may be required to make the proper judgment.
Retrieval of SO2 Emission Rates from Perfect data with error.
The affect of the number of eigenvalues used in the inversion of the SRR was investigated by reconstructing the third quarter (July, August, September) SO2 emission field in the eastern US. This was accomplished using perfect sulfur observation data generated from the same transfer matrix used in Figure 4-8 and the 1985 NAPAP SO2 emission inventory. To include the affects of observation and transfer matrix error, a relative normally distributed error of 40% was added to each observation. The error was generated by randomly selecting relative error values from a Gaussian distribution with a mean of 0 and standard deviation of 0.4 for each observation. Then the SO2 emission field was reconstructed, varying the number of eigenvalues from the number of sources, 356 to 1.
Metrics of the Inversion Performance. In order to rate the ability of the inversion scheme to reconstruct the NAPAP emissions and to determine the influence of the number of eigenvalues used in the solution, a set of metrics were devised. These metrics are also important for the determination of the number of eigenvalues to use in the final reconstructed emissions. These metrics fell into five categories (Figure 4-12):
1) SVD inversion characteristics
2) inversion outputs
3) forward model fit
4) reconstructed emission field stability
5) comparison of the NAPAP SO2 emission field to the reconstructed.
Inversion via SVD is characterized by a trade-off curve between the forward model fit, as measured by chi square, and the stability of the solution, as measured by the norm of the solution (Figure 4-12B). The trade-off between the model fit and solution stability is dependent upon the number of eigenvalues of the transfer matrix used in the solution (Figure 4-12A). The inversion output metrics consist of the sum of all reconstructed source emission rates, and the sum of the emission rate standard error (Figure 4-12C & D). The metrics defining the model fit are the correlation coefficient between the input observation data and the reconstructed observations and the ratio of the average input observation value to the average reconstructed observation value (Figure 4-12E).
The stability of the solution is measured by the negative emission rates. In the inversion process, no constraint concerning negative emission rates was employed, leading to some of the emission values being negative. This provides an ideal constraint to determine the number of eigenvalues to use in the solution, since a good solution will have as few negative emission rates as possible. In Figure 4-12F, the ratio of the sum of negative emissions to the sum of all emissions is presented along with the fraction of all emission grid cells that have negative emission rates. In the ideal case being considered here, the true emission field is known. Hence, a measure of the suitability of the inversion is to compare the reconstructed emissions to the NAPAP emission field. This was done using the correlation coefficient, and the ratio of the total emission rates and norms (Figure 4-12G & H).
Figure 4-12. Metrics of the inversion performance for the reconstructed SO2 emission field in the Eastern US. Calculated from perfect observation data with 40% Gaussian error at the IMPROVE/NESCAUM monitoring sites and times during Q3, 1992. The metrics are: A) The value of the eigenvalues. B) Trade-off curve between chi-square and norm of reconstructed emissions. C) Total reconstructed emission rate. D) Total standard error. E) Ratio of the average simulated and input observations and their correlation F) Ratio of the negative reconstructed emission rates to the total, and the fraction of the emission grid cells with negative emission rates. G) Correlation coefficient between the reconstructed and NAPAP emission rates. H) Ratio of the total reconstructed to NAPAP emission rates and norm of the emission fields.
Retrieval of SO2 Emission Rates. It is evident from the trade off curve in Figure 4-12B that as the model fit, chi square, increases the stability of the solution decreases. In determining the optimum number of eigenvalues to use, a balance between the model fit and the solution stability must be found. However, there is no single optimum number of eigenvalues to include in the solution, but a range of values will exist. The lower bound of this range can be estimated from the model fit metrics, while the upper bound can be estimated from the solution stability metrics.
The model fit as measured by both the correlation coefficient between the input observation data and the reconstructed observations and the chi-square value degrades with decreasing number of eigenvalues used in the solution. This degradation increases substantially when fewer than 25/356, eigenvalues are used, indicating a possible lower bound. However, the total reconstructed emission rate is relatively constant from 50/356 to 200/356 eigenvalues, but about 6% lower when fewer then 50/356 eigenvalues are used. This indicates that 50/356 eigenvalues is a better lower bound. Using 50 eigenvalues (Figure 4-13), the reconstructed emission field has the highest emission rates in the Ohio Valley and in northern AL and GA. This is similar to the NAPAP emission inventory, but the emission rates are about two times smaller than NAPAP's in these same regions. Also the high emission rates around Toronto and in Manitoba Canada have been washed out. The standard error at all sources is low, generally less than 0.2 kTons/day.
Insights into the upper bound of eigenvalues is found from the total emission rate curves. The total emission rate is relatively constant from 50 to 250/356 eigenvalues. But when more than 250/356 are used, then the total emission rate varies between 65 and 22 kTons/day, indicating extreme instabilities in the solution. At 250/356 eigenvalues, 40% of the sources have negative emission rates, and the total negative emission rate equals the total emissions, so that fewer eigenvalues must be used in the solution. When 140/356 eigenvalues are used, the fraction of emission grid cells with negative emissions and the negative emission fraction is reduced to less than 35%.
A
B
Figure 4-13. Reconstructed SO2 emission fields and standard error in the Eastern US. Calculated from perfect observation data with 40% Gaussian error at the IMPROVE/NESCAUM monitoring sites and times during quarter 3, 1992, using A) 50 eigenvalues in the solution and B) 140 eigenvalues in the solution.
Also, the model fit metrics have only degraded slightly from the 250/356 eigenvalues. Hence, 140/356 eigenvalues is an approximate upper bound. As shown in Figure 4-13, using 140/356 eigenvalues the high emission sources at Toronto and Manitoba, Canada are better resolved. Some solution instabilities are now apparent by the checker board appearance in the regions of low emissions.
The best reconstructed emission field lies in the eigenvalue range 50 to 140 eigenvalues. The comparison to the NAPAP emissions shows that within this range, the highest correlations are found, between r = 0.65 and r = 0.7, and the total reconstructed emissions is approximately equal to the NAPAP. Also, the ratio of the emission norms is the closest in this range. Figure 4-14 presents the emission field using 95/356 eigenvalues, the middle of the eigenvalue range. Comparing to the NAPAP emission in Figure 4-7, it is seen that all of the high emission regions are identified, but the magnitudes are lower than the NAPAP emission field.
Figure 4-14. Reconstructed SO2 emission fields and standard error in the Eastern U.S. Calculated from perfect observation data with 40% Gaussian error at the IMPROVE/NESCAUM monitoring sites and times during Q3, 1992, using 95 eigenvalues in the solution.
This can be seen in places like in Manitoba and Toronto, Canada, where the high emission rates have been lost. This is a result of the spreading of the emission to neighbor source cells due to the loss in resolution with the reduction in the number of eigenvalues.
Removal of Negative Emissions.
In the determination of the optimum range of eigenvalues, one of the goals was to minimize the number of emission grid cells with negative emission rates. This criteria does not eliminate the possibility of negative emission rates in the final emission fields, as seen in Figure 4-13. Since negative emission rates are physically nonsensical a post processor was developed to remove them. The post processor "adjusts" the negative emission grid cells by equally redistributing the negative emission rates to the neighboring grid cells, maintaining mass conservation. The lowest emission rate is then zero.
This is justified on the ground that the negative emission rates are a result of instabilities generated by the ill-conditioning of the transfer matrix. It was shown in section 4.2.1, that one of the primary reasons for the ill-conditioning of the transfer matrix was from similar basis functions for sources located in proximity to each other. In the minimization process, these sources could not be clearly identified and the minimum of the chi square criteria resulted when some of the sources were given emission rates too large. This was compensated for by other nearby sources having negative emission rates. Figure 4-15 presents the reconstructed emission field using 95 eigenvalues with the negative emission rates adjusted. Comparing this figure to Figure 4-14, it is seen that much of the checker board pattern has been removed. Table 4-1 compares some of the metrics between these two reconstructed emission fields. As shown, adjusting the negative values slightly degrades the model fit with the chi-square increasing from 737 to 787, but the correlation with the NAPAP emissions increased from r=0.7 to r=0.75. The largest impact was on the solution norm where it decreased by 10%.
Table 4-1. Some pertinent inversion metrics comparing the Q3 reconstructed SO2 emission fields with the negative emission grid cells and the negative emission grid cells adjusted.
|
SO2 Reconstructed Emission Field w/ negative emission rates |
SO2 Reconstructed Emission Field w/o negative emission rates |
|
|
Total emission rate [kg/day] |
5.85E+07 |
5.85E+07 |
|
Recon. emission norm [kg/day] |
7.38E+06 |
6.7E+06 |
|
Regression equation & coeff. between obs. & recon obs |
0.09 +0.91x r = 0.954 |
0.1 +1.1x r = 0.951 |
|
Chi-square |
737 |
787 |
|
COMPARISON TO NAPAP |
||
|
Regression equation & Coeff. |
57600 +0.64x r = 0.7 |
62800 +0.6x r = 0.75 |
|
Ratio of total recon. emission rate to total NAPAP emission rate |
0.98 |
0.98 |
|
Ratio of recon. emission norm to NAPAP emission norm |
0.92 |
0.84 |
Figure 4-15. The same figure as Figure 4-14, but the negative values have been adjusted by redistributing the negative values to neighboring sources.
4.4.2 Robust Inversion of the Source Receptor Relationship
In the previous section, the errors of the system were randomly selected from a Gaussian distribution preventing any of the data points acting as outliers. When actual data is used in the inversion, a fraction of the data will most likely be outliers that can corrupt the retrieved emission field. This section presents the process for making inversion via singular value decomposition robust. Then using simulated data, it is demonstrated that the robust SVD can identify and suppress the influence of the outliers on the retrieved emission field.
Robustizing Singular Value Decomposition.
The inversion of the source receptor relationship was made robust by implementing a W-estimator into the singular value decomposition (see section 4.2.1). The weights were calculated using Huber's j function with the coefficient c = 1.5. The scale was determined using Equation 4-30, and the uncertainty of the retrieved emission values was determined from the diagonal elements of the covariance matrix calculated from Equation 4-31.
Huber's j function was used for several reasons. First, it is a non decreasing function so the solution is guaranteed to converge. Second, the bulk of the residuals of the simulation of ambient and wet deposited sulfate in the Eastern US during 1992 is approximated by a normal distribution. This is demonstrated in Figure 4-16 where a normal distribution is overlaid a histogram of residuals calculated from third quarter simulations. Also, overlaid the histogram is a double exponential distribution. As shown, this distribution does not completely account for the prominent tails of the residual histogram. However, a distribution with heavier tails would result in a redescending j function. Besides the potential convergence problems, Huber (1981) offers these words of caution in the use of redescending j function:
"the importance of using redescending j functions has been exaggerated beyond all proportions. They are certainly beneficial if there are extreme outliers, but the improvement is relatively minor (a few percent of the asymptotic variance) and is counter balanced by an increase of the minimax risk. .... in particular, [there is] an increased sensitivity to wrong scales. Unless we are careful we may even get trapped in a local minimum of
. The situation gets particularly acute in multiparameter regression."
Finally, it will be demonstrated in the following section, that the employed robust technique is able to adequately reduce the influence of outliers in the retrieval of emissions.
Figure 4-16. A histogram of residuals from the simulation of the IMPROVE/NESCAUM fine aerosol sulfur concentrations during Q3, 1992, in the Eastern US. The 1985 NAPAP SO2 emission inventory was used in the simulation. A normal and double exponential distribution have been overlaid the histogram using the mean and standard deviation of the middle 90% of the residuals.
Estimation of Variance for the Retrieved Emission Rates. The variance of the retrieved emission rates were calculated from the covariance matrix. While it has been shown that the covariance matrix produces variances similar to Monte Carlo simulations, caution must be applied in over interpreting these values in the retrieval of emissions rates. The two studies cited that used Monte Carlo simulations, Huber, (1973) and Constable, (1988), dealt with situations with only several unknown parameters to be estimated, compared to about 500 in this study. Also, no stability functions were added to their solutions which is a vital part for the inversion of the SRR. Consequently, the reported variances should be though of as relative measures of the error. They can be used for comparisons of several solutions of the same emission inventory and between source emission rates in the same emission inventory. However, caution must be applied in using the variances for defining confidence intervals of the retrieved emission rates. In the future, Monte Carlo simulations will be used to test the validity of the reported variances.
The addition of the W-estimation to the singular value decomposition is a simple process where the weighting procedure by the estimated observation variance is replaced by the calculated weights (see singular value decomposition in section 4.1.2). The inversion process is repeated until both the scales si and the weighted chi square values were within 2% on successive iterations. This process requires the reoptimization of the weights for each set of eigenvalues used in the solution, since the residuals change when different degrees of stability are imposed upon the solution. This results in a severe computational penalty, since the singular value decomposition must be performed and iterated upon for each new set of eigenvalues.
Estimation of Scale. The scale is a measure of the spread of the distribution characterizing the absolute error of the observations. The scale is an important part of a robust procedure implementing Huber's j function. An improper scale will give some observations undue influence on the inversion while decreasing the influence of other valid observations.
Determining the proper scales for the inversion of the SRR is complicated by the fact that the errors of the observations and transfer matrices tends to be relative to the magnitude of the observation values. Consequently, observations with large values will have larger absolute errors then observations with smaller values, and thus should have a larger scale in the robust procedure. This is demonstrated in Figure 4-17, where histograms of residuals of the smallest 50%, and the largest 50% of the observations used in Figure 4-16 are presented. As can be seen, the histogram of residuals for the low observations values has a more narrow spread, s = 0.33 mg/m3, then the histogram for the high observations values, s = 1.2 mg/m3.
A B
Figure 4-17. Histogram of residuals of A) the smallest 50%, and B) the largest 50% of the observations used to create the histogram in Figure 4-16.
The relative errors of the observation data were taken into account by grouping the observations into quartiles consisting of the smallest 25%, 25 - 50%, 50-75%, and 75 - 100% of observation values, then calculating a separate scale for each group. When non-similar observation data were combined, such as aerosol concentrations and wet deposition data, separate sets of scales were calculated for each type of data.
Testing the Robust Inversion
In order to test the influence of outliers and the ability of the robust inversion to identify and suppress the outliers, several idealized situations were investigated. In section 4.2.1 it was shown that using perfect sulfate observations, it was possible to reconstruct the NAPAP emission nearly perfectly. The perfect sulfur concentrations were generated by solving the SRR in the forward direction using the 1985 NAPAP emission inventory at the IMPROVE and NESCAUM monitoring locations for 24 hour averages, twice a week. Using these same conditions, one observation at Sumapee Mt, NH was set from 1 mg S/m3 to 21 mg S/m3. Adding this one outlier to the system resulted in a chi-square value of C2 = 400, where s = 1. As shown in Figure 4-18A the least squares inversion produced a wildly unstable solution. However, the chi square value was reduced to 180. Demonstrating how the least squares approach will sacrifice stability to decrease the chi square value. When the robust inversion was used, the retrieved emissions nearly perfectly reproduced the 1985 NAPAP emissions (Figure 4-18B) with a correlation coefficient between the NAPAP and reconstructed emissions of r = 1.00 and the correlation equation: -.03 +1.0x. The chi square value was C2 = 430.
To exercise the robust procedure under conditions of multiple outliers, 3.48 mg S/m3 was added to all 33 sulfur values at the same Sumapee Mt, NH location. The chi square resulting from these aberrations was also 400. The least squares approach again produced an unusable emission pattern (Figure 4-19A) with a chi square of C2 = 190, while the robust procedure reproduced the NAPAP emission well with a correlation coefficient between the NAPAP and reconstructed emissions of r = 1.00 and the correlation equation: .01 +1.0x. The chi square value was C2 = 430.
Under more realistic conditions, all observations will be affected by some error, and outliers will be distributed throughout space and time. In order to investigate these conditions, the SO2 emission field was retrieved using the perfect observation data with a 5% Gaussian error added and with and without a randomly chosen 10% of the data were made to be outliers. The outliers were created by adding ± 2 to 10 mg S/m3 to their values, if an outlying observation was below 0 it was set to 0 mg S/m3. The added error causes instabilities in the retrieved emissions due to error amplification. Stability was added by reducing the number of eigenvalues (degrees of freedom) used in the singular value decomposition. It was found that 250 of the 365 eigenvalues produced reasonable results as measured by the metrics of section 4.3.1. Figure 4-20A, presents the retrieved emission when no outliers were added to the data. As shown, the retrieval almost exactly reproduces the NAPAP emission field with a correlation of r = 0.95 (Table 4-2). When the system was inverted with 10% of the data as outliers, using 250 eigenvalues, the retrieved emission pattern deteriorated considerably, (Figure 4-20B). However, when the robust procedure was employed with this same set of observations, the reconstructed emissions were similar to those derived without the outliers. It is evident that the outliers still have a negative influence upon the solution, as seen by the increased percentage of negative emissions, 36% compared to 14%, and the decrease in the correlation between the NAPAP and reconstructed emissions, r = .89 compared to r =.95 (Table 4-2).
These examples demonstrate that non robust emission retrievals are sensitive to outliers even when stability is added to the inversion process. Also, the W-estimation technique using Huber's j function is able to identify and suppress the influence of these outliers on the results. However, the robust procedure is unable to completely eliminate the influence of the outliers.
Figure 4-18. Reconstructed Q3 SO2 emission fields generated from perfect fine aerosol sulfur data at the IMPROVE/NESCAUM locations and times with one added outlier at Sumapee NH using a) a least squares inversion process, and B) robust least squares inversion
Figure 4-19. Reconstructed Q3 SO2 emission field generated from perfect fine aerosol sulfur data at the IMPROVE/NESCAUM locations and times. Outliers were created for all data at Sumapee NH by adding 3.48
mg/m3 to each value. In A) a least squares inversion process was employed, and in B) a robust W-estimation inversion was used.A B C
Figure 4-20. Reconstructed Q3 SO2 emission fields in the Eastern US. Calculated from perfect observations with a 5% Gaussian error at the IMPROVE/NESCAUM monitoring sites and times Q3, 1992. Singular value decomposition with 250/356 eigenvalues was used to perform the inversion. In figure A) the inversion was Robustized. In figure B) 10% of the observations were made outliers and the inversion was not robustized. Figure C) is same as B) but the robust inversion was applied.
Table 4-2. The metrics of the inversion performance for the retrieval of Q3 SO2 emission fields in the Eastern US. Calculated from A) perfect observations with 5% Gaussian error using SVD. B) perfect observations with 5% Gaussian error and 10% of the observations as outliers using SVD. C) perfect observations with 5% Gaussian error and 10% of the observations as outliers using the robust SVD.
|
5% Error with no outliers SVD Inversion 250 eigenvalues |
5% Error & 10% outliers. SVD Inversion 250 eigenvalues |
5% Error & 10% outliers. Robust SVD Inversion, 250 eigenvalues |
|
|
RECON EMISSION STATISTICS |
|||
|
Total emission rate [kg/day] |
5.9E+07 |
8.9E+07 |
5.7E+07 |
|
% total emissions negative, |
14 |
260 |
36 |
|
% of sources with negative emissions |
29 |
47 |
24 |
|
Recon. emission norm [kg/day] |
8.34E+06 |
4E+07 |
8.16E+06 |
|
Robust Scale [mg/m3] |
-- |
-- |
.04 |
|
MODEL FIT |
|||
|
Regression equation & coeff. between obs. & recon obs |
0 +1x r = 1 |
.35 +.73x r = 0.85 |
.35 +.5x r=0.72 |
|
Chi square, wgted data, s = 1 |
3 |
714 |
270 |
|
COMPARISON TO NAPAP |
|||
|
Regression equation & Coeff. |
3550 +.99x r = 0.95 |
64000 +1.1x r = 0.21 |
3700 +.93x r=0.89 |
|
Ratio of total recon. emission rate to total NAPAP emission rate |
.98 |
1.5 |
.95 |
|
Ratio of recon. emission norm to NAPAP emission norm |
1.04 |
5 |
102 |