Least-squares gradient calculation from multi-point observations of scalar and vector fields: methodology and applications with Cluster in the plasmasphere

. This paper describes a general-purpose algorithm for computing the gradients in space and time of a scalar ﬁeld, a vector ﬁeld, or a divergence-free vector ﬁeld, from in situ measurements by one or more spacecraft. The algorithm provides total error estimates on the computed gradient, including the effects of measurement errors, the errors due to a lack of spatio-temporal homogeneity, and errors due to small-scale ﬂuctuations. It also has the ability to diagnose the conditioning of the problem. Optimal use is made of the data, in terms of exploiting the maximum amount of information relative to the uncertainty on the data, by solving the problem in a weighted least-squares sense. The method is illustrated using Cluster magnetic ﬁeld and electron density data to compute various gradients during a traversal of the inner magnetosphere. In particular, Cluster is shown to cross azimuthal density structure, and the existence of ﬁeld-aligned currents in the plasmasphere is demonstrated.


Introduction
This paper deals with the computation of gradients of physical quantities (scalar or vector fields) that are measured in situ at different times and positions. This topic has gained importance in the context of recent magnetospheric multispacecraft missions, in particular the Cluster mission consisting of four identical spacecraft, flying in formation. The rationale behind this mission is the idea that exactly four simultaneous measurements are needed to determine the three spatial gradient components, at least if the four spacecraft are not coplanar. Methods to do so have been developed Correspondence to: J. De Keyser (johan.dekeyser@bira-iasb.oma.be) Chanteur, 1998;Chanteur and Harvey, 1998;Robert et al., 1998a). Such gradient computations are difficult for a number of reasons: (1) Four simultaneous measurement points are required. A smaller number of spacecraft is insufficient, while the method also cannot properly take advantage of a larger number of spacecraft if available. For those Cluster instruments that are not operating on the four spacecraft, this instantaneous spatial gradient computation is therefore precluded. Also, data are never obtained exactly simultaneously, a problem that is usually dealt with by time averaging or interpolation. (2) One never computes a gradient, but rather a spatial difference representing the average gradient over a length scale fixed by the spacecraft separation distances, usually different in each space dimension. This scale often does not correspond to the physical scale size of interest. (3) The method applies only if the so-called spatial homogeneity condition is satisfied, that is, it computes an average gradient over the spacecraft separation length scale, but that is only useful if the true gradient does not differ too much from the average one. (4) Computing differences is notoriously difficult. Subtracting two similar data values leads to a relative error on the difference that may be much larger than the error on the original data. This is especially true for spacecraft that are very close together, a situation often required to satisfy the spatial homogeneity condition. A difference can therefore be reliably computed only when both random and systematic errors on the data are very small. This necessitates accurate data intercalibration between the spacecraft, something that can be difficult to achieve as the operating conditions on each spacecraft tend to be different. Systematic gradient computations with Cluster have therefore been applied only to magnetic field data (FGM instrument; Balogh et al., 1997Balogh et al., , 2001 with its low measurement error and good intercalibration (especially the curlometer; Dunlop et al., 2001Dunlop et al., , 2006Dunlop and Balogh, 2005;Vallat et al., 2005) and to electron density data obtained from the plasma frequency (WHISPER instrument; Décréau et al., Published by Copernicus GmbH on behalf of the European Geosciences Union. uares gradient algorithm uses data acquired in a set of points in space-time, represented onal space (x 1 , x 2 ). In this example, the data are obtained along the trajectories of three on the dotted lines), although that does not matter for the method. The homogeneity ed by associating with each data point an error that grows with the distance from x 0 , the ient is computed. This distance, however, is measured in a frame (l 1 u 1 , l 2 u 2 ) that may be lative to the original frame. Points on the ellipse with semi-axes l 1 and l 2 are assigned a unit ide the ellipse (dark shaded area) correspond to smaller distances and therefore a smaller that ellipse (lightly shaded region) will have a larger error so that they are less relevant, omogeneity condition. Points outside the shaded regions are considered irrelevant. Fig. 1. The least-squares gradient algorithm uses data acquired in a set of points in space-time, represented here as a 2-dimensional space (x 1 , x 2 ). In this example, the data are obtained along the trajectories of three spacecraft (red dots on the dotted lines), although that does not matter for the method. The homogeneity condition is expressed by associating with each data point an error that grows with the distance from x 0 , the point where the gradient is computed. This distance, however, is measured in a frame (l 1 u 1 , l 2 u 2 ) that may be rotated and scaled relative to the original frame. Points on the ellipse with semi-axes l 1 and l 2 are assigned a unit distance. Points inside the ellipse (dark shaded area) correspond to smaller distances and therefore a smaller error. Points outside that ellipse (lightly shaded region) will have a larger error so that they are less relevant, thus reflecting the homogeneity condition. Points outside the shaded regions are considered irrelevant. Trotignon et al., 2003) because of their absolute calibration (Darrouzet et al., 2006b). (5) Additional errors arise due to the lack of synchronization of the spacecraft clocks, the nonzero duration over which data are gathered, and the uncertainties in the spacecraft positions. The impact of these errors may be hard to quantify, especially because they may not be statistically independent. Such errors tend to be relevant for short time scales and small separations only.
This paper describes an alternative way to determine the gradient obtained by relaxing the requirement of simultaneity of the observations. This is achieved by formulating the concepts of temporal and spatial homogeneity in a more general way. By using all observations in a region of space-time over which the spatial and temporal gradients are essentially constant over prescribed length and time scales, an overdetermined system of equations is obtained from which the gradient can be computed in a least-squares sense. In principle, data from an arbitrary number of spacecraft can be exploited. It is possible to attempt to compute gradients on length and time scales that match the physical scales of interest. In any given practical situation the method will find out whether such gradients can actually be computed with the available data. For in situ measurements by the Cluster spacecraft in a medium that is at rest, for instance, the scales along the spacecraft orbit will usually be limited by the time resolution of the data, while the scales perpendicular to the velocity will be dictated by the orbital separations. Proper error estimates are derived that account for the measurement errors, for the errors due to the fact that the gradient is not constant over the region in which measurements are available, and for the effect of small-scale structures and perturbing wave fields. It is also possible to take into account geometrical and spatiotemporal constraints.
In Sects. 2-5, the method is presented in a formal mathematical way. We use linear algebra techniques (such as eigen-decomposition and singular value decomposition) and therefore adopt the standard linear algebra notation: Bold lower-case symbols represent vectors, bold upper-case symbols are matrices, other symbols denote scalars. Sections 6-8 illustrate the application of the method for scalar fields, for vector fields, and for divergence-free vector fields with Cluster examples for a pass through the inner magnetosphere.

The problem for scalar fields
One or more spacecraft sample a scalar field f (x, t) at positions and times x i =[x i ; y i ; z i ; t i ], i=1, . . . , N , in 4dimensional space-time. The case of vector fields is discussed later. The measurements f i have known error variances δf 2 meas,i . Data intercalibration removes all systematic errors (we disregard clock synchronization and spacecraft position errors).
The gradient ∇ xt f =[∂f/∂x; ∂f/∂y; ∂f/∂z; ∂f/∂t] at a point x 0 can be computed by combining all measurements made inside a region in which the gradient does not change appreciably. This region is the "homogeneity domain". Its size is determined by physical considerations. It can be described by a 4-dimensional ellipsoid, that is, by four mutually orthonormal directions u j and the corresponding scales l j , specifying the axes of the hyperellipsoid. This is illustrated in Fig. 1 by means of an analogy in 2-dimensional space. In this example, the points (red dots) are obtained along the trajectories of three spacecraft (dotted lines), although that does not matter for the method. The points inside the homogeneity domain (represented by the dark shaded ellipse) can safely be used to compute the gradient in x 0 . Rather than accepting points inside the homogeneity domain for the computation and rejecting points outside, we will use a more gradual approach. With each data point, we will associate an error that grows with the distance from x 0 . This distance, however, is measured in a frame (l 1 u 1 , l 2 u 2 ) that may be rotated and Ann. Geophys., 25,[971][972][973][974][975][976][977][978][979][980][981][982][983][984][985][986][987]2007 www.ann-geophys.net/25/971/2007/ scaled relative to the original (x 1 , x 2 ) frame. This frame will be called the β-frame, and the unit coordinate vectors of this frame are exactly the semi-axes of the elliptical homogeneity domain. Points on the border of the homogeneity domain are assigned a unit distance. Points inside the ellipse correspond to smaller distances and therefore a smaller error. Points farther outside that ellipse have a larger error so that they are of progressively diminishing relevance, thus reflecting the homogeneity condition.
The transformation x β =Px, where P=(UL) −1 with orthonormal matrix U=[. . . u j . . .] and diagonal matrix L=diag(l j ) (this notation means: the diagonal matrix with the l j on the diagonal) represents the transition from the original frame to the new frame, denoted by superscript β, in which the homogeneity domain is the unit hypersphere. The corresponding gradient operator is ∇ The distance measure that is needed to express the homogeneity condition assigns to each vector x a length or norm Homogeneity considerations in time are often separable from spatial homogeneity.
If the time scale is l t =τ and if the three spatial homogeneity scales are l x =l y =l z =ξ , the transformation is simply a rescaling P=diag([1/ξ, 1/ξ, 1/ξ, 1/τ ]). The β-norm of a given vector x then is x β = (x/ l x ) 2 +(y/ l y ) 2 +(z/ l z ) 2 +(t/ l t ) 2 . We will refer to this particular case as the standard isotropic homogeneity case.
If f satisfies the appropriate analyticity conditions near x 0 , it can be locally approximated by a Taylor expansion. With x i =x i −x 0 , and denoting the function value and the gradient at x 0 by f 0 and ∇ xt f 0 , this becomes where the residual is r i =O( x i 2 ). This leads to a system of N equations for f 0 and ∇ xt f 0 , i.e., M=5 unknowns: where X=[. . . x i . . .] groups all relative positions and f denotes all measurements. The homogeneity domain can usually be chosen big enough so that N M: The system is overdetermined and can never be satisfied exactly. However, the gradient can be computed in a least-squares sense (minimization of r r). Expressing this system in the β-frame (the dimensionless form) is preferable from the numerical point of view (since all system matrix elements then are of order unity). Although system (2) is usually overdetermined, it may still be ill-conditioned. Such ill-conditioning can often be avoided by adding a priori knowledge in the form of constraints. In the case of Cluster it may become possible to compute gradients with only three, two, or even a single spacecraft, depending on the number of constraints. We will not use such constraints in the present paper, but Appendix A explains how they can be incorporated.

Approximation errors
In order to obtain the gradient, we approximate the scalar field f , with all its space-time variations, by a local linear approximation. There are two aspects to this approximation: The function might have variations at scales smaller than the specified homogeneity scale (this variability usually cannot be evaluated completely from the measurements, so a statistical approach is needed to estimate its effect), and there is also the curvature of f at the homogeneity scale itself, which is why the linear approximation is only valid inside the homogeneity domain. Both contribute to the total "approximation error"; the small-scale errors are the "fluctuation errors" and the errors due to the linear approximation are the "curvature errors". We will therefore write the scalar field as f =f hs +δf ls +δf ss , the sum of the linear field f hs at the homogeneity scale (i.e. the linear approximation), a deviation δf ls due to variations at larger scales, and a small-scale field δf ss .

Structure at large scales
Structure at large scales ( x β = x β ≥1) is properly represented by the Taylor expansion of Eq. (1). With the constant f c indicating how much f changes over the homogeneity domain due to the higher-order terms in the expansion (function curvature), the curvature error is estimated as so that the linear approximation is valid when x β i <1 but not much beyond that. The curvature error is completely determined by the homogeneity conditions through the β-norm. The user must specify the value of f c based on physical considerations. This may not be trivial, as will be discussed in Sect. 6 and in the Conclusions. Note that the value of f c is linked to the homogeneity lengths: halving the homogeneity lengths is equivalent to multiplying f c by a factor of four. In principle, the curvature errors at the various sampling positions are not statistically independent, but since nothing is known about them a priori, their cross-correlations are ignored here. This is justified even more so because, as explained in Sect. 5, only little weight will be given to data points far from x 0 for which the cross-correlations would be large.
If the system is intrinsically changing on a short time scale, shorter than the sampling time scale (time-separable case with l t <t sample ), successive measurements cannot be related to each other since the system has changed in between. The homogeneity condition therefore indicates that only simultaneous measurements can be used for computing the gradient. Indeed, data taken at a different time have a large curvature error because t sample / l t and therefore x β is large. When using only simultaneous data, the last row in X vanishes so that ∂f/∂t remains undetermined. For four spacecraft www.ann-geophys.net/25/971/2007/ Ann. Geophys., 25, 971-987, 2007 this corresponds to the classical spatial gradient computation Chanteur, 1998;Chanteur and Harvey, 1998).

Structure at small scales
Small-scale structure is often present, for instance in the form of small-amplitude waves or turbulence. Small-scale perturbations are usually under-sampled, so their influence on gradient precision must be characterized with a stochastic model. The δf ss (x) can be thought of as the superposition of individual perturbations, each with length scales λ j along mutually orthogonal directions, which we take to be the homogeneity directions u j for the sake of simplicity, and with amplitude δf ssλ (λ, x): Let the population of perturbations δf ss be characterized by typical length scalesλ.
We can then introduce a new reference frame γ , similar to frame β but with different axis scaling, which leads to a norm Restricting ourselves to distributions that are isotropic in γ -space and assuming that the perturbation amplitudes do not vary appreciably over the homogeneity domain, one has (δf ssλ (λ, x)) 2 =δf 2 λ (λ γ ) everywhere, with λ γ = λ γ and where the acute brackets identify the expected value for the population of perturbations. Because of the locality of the perturbations, δf ssλ (λ, x i )δf ssλ (λ, x j ) ≈δf 2 λ (λ γ ) when x γ ij = x j −x i γ λ γ and zero when x γ ij λ γ . Appendix B computes that, for a distribution with perturbation strength δf 2 λ (λ γ ) decreasing exponentially, the covariances are with f * 2 the total perturbation variance. The crosscorrelation is large between nearby points and vanishes as their distance exceeds the perturbation length. Better models are possible if the type of perturbation (e.g. a particular wave mode) is known a priori. In such cases it would be best to compute the gradients of several wave field components simultaneously, coupled through the wave relations.

The problem for vector fields
The gradients of the individual components of a vector field can be obtained by treating each component individually as a separate scalar field under the simplifying assumption that the curvature errors are not correlated (or that one does not know a priori how) and that the small-scale perturbations for the different components are uncorrelated as well (which is not really true if they are due to a particular wave mode). Different homogeneity parameters for each of the components (different u j , l j , f c ,λ j , f * ) could be used. Here, the discussion is limited to the case of identical values. The number of unknowns at each point is M=3×5=15. For divergencefree vector fields (such as the magnetic field) the gradients of the vector components must be computed simultaneously, subject to the constraint that the divergence vanishes, so that M=14.
Computing the curl and the divergence of the vector field poses an additional difficulty. As the divergence and each of the components of the curl are sums of terms of the same order of magnitude, but possibly with opposite sign, the relative error on the result can be larger than the relative errors on the individual gradient components, which themselves are differences of similar values and have a significant uncertainty.

Solving the overdetermined system
The overdetermined system (2) expressing N measurements of a scalar field (the number of equations) can be written as +f * 2 , in which the terms represent the independent contributions of measurement error, curvature error and fluctuation error. In addition, there are the cross-correlations δf 2 ij =f * 2 e − x γ ij . These estimates give the N ×N correlation matrix C f = δf δf , which is real, symmetric, and positive definite. It is strongly diagonal dominant if the δf meas,i and f c are large, if f * is small, or if theλ j are small. As N may be large, significant computing time and storage savings can be achieved by setting the off-diagonals δf 2 ij to zero if x γ ij is above an appropriately chosen limit, thus ignoring small cross-correlations (typically e − x γ ij <10 −4 ). The eigen-decomposition C f =W D 2 W is computed, where W contains the orthonormal eigen-vectors and where the non-negative eigen-values, which are denoted by d 2 i , constitute the diagonal matrix D 2 =diag(d 2 i ). This is trivial if C f is diagonal, i.e., when we do not consider small-scale fluctuations. Applying operator W on the left to the vectors r, f , and Aq in the overdetermined system (3), the errors on the transformed residuals are now statistically independent, so that the i-th equation represents an individual piece of information corresponding to a covariance d 2 i . A further operation by D −1 on the left normalizes the residuals so that they all have unit variance, by dividing each by its error estimate. The resulting weighted system is whereÃ=D −1 WA,f =D −1 Wf , andr=D −1 Wr. Performing the equivalent operations on the correlation matrix C f givesC f =D −1 WC f WD −1 =I: There are no crosscorrelations, and the variances are unity. The least-squares Ann. Geophys., 25,[971][972][973][974][975][976][977][978][979][980][981][982][983][984][985][986][987]2007 www.ann-geophys.net/25/971/2007/ method minimizesr r so that equations with less relevant information (d 2 i large) will hardly play a role. Large measurement errors and short-scale fluctuations reduce the weights, and data points outside the homogeneity domain have such important curvature errors that their weight is negligible, thus ensuring that the information used for computing the gradient comes from within the homogeneity domain. Consider the situation depicted in Fig. 1. Spacecraft 2 passes near x 0 and has three points inside the homogeneity domain (the dark shaded ellipse), the middle of which carries only measurement and fluctuation errors while the curvature error is small there. Spacecraft 1 does not have any point inside the homogeneity domain. Nevertheless, its data points just outside the homogeneity domain (in the light shaded region) will also appear in the overdetermined system. As their curvature errors are larger, their weights will be smaller. They therefore contribute only a limited amount of information in the computation of the gradient.
The least-squares solution is obtained by formally solving the overdetermined system as In practice, computingÃ Ã (an M×M symmetric positive semidefinite matrix) is avoided because it implies sums with many terms and therefore a potential loss of precision. The standard method is to compute a decompositionÃ=QR, where Q is a unitary N×M matrix and R is an M×M upper triangular matrix with the so-called economy-size QR-algorithm. One can then easily computeÃ Ã =R R. This symmetric M×M matrix can be inverted by computing its singular value decompositionÃ Ã =V S 2 V, where S=diag(s j ) are the singular values and V is unitary, so that The correlation of errors on the result is C q =(Ã Ã ) −1 , from which the errors on the result can be estimated as δq= diag(C q ) by ignoring the cross-correlations between the solution components. From q=[f 0 ; ∇ β xt f 0 ] and δq=[δf 0 ; δ(∇ β xt f 0 )], the gradient and the error estimates are found as As can be seen from Eq. (5), the solution is obtained by applying the transformation V S −2 V to the weighted datã A f . Since V is unitary, the conditioning of the problem is completely determined by the singular values s j . If a singular value is small, the propagation of errors in the associated direction is important. For spatial gradient computations using simultaneous data from the four Cluster spacecraft, the concepts of planarity and elongation of the spacecraft tetrahedron have been used as a diagnostic for the well-posedness of the problem (Robert et al., 1998a,b), which have the advantage of being easily visualized. The singular values, however, provide an abstract but general diagnostic that works for an arbitrary number of spacecraft and in the presence of constraints (Appendix A). We therefore define the condition number Even when the problem is close to singular (condition number small) the method produces a valid result, but the error estimates on the result (or, at least, on some of its components) will be large. As results with too large errorbars are useless, computations with cond<10 −5 are ignored. The technique described here is based on an unconstrained approximation of f . If the scalar field is strictly positive, it is best to apply the technique tof = log f , with δf =δf/f . The results are transformed back using ∇ xt f =ef ∇ xtf and When computing the gradients of the three components of a vector field B, supplemented by the condition ∇·B=0, the overdetermined system is In the particular situation in which all data have been acquired simultaneously, the coefficients of the time derivatives are all zero and no information about the time variations can be extracted. The corresponding unknowns can be removed from the system, so that M=4 for the gradient of a scalar field, M=12 for a vector field, and M=11 for a divergencefree vector field.

Gradients of a scalar field
In this section, we consider the passage of the four Cluster spacecraft through the inner magnetosphere on 7 August 2003, from 06:00-11:00 UT, with perigee around 08:05 UT. A subinterval of this passage has been studied earlier by Darrouzet et al. (2006b). We focus on the gradients of the magnetic field strength |B| obtained by the FGM magnetometer and of the electron density n e derived from the plasma frequency f p measured by the WHISPER instrument. The spacecraft separation distances were on the order of 200×400×1000 km in the GSE X, Y , Z directions near perigee. The spacecraft cross the inner magnetosphere from the south to the north. Figure 2 shows that |B| reaches a local minimum and n e a maximum around perigee, at a geocentric distance of about 4.53 R E . Near perigee, the spacecraft enter the outer regions of the plasmasphere (K p =2 + , down from 6 − one day before, indicating post-storm recovery, a The spacecraft separation distances were 200 × 400 × 1000 km in the GSE X, Y , Z directions near perigee. |B| reaches a local minimum and ne a maximum around perigee (C1 -black, C2 -red, C3 -green, C4 -blue).
The bottom scale gives the L-shell position of the center of the Cluster tetrahedron (for L < 10, elsewhere L cannot be determined accurately). situation in which the plasmapause typically is located rather close to Earth). The plasma encountered near perigee (between 07:40 and 08:40 UT) is of plasmaspheric origin, as indicated by the Cluster plasma spectrometers. CIS/CODIF on Cluster 4, for instance, detects He + and O + . The spectrometers also indicate corotating flow of a few km/s, although the measurements are not very precise since the instruments miss a major fraction of the cold plasma distributions due to spacecraft potential effects. The |B| profiles are smooth with minor variations at the begin and the end of the pass, when the spacecraft are outside the plasmasphere and sample higher L-values at higher magnetic latitudes, and where n e is low. FGM is very precise and well calibrated. Spin-averaged magnetic field data (4 s time resolution) are used here. The measurement error on the components is 0.1 nT while the uncertainty on |B| is 0.15 nT (< 0.05 %). These data appear to allow an accurate gradient determination since the magnetic field values registered by the four spacecraft at any given time differ by up to 10 nT, larger than the measurement errors. The measurement error on f p is the 163 Hz discretization error (half of the frequency resolution of WHISPER). Additional errors due to the possible misidentification of the plasma frequency line in the WHISPER spectrograms have been kept to a minimum as the algorithms for plasma frequency detection have matured (Trotignon et al., , 2003Rauch et al., 2006). The relation between density n e and plasma frequency f p is n e [cm −3 ] = (f p [kHz]/9.0) 2 , from which δn e /n e =2δf p /f p . The peak density is almost 70 cm −3 with an error of 0.3 cm −3 . The relative error is smallest for these high densities, typically 0.4 %, and increases up to 20 % for lower densities near the detection limit.
First consider the standard isotropic homogeneity case with a characteristic spatial scale l x =l y =l z =500 km. The homogeneity time scale depends on the time it takes structures to convect over these spatial scales as well as on intrinsic temporal changes. For typical plasmaspheric convection velocities of a few km/s, a homogeneity time scale l t =60 s seems to be a natural choice compatible with the spatial scale. This is also sufficient to resolve the intrinsic temporal behaviour of plasmaspheric refilling (hours) or electric field reconfigurations involved in the creation of medium and large structures (tens of minutes). The terrestrial dipole field strength at the equator is B=B eq /L 3 (where B eq is the equatorial field strength at the surface), so that d 2 B/dL 2 =12B eq /L 5 from which we obtain a typical curvature error estimate f c =12B eq (l x /R E ) 2 /L 5 ∼1 nT for l x =500 km and L=5. A limitation of the present approach is that only a constant f c can be specified, while in reality it can vary from point to point.  can be included in the weighted system for determining the gradient at a given point. As the weights of most points are negligible, however, it is computationally more efficient to include only those points with a relative variance satisfying where the σ k is a series of increasing threshold values. In the example sketched in Fig. 1 this threshold corresponds to the outer ellipse: All points included in the computation form a set that is larger than the homogeneity domain (assuming that the measurement and fluctuation errors for all points are the same, otherwise there exists no simple geometrical representation). In practice, we use 5 logarithmically spaced values σ k from 2 up to 100, for each of which the gradient is computed. The gradient computation is thus repeated with the outer ellipse being progressively expanded. The result is most precise for the largest threshold (largest N). Solving the problem for all σ k allows us to assess whether the largest threshold is too low (a better result could be obtained) or unnecessarily large (computationally inefficient). Figure 3 illustrates the computation of ∇ xt |B|. It has been determined every 60 s. The magnetic field strength at the moving center of the homogeneity domain is a spatio-temporal average of the observed values (panel a). In order to find the physical scales of this averaging process, note that the region from which the weighted least-squares method will take most of its information (where homogeneity-scale error ≤ measurement error + fluctuation error) has a diameter or effective scale size D j in the j -th homogeneity direction that can be computed from where S eff is called the effective scale factor. It is about 0.8 in this example (panel b). If δf meas,i , f c , and f * are of the same order of magnitude, S eff ∼ 1 so that the effective scales correspond to the homogeneity scales. A general property is that S eff becomes smaller as f c becomes larger, reflecting the equivalence between small homogeneity scales with a small f c on the one hand, and larger homogeneity scales with a larger f c on the other hand. The number of equations N actually used to compute the gradient (panel c, each curve refers to a σ k ) increases with σ k . A measure for the amount of information used is the effective number of equations N eff = i 1/ρ i , which is the sum of the inverse relative variances of the corresponding equations, so that N eff ≤N (panel d). For ever larger σ k , N eff increases, but the relative gain decreases as the added equations have progressively lower weights. The convergence of the N eff -curves indicates that max k σ k =100 is well-chosen. Figure 3e presents an overview of the condition number as defined by Eq. (6). For low σ k (N small, not necessarily using data from the four spacecraft) the problem is almost singular (cond∼10 −16 ), but for larger σ k the condition number becomes very reasonable, typically 10 −3 . It remains below 10 −5 in the time interval 09:30-09:45 UT, so that the gradient cannot be sensibly computed there. The growing errorbars on the magnitude of the spatial gradient, |∇|B||, correspond to this ill-conditioning (panel f), while no such large errorbars are present for the temporal gradient, ∂|B|/∂t (panel g). This ill-conditioning is due to a purely spatial effect, namely the near-coplanarity of the spacecraft during this period. Elsewhere, the spatial gradient is well computed, with a value around 0.04 nT/km. The errorbars are on the order of 10%, but growing where the condition number gets worse. The temporal gradient varies around zero with an amplitude of <0.01 nT/s. Medium-scale structures in the plasmasphere are often aligned with the magnetic field, especially at the rather low latitudes that Cluster is sampling here. It therefore makes sense to use an anisotropic homogeneity domain that is aligned with the magnetic field, with l x =l y =500 km perpendicular to B and l z =2000 km parallel to B. In addition, we may consider small-scale fluctuations. Their characteristic scales are taken to be 1/5th of the homogeneity scales (λ x =λ y =100 km,λ z =400 km,λ t =12 s) and f * =0.2 nT. (panels b, c, d and e, f, g, respectively). There is not much difference between the gradients themselves. Yet, more data points are available wherever the spacecraft trajectories are aligned with the direction of largest extent of the homogeneity domain (the magnetic field direction), i.e. near perigee. Consequently, N is larger there and the condition number improves (panel a). The error margins are smaller (down to 8% on the spatial gradient magnitude) because larger N implies more accurate averaging and because the total errors on the data are smaller. Adding fluctuations increases N and leads to a systematic improvement of the condition number. The error margins increase a little to about 10% on the spatial gradient magnitude. As discussed in Sect. 5, the least-squares method can also be used to obtain the traditional instantaneous spatial gradient. To this end, we first time-average and resample the data onto a common time scale (60 s resolution in the present case), so as to obtain simultaneous data points. Note that time-averaging requires an appropriate evaluation of the error margins. The error on a time-average f = n i=1 f i /n can be estimated by which takes into account both the measurement errors (of diminishing importance as the number of data points grows) and the time-variability of the observed quantity (assuming this variability to be gaussian, requiring an estimate of the standard deviation). Figure 5 compares the spatio-temporal gradient (anisotropic case with fluctuations) to the instantaneous spatial gradients for f * =0 nT and f * =0.2 nT (pan-els b, c, and d, respectively). For the 4-spacecraft Cluster case N=M=4. The weights then do not matter and the classical instantaneous spatial gradient method is recovered. The gradients obtained with or without fluctuation error are identical, but their error margins are not. There is a significant difference from the space-time gradient only near the coplanarity region. The condition number (panel a) is slightly different for the three computations. The error margins are smallest for the spatio-temporal gradient.
The gradient of the plasma density can be computed in similar ways. Figure 6 compares ∇n e and ∂n e /∂t obtained with f c =0.5 cm −3 for the isotropic case, for the anisotropic case, and for the anisotropic case with f * =0.2 cm −3 (panels c, d, e and f, g, h, respectively). As plasma density is a strictly positive quantity, a logarithmic scaling has been used. Computing gradients in data gaps is, in principle, always possible. However, the error associated with far-away data points will be large, depending on the homogeneity scales, so that the gradient will carry a large error. Often also the condition number becomes low. For instance, there is a data gap for the four spacecraft around 06:42. As it lasts only a few times the homogeneity time scale, the gradient can still be computed, but the condition number temporarily decreases (panel a). Something similar happens around 07:18 UT, due to a 5-min data gap for Cluster 2 only. Also note that the condition number has improved again for the anisotropic case near perigee, where the orbit is along the direction of largest extent of the homogeneity domain. The spatial gradient is well-computed except close to the coplanarity region around 09:40 UT. The error margins are larger there but still so small that they can hardly be seen in the figure. There is a double-peaked spatial gradient around 08:05 UT corresponding to the rising and falling slopes around the density peak observed near perigee (compare with panel b). There are also important gradients (∼0.05 cm −3 /km, relative precision 10%) near 07:50 UT and near 08:15 UT. These strong gradients are identical to those reported by Darrouzet et al. (2006b) and interpreted there as proof of azimuthal structure. As the densities drop farther away from Earth, the gradients tend to become negligible well before and after perigee. The temporal gradient is pretty small (<0.01 cm −3 /s with a relative precision of 20% at best), except for the bipolar structure near perigee, which corresponds to the convection of the density structure. The three results are almost identical, but the error margins are again smallest (<8%) for the case of an anisotropic homogeneity domain.
Comparing ∇ xt n e for the anisotropic case to the instantaneous spatial gradients without or with fluctuations (Fig. 7), the two instantaneous spatial gradients are necessarily found to be equal and there are only minor differences with the space-time gradient (panel c, d, and e). The condition number is not much different between the three computations (panel a), but it clearly is best for the space-time gradient. The space-time gradient also has the smallest error margins.

Gradients of a vector field
The gradients of the magnetic field vector components have been computed with the same anisotropic homogeneity domain, with f c =1 nT and f * =0.2 nT, treating the three  components independently or coupling them through the zero-divergence constraint. From them, ∇×B and ∇· B are obtained, as shown in Fig. 8. The effective scale factor is close to unity (as for the ∇ xt |B| computation in Sect. 6). S eff , N , and N eff are identical for both computations. The number of equations N (panel a) is three times larger than for a scalar field. Both N and N eff (panel b) peak near perigee as a consequence of the alignment between the spacecraft orbits and the direction with the longest homogeneity scale length (the magnetic field direction). The condition number is identical for both computations (panel c), although the overdetermined system sizes are different (same data used, but only 14 unknowns in the divergence-free case, rather than 15). The progressively deteriorating condition number toward 09:40 UT reflects the spacecraft coplanarity issue. The condition number is quite good elsewhere. The curl (panels d and e) and the divergence (panels f and g) from the uncoupled and the divergence-free computations have absolute error margins of typically 0.005 nT/km (a relative error of 10% on |∇×B|). Away from the coplanarity time interval, ∇·B does not significantly deviate from zero. The correction of the solution implied by requiring ∇·B=0 is rather small. While the leastsquares method minimizes the differences between the observations and the linear approximation, adding a constraint limits the solution search space so that the error margins become slightly larger. Nevertheless, adding physically relevant constraints obviously improves the realism of the solution. Figure 9 compares the curl (panels b, c, and d) and divergence (panels e, f, and g) for the divergence-free spatiotemporal gradient computation as well as for the divergencefree instantaneous gradient computation, applied to 60 s averaged data, without and with small-scale fluctuations, respectively. While the results of the computations are essentially the same, the condition number (panel a) is best for the spatio-temporal gradient. It occasionally drops below 10 −5 for the instantaneous gradient without fluctuations. The error margins are smallest for the spatio-temporal gradient despite the fact that only the spatio-temporal gradient error estimates account for the three error sources (measurement error, curvature error, and fluctuation error).

Physical relevance
In order to illustrate the usefulness of these gradients for scientific analysis, Fig. 10a shows the angles α B,∇|B| and α B,∇n e between the ambient magnetic field B on the one hand and ∇|B| and ∇n e on the other hand, in blue and red, respectively (anisotropic case with fluctuations). The Cluster spacecraft are sampling the outer regions of the plasmasphere, which happen to be the most dynamic ones where erosion can be important (e.g., Carpenter and Lemaire, 1997;Lemaire and Gringauz, 1998;Carpenter and Lemaire, 2004;Décréau et al., 2005). Cluster has therefore been used intensively to study these regions (e.g., Darrouzet et al., 2004, (g)   2006a,b). As we are only interested in the direction of these gradients, not their sense, the angles are reduced to the interval [0 • , 90 • ]. By definition, the magnetic field strength gradient is perpendicular to B at the magnetic equator, corresponding to α B,∇|B| =90 • , near 08:02 UT (close to, but not exactly at Cluster perigee). Before and after that time, as the Cluster spacecraft are at higher magnetic latitudes, that angle decreases rapidly because of the progressively more important field-aligned gradient. The error margins are large away from perigee as both B and ∇|B| are small there. Angle α B,∇n e remains quite large throughout the time interval, indicating that ∇ || n e ∇ ⊥ n e at the relatively low latitudes Cluster is sampling, something that has also been found with radio sounding techniques (Reinisch et al., 2001). This is due to a small longitudinal gradient within each flux tube, but also because of the existence of strong radial and azimuthal density structure on the transverse homogeneity scale of 500 km adopted here. The gradient orientations ob-tained here compare very well to the instantaneous gradient directions reported by Darrouzet et al. (2006b).
Assuming that there are no time changes in the electric field, the current density j =∇×B/µ 0 can be readily computed from the curl of the magnetic field. The angle α B,j between B and j as well as the current density magnitude |j | are given in Figs. 10b and c (anisotropic divergence-free vector case with fluctuations). The current density vector j is perpendicular to B somewhat northward of the magnetic equator, around 08:10 UT. Close to perigee, α B,j is <90 • south of the equator, >90 • north of it, with a current density of 30 nA/m 2 . The existence of field-aligned currents, away from the equator itself, is clearly established in the plasmasphere but also on auroral field lines (e.g., just after 06:00 UT). For a dipolar magnetic field, j ≡0, but this is definitely not true in the present situation.

Conclusions
This paper describes a general-purpose method for computing gradients of scalar and vector fields in space and time. It has been shown that (1) The weighted least-squares method for computing gradients is a very robust one.
(2) The method provides reliable error estimates that include the effects of measurement errors and approximation errors due to structure at scales that are larger and/or smaller than the physical scale of interest.
(3) The method provides diagnostics to assess the quality of the computation, in particular by monitoring the singular values of the problem as a generalization of the concepts of planarity or elongation of a 4-spacecraft configuration. The role of the different parameters of the gradient computation algorithm has been illustrated. The relative importance of the different types of errors and their effect on the quality of the results has been discussed.
The method has been found to be superior to the traditional instantaneous gradient computation. Its primary advantage is its generality and its robustness. It correctly applies the principle of locality of information since only local data are used to compute the gradient at any given point, in accordance with the homogeneity condition. It also yields more stringent error margins on the obtained gradients. A disadvantage is its mathematical complexity. Implementing the method is not trivial. Computing the gradients is time-consuming when one considers small-scale fluctuations (f * =0), because then the (possibly large) error covariance matrices must be diagonalized. While the gradients obtained with this new method typically do not differ very much from those obtained with the traditional instantaneous gradient method, one now obtains a quantitative estimate of the total error on the results.
A prerequisite for a correct application of this method (and of any other method) is the ability to specify realistic values for the different error contributions. The measurement error is usually well-known, the fluctuation error is often only a minor correction, but providing an estimate for the curvature error may be more difficult. A posteriori verification, . The magnetic equator corresponds to α B,∇|B| =90 • , near 08:02 UT. Both angles reflect the relative proportion of parallel and perpendicular gradients; ∇ || |B| is rather strong away from the equator, while ∇ || n e ∇ ⊥ n e due to small longitudinal gradients within each flux tube and because of radial and azimuthal density structure. (b) Angle α B,j between B and current density j (where j =∇×B/µ 0 in a steady situation). (c) Current density magnitude |j | is significantly different from zero in the plasmasphere, indicating deviation from a dipolar field. The current density vector contains a significant field-aligned component inside the plasmasphere (around perigee) and also on auroral field lines (just after 06:00 UT). however, is always possible. Once the gradient is computed along the spacecraft trajectory, one can check how it changes with position and/or with time, at least to a certain extent, so that an evaluation can be made of the curvature error. As long as there are enough points within the homogeneity domain, the value and the precision of the gradient are determined mainly by the measurement and fluctuation errors, and the exact value of the curvature error is not too important. A limitation of the present method is that a single, fixed value for the curvature error parameter f c is used throughout the domain. Another limitation is that we have not accounted for timing errors or spacecraft position errors.
As an illustration, this method was used to analyze magnetic field and plasma density data obtained by Cluster during a pass through the inner magnetosphere. The relative importance of the perpendicular and field-aligned gradients of the magnetic field strength and of the plasma density have been discussed. In addition, nonzero current densities have been found, indicating that the field is not dipolar. Fieldaligned currents appear to exist in the outer regions of the plasmasphere and on auroral field lines. The correct evaluation of the error margins on the gradients offered by the proposed method is absolutely necessary to ascertain the reliability of these findings.
The homogeneity scales must be adapted to the physical structures that one intends to study. In the present example, with a density structure that does not seem to exhibit too fine scales, homogeneity lengths of a few hundreds of kilometers and a time scale of 1 min were fine. In situations of stronger geomagnetic activity, finer-scale plasmaspheric structures may be formed more rapidly, necessitating smaller homogeneity scales in space and time. The least-squares Ann. Geophys., 25,[971][972][973][974][975][976][977][978][979][980][981][982][983][984][985][986][987]2007 www.ann-geophys.net/25/971/2007/ method will always produce a result, but whether the computed gradients are accurate depends on the nature of the data. With Cluster, a good gradient can usually be obtained when the homogeneity scales are on the order of, or larger than, the spacecraft separations in space and time.

Constraints
The overdetermined system (2) may still be ill-conditioned as the redundancy that stems from repeatedly measuring the same quantity over time on different spacecraft may be rather limited. For example, if the spacecraft are all in the same orbital plane, it is impossible to extract information about variations in the direction perpendicular to that plane. This ill-conditioning can be avoided if one adds new information about the problem in the form of constraints. We discuss here two types of (linear) constraints that may be very useful in practice: geometrical ones, which state that one or more spatial gradient components are zero, and the stationarity constraint, which specifies that the total time derivative is zero.

A1 Geometrical constraints
Geometrical constraints are introduced by specifying an orthonormal set of vectors c j , j =1, . . . , m (m≤3) to which ∇ xt f 0 must be perpendicular. For example, the gradient might be required to be perpendicular to the local magnetic field vector B. In that case, m=1 and c 1 =B/ B . A set of orthonormal vectors d i , i=1, . . . , 4−m can then be constructed, so that d i c j =0. Transforming any vector as x β =[. . . d i . . . c j . . .] x, the gradient itself becomes [. . . d i . . . c j . . .] ∇ xt f 0 . Since c j ∇ xt f 0 =0, the m last components of the gradient vanish. The directions c j can thus be regarded as homogeneity directions corresponding to infinite homogeneity scales, since the gradient must be invariant (identically zero) in each of those directions, so that U=[. . . d j . . . c j . . .] and L=diag([. . . l j . . . + ∞...]). Transformation P is now a projection rather than a rotation and scaling (the space spanned by the c j is its null space), its target space being (4−m)-dimensional. The constraint generally improves problem conditioning, but leads to larger residuals as there is less freedom to mimimize r r.
As an example, consider the situation in which the gradient direction in space is known, say, that it lies along x. which leads to a projection x β =x/ξ , t β =t/τ .

A2 Stationarity constraint
For a structure moving with a given constant velocity v 0 , time-stationarity is expressed by This constraint corresponds to an infinite homogeneity scale along direction u 4 ∝[ v 0 1 ]. Homogeneity in time is not separable from homogeneity in space unless v 0 =0.
As an example, consider v 0 =[v 0 ; 0; 0]. Homogeneity directions u 1 , u 2 , and u 3 must form an orthonormal set together with u 4 . One particular choice produces where ξ is a length scale. The projection turns out to be x β =(x−v 0 t)/ξ , y β =y/ξ , z β =z/ξ , so that one actually computes the spatial gradient with isotropic homogeneity length scale l x =l y =l z =ξ in a reference frame that moves with v 0 .
Because of the locality of the perturbations, δf ssλ (λ, x i )δf ssλ (λ, x j ) ≈δf 2 λ (λ γ ) when x γ ij = x j −x i γ λ γ and zero when x γ ij λ γ . For simplicity, the switch between both situations is taken to be an abrupt one. The covariances of δf ss at points x i and x j can then be computed as δf ss (x i )δf ss (x j ) where f * 2 is the total perturbation variance. Whatever the specific choice of perturbation amplitude distribution, it must decrease faster than 1/(λ γ ) 3 in order to obtain a finite total perturbation, and the end result will always be that the crosscorrelation is large between nearby points, and becomes zero as the distance between both points exceeds the perturbation length scale.