Averaging and sampling for magnetic-observatory hourly data

A time and frequency-domain analysis is made of the effects of averaging and sampling methods used for constructing magnetic-observatory hourly data values. Using 1min data as a proxy for continuous, geomagnetic variation, we construct synthetic hourly values of two standard types: instantaneous “spot” measurements and simple 1-h “boxcar” averages. We compare these average-sample types with others: 2-h average, Gaussian, and “brick-wall” low-frequencypass. Hourly spot measurements provide a statistically unbiased representation of the amplitude range of geomagneticfield variation, but as a representation of continuous field variation over time, they are significantly affected by aliasing, especially at high latitudes. The 1-h, 2-h, and Gaussian average-samples are affected by a combination of amplitude distortion and aliasing. Brick-wall values are not affected by either amplitude distortion or aliasing, but constructing them is, in an operational setting, relatively more difficult than it is for other average-sample types. It is noteworthy that 1-h average-samples, the present standard for observatory hourly data, have properties similar to Gaussian average-samples that have been optimized for a minimum residual sum of amplitude distortion and aliasing. For 1-h average-samples from medium and low-latitude observatories, the average of the combination of amplitude distortion and aliasing is less than the 5.0 nT accuracy standard established by Intermagnet for modern 1-min data. For medium and low-latitude observatories, average differences between monthly means constructed from 1-min data and monthly means constructed from any of the hourly average-sample types considered here are less than the 1.0 nT resolution of standard databases. We recommend that observatories and World Data Centers continue the standard practice of reporting simple 1-h-average hourly values.


Introduction
Since magnetic observatories first began operating around the world in the middle of the 19th century, one of their most important products has been hourly data values. Historical hourly observatory data are useful for a variety of applications (e.g. Parkinson, 1983;Courtillot and Le Mouël, 1988;Prölss, 2004), including studying geomagnetic secular variation originating in the core (e.g. Sabaka et al., 2004), exploring the electrical conductivity of the mantle (e.g. Egbert et al., 1992), mapping electric currents in the ionosphere (e.g. , measuring the intensity of magnetospheric storms (e.g. Karinen and Mursula, 2005), and estimating long-term solar-terrestrial interaction (e.g. Macmillan and Droujinina, 2007). In conducting these, and other types of analyses, it is important, and sometimes even essential, to have an understanding of the effects of amplitude distortion and aliasing caused by averaging and sampling procedures used in the production of hourly data.
The means of acquiring hourly observatory data have evolved with the advancement of measurement technology and in response to researcher demands for data that meet increasingly stringent quality standards. The oldest observatory hourly data were obtained by on-site personnel using instruments that permitted essentially instantaneous visual measurement of magnetic-field direction and intensity. With the introduction of photographic recording systems , the process of acquiring hourly data became semi-automated. Following a daily schedule, an observatory worker would develop the photographic paper, and, then, using an etched piece of glass, directly measure the amplitudes of the time series traces recording magnetic-field variation. At first these were "spot" measurements, similar, in some respects, to direct instantaneous visual measurements. In the early 20th century, following the recommendation of Schmidt (1908), observatories began to report hourly means, with 1-h average values recorded once per hour (Chapman and Bartels, 1962, Ch. 2.14). These were either directly estimated from the paper photographic records, Published by Copernicus Publications on behalf of the European Geosciences Union.
or, more formally, they were calculated from multiple subhour spot measurements. The switch from making spot measurements to producing average values came gradually: in Great Britain (Eskdalemuir) it was made in 1912 (Meteorological Office, 1914, p. 68), in the United States at all observatories in 1915 (e.g. Hazard, 1918, p. 5), and at some observatories in France (Chambon la Forêt) not until 1972 (Fouassier and Chulliat, 2009, p. 87). Today, constructing hourly means is done by computer, averaging digital 1-min data acquired by electronic systems.
Because the historical hourly observatory data are a mixture of spot measurements and hourly averages, their properties have changed over time, and their time series can be described as being "inhomogeneous". Spot sampling can sometimes, and fortuitously, give a good record of shortduration transient magnetic-field variation. Spot sampling can also sometimes, and unfortunately, miss transient variation altogether. It all depends on when and how frequently samples are taken relative to when and over what duration the field variation occurs. If we assume, as is reasonable, that spot measurements are collected independently of field variation, then numerous spot measurements collected over a long period of time will represent an unbiased statistical sampling of the amplitude range of magnetic-field variation. On the other hand, averaging reduces the amplitude of highfrequency variation, giving a time series that is smoother than actual magnetic-field variation. The statistical variance of discrete 1-h-average values will be less than that of the magnetic field's actual variation, and it will be less than that of instantaneous, spot measurements. Since magnetic-field variation occurs over an extremely broad range of frequencies, discrete hourly sampling, be it done with spot measurement or with an averaging window, will result in aliasing, with high-frequency amplitudes being mapped into estimates of low-frequency amplitudes.
Researchers often use a combination of spot and hourlyaverage observatory data, as if, together, they represent a reliable long-term record of magnetic-field variation. It is, therefore, natural to ask: can we measure differences in the properties of hourly spot and hourly average values so that informed decisions can be made about using them? How do spot and hourly averages compare with other hypothetical average-sample types that might be proposed in the future as candidate observatory products? In seeking answers to these and other related questions, we examine the properties of hourly observatory data, measuring their variance, relative proportionality, correlation, autocorrelation, spectral power, amplitude distortion, and aliasing. Assuming that 1-min data are a good representation of continuous magnetic-field variation, we construct several types of synthetic hourly averagesamples from 1-min data collected from observatories situated at different latitudes. We employ standard methods of continuous (e.g. Kanasewich, 1981) and statistical (e.g. Lee, 1960; timeseries analysis.

Data
In Table 1 we summarize the data we use. The 1-min data are definitive (processed and calibrated) magnetic-vector values collected at observatories (Jankowski and Sucksdorff, 1996;Love, 2008) that meet Intermagnet standards (Kerridge, 2001;Rasson, 2007); we obtained the 1-min data from the Intermagnet website (www.intermagnet.org) for observatories situated at high (Barrow BRW), medium (Chambon la Forêt CLF), and low (Huancayo HUA) geomagnetic latitudes. They record a variety of magnetic-field variation, with high (low) latitude variation dominated by active auroral (equatorial) electrojets (e.g. Parkinson, 1983;Prölss, 2004), and medium-latitude variation being relatively more quiescent. Each 1-min magnetic-vector datum was formed from sub-minute electronic signals acquired from a tri-axial fluxgate magnetometer. The signals were both analog and digitally filtered, effectively eliminating aliasing from variation with periods of less than 1-min (frequencies greater 0.0167 Hz). Additional measurements, made once a week or so from a reference pier at each observatory site, were combined with the fluxgate measurements to construct time series that have an accuracy, measured in absolute terms over many years, of better than 5.0 nT (0.50 ). Each 1-min datum has a time stamp that is centered on the top of the universaltime (UT) minute (HR:MN:SC, 00:00:00, 00:01:00, etc.). For BRW and HUA, we use almost a complete solar cycle of data (1998.0-2009.0), and for CLF we use almost two solar cycles of data (1991.0-2009.0).
The hourly values we use were acquired at Chambon la Forêt (CLF) for the years 1954.0-1990.0 (more than three solar cycles) and at Eskdalemuir (ESK) for the years 1917.0-1947.0 (almost three solar cycles). These are available in computer-readable format from the website of the World Data Center "CE" in Copenhagen (now Edinburgh), having been either transcribed from printed observatory yearbook values or constructed from digital minute data. Prior to 1972Prior to .0 (1932 the CLF (ESK) data are spot samples (1-h means) centered on the top of the Greenwich hour (00:00:00, 01:00:00, etc.) (Fouassier and Chulliat, 2009;Meteorological Office, 1933). After 1972.0 (1932.0) they are, for both CLF and ESK, 1-h means centered on the bottom of each hour (00:30:00, 01:30:00, etc.) (Fouassier and Chulliat, 2009;Meteorological Office, 1934), as is now the standard among other observatories. Martini and Mursula (2006) and Svalgaard and Cliver (2007) have noted that the WDC-CE pre-1932 ESK holdings are an interpolation of hourly values centered at the top of each hour (00:00:00, 01:00:00, etc.)adjacent values in time have been averaged together to give 2-h average-samples, one per hour and centered on the bottom of the hour 1 . We use the CLF-CE data to investigate For both the 1-min data and hourly values, we use geographic-polar (horizontal H , declination D, vertical Z) magnetic-vector components. When these are the components in the WDC database, then we use them without any modification, otherwise, if and when cartesian (north X, east Y ) horizontal components are reported, then we calculate (D,H ) appropriately. Data spikes are identified with a simple computer algorithm; these are removed and, along with occasional data gaps, filled with interpolated values. The British Geological Survey's annual-means database (www.geomag.bgs.ac.uk) contains a list of small step-offsets in the historical time series; some are also documented in observatory yearbooks. Most step offsets are obvious from inspection of the data; they are the result of moving the observatory's reference pier, changes in measurement method, or uncontrolled contamination. We insert complementary stepoffsets into the historical observatory time series in order to bring them into continuity. The details of these adjustments do not significantly affect the results that follow.

Theory
The standard averaging and sampling methods that are used for constructing observatory hourly values are mathematically equivalent to passing a continuous magnetic-field time series through a zero-phase-delay linear filter, and, then, discretely sampling the output once per hour. In this section of mathematical review, we examine the properties of different averaging and sampling types in the dual domains of time and frequency. The average-samples are simple, and they include the standard spot and 1-h averages used for constructing hourly values, as well as other average-sample types useful for comparison. yearbooks record data in XY Z coordinates, but the pre-1932.0 WDC-CE holdings are in H DZ coordinates -another indication that the data were manipulated at some point after first reporting. Furthermore, the pre-1932.0 ESK digital holdings of WDC Kyoto are different from those of WDC-CE. Indeed, the WDC-K data appear to be close to the original yearbook data, but there are some formatting problems with the WDC-format version of the ESK-K data.
For notation, we represent the time series of an arbitrary magnetic-vector component as B(t), which is a function of time t, and which can stand for any of H (t),D(t),Z(t), etc. We assume that B(t) is well behaved: it is finite, continuous, and integrable. These qualities pertain to any natural magnetic-field time series measured over a finite duration of time and that might be generated from classical physical processes. Therefore, once the data have been detrended, a Fourier-type analysis can be reasonably pursued. We use a Fourier transformation F having a "unitary" normalization for ordinary frequencies f , where t = 1/f . For a signal B(t) in the time domain, its dual in the frequency domain,B(f ), is given bỹ (1) Inverse Fourier transformation is given by The unitary normalization is used in the classic textbook by Bracewell (1978, p. 6); it is different from the non-unitary normalization for angular frequencies used by Lee (1960, p. 33). In most respects, this is a technical distinction, one that does not affect a qualitative understanding of the theory and results that follow.

Simple running averaging
We begin by considering the straightforward running average of a geomagnetic time series. With a rectangle-in-time or "boxcar" function, simple averaging is defined over a duration of time t a by convolution of with the natural, continuous magnetic-field time series B(t), www.ann-geophys.net/28/2079/2010/ Ann. Geophys., 28,[2079][2080][2081][2082][2083][2084][2085][2086][2087][2088][2089][2090][2091][2092][2093][2094][2095][2096]2010 Fourier transformation of the rectangle function in the time domain is a sinc function in the frequency domain, (Bracewell, 1978, p. 389), where f a = 1/t a . By the convolution theorem (Lee, 1960, p. 28;Bracewell, 1978, Ch. 3), the Fourier transform of convolution in the time (frequency) domain is equal to multiplication in the frequency (time) domain, With this, whereB(f ) is the frequency-domain dual of B(t). The function sinc(f ;f a ) gives the mapping fromB(f ) toB a (f ); it is a "frequency-response" function (Bracewell, 1978, p. 179;Kanasewich, 1982, Ch. 5.3) that describes the amplitude distortion of the time series in the frequency domain due to averaging. It is useful, now, to define the amplitude differences which measure the signal in the original, continuous time series that is not modelled by the running average. For a geomagnetic reference to related issues, see Chapman and Bartels (1962, Ch. 16.17).

Low-pass brick-wall filtering
A low-pass "brick-wall" filter is most easily defined in the frequency domain (Bracewell, 1978, p. 52;Kanasewich, 1981, Ch. 15): it has a flat response below some chosen truncation frequency f b /2, and it admits nothing above that frequency. Because the brick-wall filter has these properties, it is sometimes described as "ideal". It is prescribed by multiplication of the Fourier amplitudes by a rectanglein-frequency functioñ and, correspondingly, by convolution in the time domain with a sinc function, The difference between the original, continuous time series and the filtered time series is equal to unresolved frequencies above f b /2, but there is no difference for frequencies below f b /2. Brick-wall filtering has two potential drawbacks. First, in the time domain, B b (t) will show oscillations or "ringing" near discontinuities or abrupt changes in the original, unfiltered time series B(t) (Bracewell, 1978, p. 209;Kanasewich, 1981, p. 240). This is the Gibbs phenomenon. It is usually illustrated in textbooks by convolving a sinc function with a Heaviside step function; since the sinc function has oscillatory tails, the output of the convolution is also, to some degree, oscillatory. While natural geomagnetic time series have no discontinuities, they do sometimes record abrupt variation, such as during storm sudden commencements. For this reason, the representation of a geomagnetic time series in terms of a brick-wall-truncated Fourier expansion will have ringing, which can be reasonably described as a Gibbs effect. But as we shall see, this does not significantly affect individual hourly samples of a brick-wall-filtered time series.
The second potential drawback of brick-wall filtering concerns its practical implementation. In comparison to the other average-sample types considered here, it is relatively more difficult to use brick-wall filtering to obtain hourly values. Accurate results require a long time series to be treated in the frequency domain. In contrast, simple boxcar averaging only requires one or two hours of data, and these can be treated in a straightforward way in the time domain; long time series are not needed to obtain accurate results. This simple point is especially relevant for institutes needing to promptly provide data for real-time operational applications.

Gaussian filtering
In the time domain, a Gaussian filter is defined as for which time-domain convolution is given by The Fourier transform of a Gaussian is another Gaussian, (Bracewell, 1978, p. 386), where f σ = 1/t σ . In a qualitative sense, this time-frequency symmetry places the Gaussian filter midway between the two extremes of time-domain (frequency-domain) rectangle (sinc-function) filtering and sinc-function (brick-wall) filtering. The width of the Gaussian filter is specified by t σ , and, in Sect. 5.7, we will locate an optimal value. We note that Intermagnet recommends a Gaussian filter for the production of 1-min averages, but they do not make any similar recommendation for hourly values.

Spot sampling
For a constant sampling frequency f s with discrete sample times that are equally-spaced with t j +1 −t j = t s = 1/f s , we can define a Dirac comb function as a sequence of equally-spaced delta functions, (Bracewell, 1987, p. 77;Kanasewich, 1981, pp. 34, 110). With this, instantaneous or "spot" sampling of a time series can be described as a multiplication of the original, continuous time series by the Dirac comb, Bracewell, 1978, p. 78;Kanasewich, 1981, p. 35). Since the Fourier transform of a Dirac comb is another Dirac comb, in the frequency domain, convolution results in the superposition of an infinite sequence of replicas ofB(f ), each one shifted by a multiple of the sampling frequency, For a continuous time series defined across an infinite range of frequencies, it is impossible to fully resolve its frequency content with a discrete set of spot samples, even if we have an infinite number of them. This limitation is the result of "aliasing", and it can be understood from consideration of Eq. (19). Harmonic amplitudes for frequencies below, for example, f s /2 will have mapped onto them the amplitudes for frequencies above f s /2, something that results in contamination of estimates of the frequency content of the original time series. More specifically, consider a single Fourier harmonic with frequency 0.3f s . With discrete sampling, it cannot be distinguished from a harmonic with frequency 0.7f s , or one with a frequency of 1.3f s , etc. There are an infinite number of aliases, each having a different frequency. If a continuous time series contains only a compact range of frequencies, such thatB(f ) is zero outside of the band (19) are nonoverlapping. In this case, there will be no aliasing and Shannon's theorem (Bracewell, 1978, Ch. 10;Kanasewich, 1981, p. 117) applies: sampling at twice Nyquist, f s = 2f N , is sufficient to make a complete reconstruction of the original, continuous time series. To see this, consider the frequencylimited version of Eq. (19), IfB(f ) has no frequency content above Nyquist, then only the k = 0 term contributes to the summation, in which case, In the time domain, the dual of Eq. (20) is the Whittaker-Shannon "cardinal" formula (Bracewell, 1978, p. 413), The presence of the rectangle function in Eq. (20) makes clear the noteworthy property of Whittaker-Shannon interpolation: it does not introduce any spurious amplitudes with frequencies above Nyquist, and it does not change any amplitudes with frequencies below Nyquist. For these reasons, it is sometimes called an "ideal" interpolator. IfB(f ) has no frequency content above Nyquist, then we can infer from Eq. (21) that A limited-frequency, continuous time series is, thus, reconstructed from spot samples by Whittaker-Shannon interpolation.
The situation of interest, here, is slightly messier. A continuous magnetic-field time series B(t) has variation across a range of frequencies that is so broad that it might as well be considered infinite. But for a sampling given by (t s ,f s ), we cannot resolve amplitudes with frequencies above Nyquist. That is, we are unable resolve the following signal Aliasing is given bỹ and, although non-standard, it is useful to view aliasing in the time domain, This has the spot values which might be interpreted as "errors" in a frequency-limited representation of the original time series. For a qualitative discussion of aliasing in the context of geomagnetism, see Chapman and Bartels (1962, Ch. 16.14).
As with the amplitude differences defined by Eqs. (8) and (11), it is useful to define the residual differences between the original, continuous time series and the Whittaker-Shannon interpolation of spot samples, where B s (f ) and B s (t) are given by Eqs. (20) and (22). These residuals are equal to the difference between the unresolved, high-frequency signal and that of aliasing, from Eqs. (24)-(27).

Spot sampling of a running average
Combining results, discrete average-sampling of a continuous time series is given by Its time-continuous Whittaker-Shannon interpolation is and The frequency-domain residual difference is given bỹ The unresolved, high-frequency signal is given by Eq. (24). The low-frequency residual, is the difference between amplitude distortion, whereÃ a (f ) is given by Eq. (8), and aliasing, which should be compared with Eq. (26). Each of these amplitude distortion and aliasing terms can, of course, be expressed in the time domain as well.
The output of average-sampling depends on the chosen t a and t s (or, equivalently, f a and f s ) and the details of the continuous time series itself. The standard practice of reporting 1-h-averages, once per hour, corresponds to t a = t s (f a = f s ). We have often heard it said that such average-samples are contaminated by aliasing, and that this might be remedied by, for example, increasing the averaging time from 1-h to 2-h, corresponding to t a = 2t s (2f a = f s ). But since there is considerable natural geomagnetic activity at periods below 2 h, heavier averaging will also result in significant amplitude distortion. Therefore, it is important to consider the combined effects of averaging and sampling on the continuous time series of interest: prominent Fourier amplitudes having periods that are much longer (shorter) than the averaging duration will not be (will be) significantly affected by amplitude distortion. If high-frequency Fourier amplitudes are relatively very small (very large), such as for what is often called a "red" ("blue") spectrum, then aliasing might not be (could be) very important. See Kirchner (2005) for a discussion of the effects of averaging and sampling a time series having 1/f noise.

Numerical analysis
To decompose observatory data in terms of Fourier series, we first detrend each time series by subtracting a slowly changing, non-periodic trendline. The physical origin of this timedependent trendline is the superposition of crustal magnetization beneath each observatory, which affects the average value of the trendline, and the time-varying main field sustained by the core's dynamo, which affects the average value and the time dependence of the trendline. For each observatory, we approximate this internal-field time series by Chebyshev polynomials of the first kind (Press et al., 1992, "chebev"). There is precedence for this choice of basis function in geomagnetic analyses (e.g. , made because of the rapid rate with which Chebyshev expansion coefficients converge when approximating smooth functions (e.g. Conte and de Boor, 1980). A low-degree truncation of the Chebyshev expansion is fitted to the observatory data using a least-squares algorithm. Results are not, however, particularly sensitive to the truncation level (we choose one Chebyshev degree per 2 years of data), since the hourly samples are "high frequency" compared to the shortest characteristic timescale of the subtracted trendline.
We make transformations back and forth between the time and frequency domains by computer application of a fast-Fourier transform (Press et al., 1992, "realft"). This algorithm uses N B = 2 N data (amplitudes) in the time (frequency) domain, where N is an integer. The Fourier transformation of a discrete data set can be represented as In the time domain, the discrete data are assumed to be evenly-spaced, with a sampling interval t s = 1/f s . In the frequency domain, the amplitude pairs Ann. Geophys., 28, 2079-2096, 2010 www.ann-geophys.net/28/2079/2010/ correspond to sine and cosine functions (denoted by the s and c prescripts), with the evenly-spaced discrete frequencies As an approximation of continuous magnetic-field variation at each observatory, we use 1-min data. From them we synthesize approximate average-samples that are similar to historical (or possibly, prospective) reported observatory hourly values. But in order to facilitate comparison of results with the source 1-min data in both the time and frequency domains, instead of sampling the source 1-min time series once per hour, we sample them once every 64 = 2 6 min. This means that for N B 1-min data, the first N B /64 Fourier amplitudes can be directly compared with the N B /64 Fourier amplitudes from the "hourly" samples -they correspond to the same harmonics over the same total duration of time. For our purposes, this 64-min "hour" sampling period is sufficiently close to a normal 60-min hour. Five different synthetic average-sample types, each constructed from the source minute data, are considered: 1. "Spot" samples, with no actual averaging, 2. 1-"hour" rectangle averages from t 1 = 65 min of data, 3. 2-"hour" rectangle averages from t 2 = 129 min of data, 4. Gaussian averages with t σ = 55 min, 5. Brick-wall (sinc) averages, In each case, summation over 1-min data is denoted by the subscript m. The subscript j can, in principle, denote any sampling rate less than or equal to 1-min. We will use both 1-min and 64-min sampling. With 1-min sampling, results correspond to "continuous" time series. For example, if the t j are taken as minute sample times, then the corresponding spot samples are the same as the source 1-min data, B s (t j ) = B(t j ) B(t). With 64-min sampling, results correspond to discrete "hourly" samples of the time series. Our choice, in Eqs. (43) and (44), to sum over 65 and 129 min of data, instead of 64 and 128, is motivated by the need to keep these average-samples symmetric in time and comparable to the other average-sample types. In practice, the summations in Eqs. (45) and (46) need to be taken over sufficiently large number of data to ensure convergence. However, it is easier to construct (46) by band-pass filtering in the frequency domain. In principle, t b can be anything we want it to be, but it is usually set equal to twice the sampling interval, t b = 2t s , which is what we do.

Frequency spectra of historical CLF data
The power spectral density for a time series B(t) is defined by a "periodogram" (Kanasewich, 1981, Sect. 7.1), and the display of P B (f ) in a graph as a function of frequency f is a "spectrum". The discrete power spectral density, for each unit of frequency, is given by In Fig. 1a we show power spectra P H (f ) for two 18-year durations of CLF horizontal-intensity hourly values: 1954.0-1972.0 (hourly spot measurements) and 1972.0-1990.0 (1-h averages). Regular solar-quiet variation is seen as prominent diurnal peaks with frequencies of 1/d, 2/d, etc. (e.g. Olsen, 2007). Magnetic storms and disturbance are seen as a broad wash of energy across the entire range of frequencies, with a gradual "red-spectrum" decrease in energy with increasing frequency. Important for this study are the differences in the spectra between the older pre-1972.0 data, when reported hourly values were spot measurements, and the newer post-1972.0 data, when reported hourly values were 1-h average-samples. The spectral power for spot measurements is higher at high frequencies than that for 1-h average-samples. Since spectral differences might simply be reflective of different levels of activity, these observations, on their own, are not sufficient to prove that different average-sample types (spot and average) suffer from different amounts of amplitude distortion and aliasing. But what we see in Fig. 1a is enough to motivate further investigation. We will return, at the end of Sect. 5.3, to discuss Fig. 1b.

Example of average-sampling during a storm
Using 1-min CLF-H source data, 1991.0-2009.0, and formulas (42) and (43), we generated synthetic 'spot' samples, 'continuous' running averages, and discrete 'hourly' average-samples. In Fig. 2 (22), and the discrete aliases, S s (t) Eq. (27), show Gibbs ringing. This tends to be correlated with spot samples that happen to fall on the extremes of the 1-min variation. Otherwise, before and after periods of rapid variation, ringing tends to occur in between spot values. The later is a result of the fact that for t s = 1 hr the sinc function has zero crossings at integer hours, a point we will examine again in Sec. 5.8. In Fig. 2(b) we see that continuous, running averaging gives a smooth representation of the original time series, but the extreme amplitudes of storm-time variation are smoothed out. We have described this as amplitude distortion, A 1 (t) Eq. (8). Discrete sampling of the averaged time series has a continuous signal, H 1s (t) Eq. (32), with less Gibbs ringing than that for spot sampling, and a low-frequency residual, L 1s (t) Eq. (35), that is, on average, less than aliasing of spot sampling.

Frequency spectra of different average-samples
Using, again, 1-min CLF-H source data, 1991.0-2009.0, we used formulas (42) to (46) to generate a variety of averagesample types. In Fig. 3(a) we see that the hourly spot samples have a spectrum P H s (f ) that is generally like 'continuous variation' that is approximated by 1-min data P H (f ). However, at high frequencies, near the Nyquist frequency (∼ 0.5 cycles/hr), the energy of the hourly spot spectrum is higher than that for the 1-min data. This lifting of the high-frequency end of the hourly spectrum is entirely due to aliasing,S s (f ) Eq. (26). In Fig. 3(b) we show the spectrum of aliasing P S s (f ), which can be recognized as the reflection of the 1-min spectrum through the Nyquist frequency. Because the 1-min spectrum is 'red', reflection through Nyquist results in an aliasing spectrum that is 'blue', and for this reason, aliasing is most easily seen in Fig. 3(a) for frequencies immediately below Nyquist.
More interesting are the spectra for 1-hr average-samples, Fig. 3(c). The spectrum P H 1 (f ) of continuous running averages is depressed at high frequencies compared to the 1-min spectrum P H (f ) by an amount equal to the square of the amplitude-distorting, frequency-response factor, sinc 2 (f ; f 1 ) Eq. (7). While the spectrum of discretely-sampled 1-hr average values P H 1s (f ) is slightly lower than that of the 1-min data, it is also slightly higher than the spectrum for continuous running averages. Again, this difference is due to aliasing,S 1s (f ) Eq. (37). In Fig. 3(d) we show the low-frequency residual spectrum, P L 1s (f ) Eq. (35), and the aliasing spectrum, P S 1s (f ) Eq. (37). Energy in the residual spectra in- average-samples. In Fig. 2 (27), show Gibbs ringing. This tends to be correlated with spot samples that happen to fall on the extremes of the 1-min variation. Otherwise, before and after periods of rapid variation, ringing tends to occur in between spot values. The later is a result of the fact that for t s = 1 h the sinc function has zero crossings at integer hours, a point we will examine again in Sect. 5.8. In Fig. 2b we see that continuous, running averaging gives a smooth representation of the original time series, but the extreme amplitudes of storm-time variation are smoothed out. We have described this as amplitude distortion, A 1 (t) Eq. (8). Discrete sampling of the averaged time series has a continuous signal, H 1s (t) Eq. (32), with less Gibbs ringing than that for spot sampling, and a low-frequency residual, L 1s (t) Eq. (35), that is, on average, less than aliasing of spot sampling.

Frequency spectra of different average-samples
Using, again, 1-min CLF-H source data, 1991.0-2009.0, we used formulas (42) to (46) to generate a variety of averagesample types. In Fig. 3a we see that the hourly spot samples have a spectrum P H s (f ) that is generally like "continuous variation" that is approximated by 1-min data P H (f ). However, at high frequencies, near the Nyquist frequency (∼ 0.5 cycles/h), the energy of the hourly spot spectrum is higher than that for the 1-min data. This lifting of the highfrequency end of the hourly spectrum is entirely due to aliasing,S s (f ) Eq. (26). In Fig. 3b we show the spectrum of aliasing P S s (f ), which can be recognized as the reflection of the 1-min spectrum through the Nyquist frequency. Because the 1-min spectrum is "red", reflection through Nyquist results in an aliasing spectrum that is "blue", and for this reason, aliasing is most easily seen in Fig. 3a for frequencies immediately below Nyquist.
More interesting are the spectra for 1-h average-samples, Fig. 3c. The spectrum P H 1 (f ) of continuous running averages is depressed at high frequencies compared to the 1-min spectrum P H (f ) by an amount equal to the square of the amplitude-distorting, frequency-response factor, sinc 2 (f ;f 1 ) Eq. (7). While the spectrum of discretelysampled 1-h average values P H 1s (f ) is slightly lower than that of the 1-min data, it is also slightly higher than the spectrum for continuous running averages. Again, this difference is due to aliasing,S 1s (f ) Eq. (37). In Fig. 3d we show the lowfrequency residual spectrum, P L 1s (f ) Eq. (35), and the aliasing spectrum, P S 1s (f ) Eq. (37). Energy in the residual spectra increases with frequency ("blue"), and the integrated energy up to the Nyquist frequency is evidently less than that seen for instantaneous, spot samples Fig. 3b, an observation consistent with observations made of the time series in Fig. 2. Furthermore, while aliasing accounts for a large portion of the low-frequency residuals, it does not account for the relatively prominent residual differences seen in periodic diurnal Ann. Geophys., 28,[2079][2080][2081][2082][2083][2084][2085][2086][2087][2088][2089][2090][2091][2092][2093][2094][2095][2096]2010 www.ann-geophys.net/28/2079/2010/ creases with frequency ('blue'), and the integrated energy up to the Nyquist frequency is evidently less than that seen for instantaneous, spot samples Fig. 3(b), an observation consistent with observations made of the time series in Fig. 2. Furthermore, while aliasing accounts for a large portion of the low-frequency residuals, it does not account for the relatively prominent residual differences seen in periodic diurnal terms -those differences are due to amplitude distortion. Similar conclusions can be drawn from the spectra for Gaus-sian average-samples Fig. 3(g,h).
For 2-hr running averages, we see in Fig. 3(e) that the energy in the spectrum P H 2 (f ) is substantially depressed for frequencies near Nyquist, and, indeed, the spectrum for hourly 2-hr average-samples P H 2s (f ) is also substantially depressed at high frequencies. In Fig. 3(f) we see that aliasing spectrum P S 2 (f ) is lower than that for 1-hr Fig. 3(d) or Gaussian Fig. 3(h) average-samples, but the residual spectrum P L 2 (f ) is higher. Brick-wall results, P H σ (f ) Fig. 3(i,j), terms -those differences are due to amplitude distortion. Similar conclusions can be drawn from the spectra for Gaussian average-samples Fig. 3g, h. For 2-h running averages, we see in Fig. 3e that the energy in the spectrum P H 2 (f ) is substantially depressed for frequencies near Nyquist, and, indeed, the spectrum for hourly 2-h average-samples P H 2s (f ) is also substantially depressed at high frequencies. In Fig. 3f we see that aliasing spectrum P S 2 (f ) is lower than that for 1-h Fig. 3d or Gaussian Fig. 3h average-samples, but the residual spectrum P L 2 (f ) is higher.
Brick-wall results, P H σ (f ) Fig. 3i, j, are a special case; this type of average-sample has perfect frequency response below Nyquist, and therefore no low-frequency residuals at all. The lesson we can take away from these comparisons is that there is a trade-off between amplitude distortion and aliasing.
Returning, now, to the power spectra of the historical CLF data, in Fig. 1b we show the spectra for hourly spot samples P H s (f ) and 1-h average-samples P H s (f ). In each case, these have been constructed from 1-min data covering an 18year (1991.0-2009.0) period of time that is equal in duration Gaussian, and (i,j) brick-wall average-samples. Shown are spectral densities of the source 1-min data P H (f ) (red), running averages P H a (f ) (gray), discrete hourly average-samples P H as (f ) (black), low-frequency residuals P L as (f ) (blue), and aliasing P S as (f ) (brown).  to the two periods of time shown in Fig. 1a for the historical data (spot samples 1954.0-1972.0, 1-h average-samples 1972.0-1990.0). From the considerable similarity between the spectra seen in Figs. 1a, b and 3a, b, we conclude that the spectral differences in the historical CLF, seen in Fig. 1a, are almost entirely due to different averaging and sampling methods.

Statistical moments of the average-samples
As a statistical summary of magnetic-field variability recorded by the various hourly average-sample types, we calculate average-absolute deviations and standard deviations each defined here for zero-mean data; recall from Sect. 4 that we have subtracted a slowly-varying trendline. By Parseval's theorem, the variance of the time series equals the total power integrated over all frequencies, (Lee, 1960, p. 11;Bracewell, 1978, p. 112). Therefore, results for statistical variance, Eq. (50), can be interpreted in terms of the integral of the spectral power in the frequencydomain, Eq. (48).
In Table 2 we list statistical moments for horizontal intensity H , declination D and vertical intensity Z, from high (BRW), medium (CLF), and low-latitude (HUA) observatories. Hourly spot measurements are an unbiased representation, in a statistical sense, of the amplitude range of magnetic-field variation. It is, therefore, useful to compare the moments of spot-sample measurements with those of the other average-samples types. Without exception, brick-wall (2-h) average-samples have δ and σ values closest to (furthest from) those of spot samples. Still, it is noteworthy that results for 1-h average-samples are relatively similar to those of both the Gaussian and brick-wall average-samples.

Low-frequency residual moments
In Table 3 we list the statistical moments of the lowfrequency residuals, L as (t) Eq. (35). Generally speaking, the smaller the size of this residual, the better the representation of continuous magnetic-field variation for frequencies below Nyquist. Low-frequency residuals for spot samples are the largest, something that is, once again, entirely due to aliasing. Those for 2-h average-samples are the next largest, something that is primarily due to amplitude distortion. For 1-h and Gaussian average-samples, low-frequency residual differences are of similar size. There are no residuals for the brick-wall average-samples. For all average-sample types, even spot samples, the Z average absolute and standard deviations for medium latitudes (CLF) and low latitudes (HUA) are less than or equal to the 1.0 nT resolution used in historical observatory yearbooks.

Monthly and annual means
Monthly and annual observatory means are used for secular variation studies, and so it is useful to evaluate the effects of using different types of hourly average-samples in their construction. For simplicity, we define "monthly" means as the average of geomagnetic-vector components over t m = 685 "hours", each being 64 min in length, with denoting any chosen hourly average-sample type from Eq. (42) to Eq. (46). The difference between a monthly mean of hourly average-samples and a monthly mean of 1-min data is variance of amplitude distortion A σs (t) and aliasing S σs (t), as function of t σ . Amplitude distortion (aliasing) increases (decreases) with increasing t σ , but there is an inflection point at t σ = 55 min where the residuals are minimized. This method of selecting an optimal averaging width is a simple example of what is sometimes called 'filter design'. It is tempting to try to relate the optimized Gaussian filter to a boxcar average of a certain averaging width. While we hesitate to read too much into such comparisons, we have already observed from Table 3 that some of the properties of 1hr average-samples are similar to those of the optimum Gaussian filter. The small statistical differences between these two average-sample types are probably not very significant for most uses of observatory hourly values. Of course, the redeeming quality of 1-hr average-samples is that they are easy to calculate, even easier than Gaussian average-samples.

Autocorrelation and avoiding the Gibbs effect
While it is a standard practice to use power spectra to check for periodic signals and to display Fourier amplitudes in the frequency domain, in the time domain the corresponding quantity is autocorrelation. For a real function, such as an observatory magnetic-field time series, autocorrelation is defined by the integral (Press et al., 1992, 'correl'), where τ represents a relative time lag. By the Wiener-Khinchine or 'auto-correlation' theorem (Lee, 1960, p. 11;Bracewell, 1978, p. 115), an auto- In Table 4 we list the statistical moments of m (t). For low and medium latitudes, average differences are less than the 1.0 nT resolution in the monthly-mean database of Chulliat and Telali (2007). As might be expected, differences are largest for spot samples from high-latitude observatories, but even these differences are small compared to the 5.0 nT standard for absolute accuracy established by Intermagnet for modern 1-min data. Corresponding differences for annual means will be even smaller, by about a factor of 1/ √ 12. The slightly larger average differences for Gaussian averagesamples are mostly due to end-point differences, with the tails of each hourly Gaussian filter extending outside of each monthly boxcar averaging window. In analyzing secular variation, it is probably acceptable to use any type of hourly average-sample to construct monthly and annual means.

Optimum Gaussian filter
Our choice of a Gaussian filter with t σ = 55 min minimizes the variance of the low-frequency residuals, L σ s (f ) Eqs. (35) and (50). In Fig. 4 we show this quantity, along with the variance of amplitude distortion A σ s (t) and aliasing S σ s (t), as function of t σ . Amplitude distortion (aliasing) increases (decreases) with increasing t σ , but there is an inflection point at t σ = 55 min where the residuals are minimized. This method of selecting an optimal averaging width is a simple example of what is sometimes called "filter design".
It is tempting to try to relate the optimized Gaussian filter to a boxcar average of a certain averaging width. While we hesitate to read too much into such comparisons, we have already observed from Table 3 that some of the properties of 1-h average-samples are similar to those of the optimum Gaussian filter. The small statistical differences between these two average-sample types are probably not very significant for most uses of observatory hourly values. Of course, the redeeming quality of 1-h average-samples is that they are easy to calculate, even easier than Gaussian average-samples.

Autocorrelation and avoiding the Gibbs effect
While it is a standard practice to use power spectra to check for periodic signals and to display Fourier amplitudes in the frequency domain, in the time domain the corresponding quantity is autocorrelation. For a real function, such as an observatory magnetic-field time series, autocorrelation is defined by the integral (Press et al., 1992, "correl"), where τ represents a relative time lag. By the Wiener-Khinchine or "auto-correlation" theorem (Lee, 1960, p. 11;Bracewell, 1978, p. 115), an autocorrelation is related to a power spectrum by Fourier transformation, Despite this duality, some properties of a time series are more easily seen in power spectra, while others are more easily seen in autocorrelation.
In Fig. 5 we show autocorrelations for the source 1min data C H (τ ), continuous running averages C H a (τ ), and average-samples C H as (τ ) for CLF-H 1991.0-2009.0. Prominent solar-quiet diurnal peaks are seen with time lags of 1d, 2d, etc.; these correspond to the diurnal peaks in frequency of 1/d seen in the power spectra of Fig. 3. For very short time lags, less than the averaging duration, autocorrelations C H a (τ ) are flat. These results are all to be expected. What is perhaps more interesting are the autocorrelations for low-frequency residuals, C L a (τ ) Eq. (35). For spot sampling, residuals are entirely due to aliasing, and since aliasing residuals tend to be larger than those for other averagesample types, the autocorrelation of spot-sampling residuals has a larger zero-time-lag base value. For all average-sample types, other than the brick-wall, there is significant Gibbseffect ringing out to time lags of several hours. Close inspection of Fig. 5 shows that the zero crossings of C L a (τ ) occur, very nearly, at integer hours (64-min hours). This is consistent with the observation we made in Sect. 5.2 that hourly average-samples avoid most Gibbs ringing by straddling the zero crossings of the sinc function.

Correlation and proportionality of average-samples
As we have remarked, brick-wall average-samples have no low-frequency residuals, while the other average-sample www.ann-geophys.net/28/2079/2010/ Ann. Geophys., 28, 2079-2096, 2010   correlation is related to a power spectrum by Fourier transformation, Despite this duality, some properties of a time series are more easily seen in power spectra, while others are more easily seen in autocorrelation.
In Fig. 5 we show autocorrelations for the source 1min data C H (τ ), continuous running averages C H a (τ ), and average-samples C H as (τ ) for CLF-H 1991.0-2009.0. Prominent solar-quiet diurnal peaks are seen with time lags of 1d, 2d, etc.; these correspond to the diurnal peaks in frequency of 1/d seen in the power spectra of Fig. 3. For very short time lags, less than the averaging duration, autocorrelations C H a (τ ) are flat. These results are all to be expected. What is perhaps more interesting are the autocorrelations for low-frequency residuals, C L a (τ ) Eq. (35). For spot sampling, residuals are entirely due to aliasing, and since aliasing residuals tend to be larger than those for other averagesample types, the autocorrelation of spot-sampling residuals has a larger zero-time-lag base value. For all average-sample types, other than the brick-wall, there is significant Gibbseffect ringing out to time lags of several hours. Close inspection of Fig. 5 shows that the zero crossings of C L a (τ ) occur, very nearly, at integer hours (64-min hours). This is consistent with the observation we made in Sec. 5.2 that hourly average-samples avoid most Gibbs ringing by straddling the zero crossings of the sinc function.

Correlation and proportionality of average-samples
As we have remarked, brick-wall average-samples have no low-frequency residuals, while the other average-sample types have some combination of amplitude distortion and aliasing. It is useful, therefore, to compare linear correlations ρ between brick-wall and other average-sample types, (Press et al., 1992, 'pearsn'), defined here for zero-mean data; recall from Sec. 4 that we have subtracted a slowlyvarying trendline. A correlation close to unity means that the relative amplitude, phase, and frequency of continuous magnetic-field variation below Nyquist are being accurately recorded. In Fig. 6 we show scatter plots for brick-wall and 1-hr average-samples. In Table 5 we list correlations for each magnetic-field component and for each observatory. In general, spot and 2-hr average-samples have the lowest correlations with brick-wall average-samples, while 1-hr and Gaussian average-samples are very highly correlated with brickwall average-samples. Since spot measurements are a statistically unbiased sampling of the amplitude range of magnetic-field variation, it is useful to compare their proportionalites α with the other average-samples, where each α are estimated with a least-squares algorithm (Press et al., 1992, 'fitexy'). A proportionality close to unity means that the average-sample type is accurately recording the absolute amplitude range of magnetic-field variation. In Fig. 7 we show scatter plots for spot and 1-hr average-samples. In Table 5 we list proportionalities for each magnetic-field component and for each observatory. Without exception, brick-wall (2-hr) average-samples are the most (least) directly proportional to spot samples. Results for 1-hr average-samples are relatively similar to those of both Gaussian and brick-wall average-samples.   types have some combination of amplitude distortion and aliasing. It is useful, therefore, to compare linear correlations ρ between brick-wall and other average-sample types, (Press et al., 1992, "pearsn"), defined here for zero-mean data; recall from Sect. 4 that we have subtracted a slowlyvarying trendline. A correlation close to unity means that the relative amplitude, phase, and frequency of continuous magnetic-field variation below Nyquist are being accurately recorded. In Fig. 6 we show scatter plots for brick-wall and 1-h average-samples. In Table 5 we list correlations for each magnetic-field component and for each observatory. In general, spot and 2-h average-samples have the lowest correla-tions with brick-wall average-samples, while 1-h and Gaussian average-samples are very highly correlated with brickwall average-samples. Since spot measurements are a statistically unbiased sampling of the amplitude range of magnetic-field variation, it is useful to compare their proportionalites α with the other average-samples, where each α are estimated with a least-squares algorithm (Press et al., 1992, "fitexy"). A proportionality close to unity means that the average-sample type is accurately recording the absolute amplitude range of magnetic-field variation. In Fig. 7 we show scatter plots for spot and 1-h average-samples. In Table 5 we list proportionalities for each www.ann-geophys.net/28/2079/2010/ Ann. Geophys., 28, 2079-2096, 2010

Conclusions
In analyses complementary to this one (e.g. Love, 2009;Marsal and Curto, 2009), the accuracy of hourly averagesamples constructed from digital 1-min data having missing values has been investigated, and corresponding errors were estimated. Here we have considered a more fundamental issue, the effects of averaging and sampling in the construction of observatory hourly values. Of the averagesample types most commonly used, spot samples are a statistically-unbiased representation of the amplitude range of geomagnetic-field variation, but as a representation of continuous variation over time, spot values are heavily contaminated by aliasing. On the other hand, 1-hr average-samples are a statistically-biased representation of geomagnetic-field values, but as a representation of continuous variation be- magnetic-field component and for each observatory. Without exception, brick-wall (2-h) average-samples are the most (least) directly proportional to spot samples. Results for 1-h average-samples are relatively similar to those of both Gaussian and brick-wall average-samples.

ESK hourly values at the WDCs
In Sect. 2 we mentioned that the pre-1932.0 hourly WDC-CE ESK data are 2-h average-samples, with each datum formed by averaging two adjacent 1-h average-samples from the original observatory yearbooks. In Fig. 8 we show frequency-domain power spectra of historical ESK-H data obtained from WDC-CE, separately for the years 1917. 0-1932.0 and 1932.0-1947.0. As a result of 2-h averaging, the frequency content of the data for 1917.0-1932.0 has been dramatically altered. Since many researchers assume that data obtained from a WDC are those that the observatories originally reported, it would be desirable to return the WDC-CE holdings of the ESK data to their original form, possibly by using the digital data held at Kyoto. Otherwise, it would be useful for the data to be flagged as having been changed. Note added after journal review: we are happy to have learned that the pre-1932.0 hourly WDC-CE ESK data have been restored to their original yearbook values (S. Macmillan, personal communication, 2010).

Conclusions
In analyses complementary to this one (e.g. Love, 2009;Marsal and Curto, 2009), the accuracy of hourly averagesamples constructed from digital 1-min data having missing values has been investigated, and corresponding errors were estimated. Here we have considered a more fundamental issue, the effects of averaging and sampling in the construction of observatory hourly values. Of the averagesample types most commonly used, spot samples are a statistically-unbiased representation of the amplitude range of geomagnetic-field variation, but as a representation of continuous variation over time, spot values are heavily contaminated by aliasing. On the other hand, 1-h average-samples are a statistically-biased representation of geomagnetic-field values, but as a representation of continuous variation below Ann. Geophys., 28,[2079][2080][2081][2082][2083][2084][2085][2086][2087][2088][2089][2090][2091][2092][2093][2094][2095][2096]2010 www.ann-geophys.net/28/2079/2010/ Nyquist, 1-h average-samples from medium and low-latitude observatories are only slightly contaminated by amplitude distortion and aliasing -their average low-frequency residuals are less than the 5.0 nT absolute accuracy level established by Intermagnet as a modern operational standard for observatories. High-latitude 1-h average-samples have more amplitude distortion and aliasing, but the natural magnetic "signal" is also larger at high latitudes. 1-h average-samples constructed using an optimal Gaussian filter are not significantly more accurate than simple 1-h average values. Therefore, we recommend the continuation of the standard practice by observatories and World Data Centers for reporting simple 1-h-average hourly values. Of course, this does not preclude the production of other data-based products, including other types of hourly values, but maintaining continuity with the historical 1-h-average time series should be recognized as an important priority.