Annales Geophysicae An interactive open-access journal of the European Geosciences Union
Ann. Geophys., 36, 945–952, 2018
https://doi.org/10.5194/angeo-36-945-2018

ANGEO Communicates 02 Jul 2018

# Transfer entropy and cumulant-based cost as measures of nonlinear causal relationships in space plasmas: applications to Dst

Jay R. Johnson1, Simon Wing2, and Enrico Camporeale3
• 1Andrews University, Berrien Springs, MI, USA
• 2The Johns Hopkins University, Applied Physics Laboratory, Laurel, MD, USA
• 3Center for Mathematics and Computer Science (CWI), Amsterdam, the Netherlands

Correspondence: Jay R. Johnson (jrj@andrews.edu)

Abstract

It is well known that the magnetospheric response to the solar wind is nonlinear. Information theoretical tools such as mutual information, transfer entropy, and cumulant-based analysis are able to characterize the nonlinearities in the system. Using cumulant-based cost, we show that nonlinear significance of Dst peaks at 3–12 h lags that can be attributed to VBs, which also exhibits similar behavior. However, the nonlinear significance that peaks at lags 25, 50, and 90 h can be attributed to internal dynamics, which may be related to the relaxation of the ring current. These peaks are absent in the linear and nonlinear self-significance of VBs. Our analysis with mutual information and transfer entropy shows that both methods can establish that there are strong correlations and transfer of information from Vsw to Dst at a timescale that is consistent with that obtained from the cumulant-based analysis. However, mutual information also shows that there is a strong correlation in the backward direction, from Dst to Vsw, which is counterintuitive. In contrast, transfer entropy shows that there is no or little transfer of information from Dst to Vsw, as expected because it is the solar wind that drives the magnetosphere, not the other way around. Our case study demonstrates that these information theoretical tools are quite useful for space physics studies because these tools can uncover nonlinear dynamics that cannot be seen with the traditional analyses and models that assume linear relationships.

1 Introduction

One of the most practically important concepts in dynamical systems is the notion of causality. It is particularly useful to organize observational datasets according to causal relationships in order to identify variables that drive the dynamics. Understanding causal dependencies can also help to simplify descriptions of highly complex physical processes because it constrains the coupling functions between the dynamical variables. Analysis of those coupling functions can lead to simplification of the underlying physical processes that are most important for driving the system. It is particularly useful from a practical standpoint to understand causal dependencies in systems involving natural hazards because monitoring of causal variables is closely linked with warning.

A common method to establish causal dependencies in a data stream of two variables, e.g., [a(t)] and [b(t)], is to apply linear correlation studies such as , which showed the relationship between the downward Poynting flux and ion outflows. Causal relationships are typically identified by considering a time-shifted correlation function

$\begin{array}{}\text{(1)}& \lambda_{ab}(\tau)\triangleq\frac{\langle a(t)\,b(t+\tau)\rangle-\langle a\rangle\langle b\rangle}{\sqrt{\langle a^{2}\rangle-\langle a\rangle^{2}}\,\sqrt{\langle b^{2}\rangle-\langle b\rangle^{2}}},\end{array}$

where $\langle\dots\rangle$ is an ensemble average obtained by drawing samples at a set of measurement times, $\{t_{0},t_{1},\dots,t_{N}\}$. For example, Borovsky et al. (1998) used such a method to identify relationships between solar wind variables and plasma sheet variables. The causal dependency that the plasma sheet responds to changes in the solar wind can be identified from the time shift of the peak of the cross-correlation, which indicates a response time. From this type of analysis it can be found that the plasma sheet generally responds from the tail to the inner magnetosphere, consistent with the notion of earthward convection. Such analysis has been particularly useful for understanding plasma sheet transport.
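The time-shifted correlation of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function name `lagged_correlation`, the synthetic series, and the choice to compute the averages over the overlapping samples only are all assumptions made here.

```python
import numpy as np

def lagged_correlation(a, b, lag):
    """Estimate lambda_ab(tau) of Eq. (1) for an integer lag given in samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if lag >= 0:
        a_t, b_shift = a[:len(a) - lag], b[lag:]
    else:
        a_t, b_shift = a[-lag:], b[:len(b) + lag]
    # Pearson correlation of the paired samples a(t), b(t + lag)
    return np.corrcoef(a_t, b_shift)[0, 1]

# Synthetic check (hypothetical data): b is a copy of a delayed by 5 samples.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = np.roll(a, 5)
lags = np.arange(-20, 21)
lam = np.array([lagged_correlation(a, b, k) for k in lags])
best_lag = int(lags[np.argmax(lam)])  # lag at which the correlation peaks
```

The lag of the peak plays the role of the response time discussed above; a peak at a positive lag means that b follows a.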

However, the procedure of detecting causal relationships based on linear cross-correlation suffers from a number of limitations. First, the statistical accuracy of the correlation function is limited by the resolution and length of the data stream. Second, the linear time series analysis ignores nonlinear correlations, which may be important for energy transfer in the magnetospheric system. For example, substorms are believed to involve storage and release of energy in the magnetotail, which is a highly nonlinear response. Similarly, magnetosphere–ionosphere coupling may also be highly nonlinear, involving the nonlinear development of accelerating potentials along auroral field lines and nonlinear current–voltage relationships. Third, the cross-correlation may not be a particularly clear measure when there are multiple peaks or if there is little or no asymmetry in the forward (i.e., $\lambda_{ab}(\tau)$) and backward directions (i.e., $\lambda_{ba}(\tau)=\lambda_{ab}(-\tau)$). Finally, the cross-correlation does not provide any way to clearly distinguish between two variables that are passively correlated because of a common driver and two variables that are causally related.

In the remainder of this paper, we will discuss other methods to identify causal relationships based on entropy-based discriminating statistics such as mutual information and transfer entropy. We will also discuss the cumulant-based method. We will illustrate the shortcomings and strengths of the various methods for studying causality with examples from nonlinear dynamics and space physics.

2 Linear vs. nonlinear dependency

It is well known that the magnetosphere responds to variation in the solar wind parameters, and it has been established that the magnetosphere has a significant linear response to the solar wind. However, it is also expected that the magnetosphere has a nonlinear response. The nonlinear response may be driven by internal dynamics rather than being driven externally. For example, the internal dynamics associated with loading and unloading of magnetic energy associated with storms and substorms is nonlinear (Johnson and Wing, 2014). Indeed, the data analysis of Bargatze et al. (1985) indicated that the dynamical response of the magnetosphere to solar wind input could not be entirely understood using linear prediction filters.

Suppose that we consider a set of variables a and b, which could be vectors of variables measured in time, and we would like to measure their dependency. Instead of considering the covariance matrix or correlation function, a more general measure of dependency between an input and an output is obtained by considering whether

$\begin{array}{}\text{(2)}& P(\boldsymbol{a},\boldsymbol{b})\stackrel{?}{=}P(\boldsymbol{a})\,P(\boldsymbol{b}),\end{array}$

where P(a,b) is the joint probability of input a and output b, while P(a) and P(b) are the probabilities of a and b, respectively. If the relationship holds, then the variables a and b are independent. For all other cases, there is some measure of dependency. In the case where the system output is completely known given the input, $P(\boldsymbol{a},\boldsymbol{b})=P(\boldsymbol{a})$. The advantage of considering Eq. (2) is that it is possible to detect the presence of higher order nonlinear dependencies between the input and output even in the absence of linear dependencies.

## 2.1 Mutual information and cumulant-based cost

Mutual information and cumulant-based cost are two useful measures that quantify Eq. (2). Mutual information has the advantage that in the limit of Gaussian joint probability distributions, it may be simply related to the correlation coefficient $\lambda_{ab}(\tau)$ defined in Eq. (1) (Li, 1990). Cumulants have the advantage of good statistics for limited datasets and noisy systems. Moreover, for high-dimensional systems it is more efficient to compute moments of the data rather than try to construct the probability density function.

Correlation studies also only detect linear correlations, so if the feedback involves nonlinear processes (highly likely in this case) then their usefulness may be seriously limited. Alternatively, mutual information and cumulant-based measures are useful for detecting linear as well as nonlinear correlations. The mutual information is constructed from the probability distribution function of the variables and may be computed using a quantization procedure where data are binned such that the samples [a(t)] are assigned discrete values $\hat{a}\in\{a_{1},a_{2},\dots,a_{n}\}$ of an alphabet $\aleph_{1}$ and [b(t)] is assigned discrete values $\hat{b}\in\{b_{1},b_{2},\dots,b_{m}\}$ of an alphabet $\aleph_{2}$. The ad hoc time-shifted mutual information

$\begin{array}{}\text{(3)}& \mathcal{M}_{ab}(\tau)\triangleq\sum_{\hat{a}\in\aleph_{1},\,\hat{b}\in\aleph_{2}}p\left(\hat{a}(t+\tau),\hat{b}(t)\right)\log\left(\frac{p\left(\hat{a}(t+\tau),\hat{b}(t)\right)}{p(\hat{a})\,p(\hat{b})}\right)\end{array}$

has been used as an indicator of causality, but suffers from the same problems as the time-shifted cross-correlation when it has multiple peaks and long-range correlations.
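A minimal sketch of the binned estimate in Eq. (3) is given below. The function names, the equal-width binning, the default of 8 bins, and the synthetic quadratic coupling are assumptions chosen for illustration; the plug-in estimator shown here is biased upward for short records.

```python
import numpy as np

def mutual_information(x, y, n_bins):
    """Plug-in estimate of the mutual information (in bits) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=n_bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (n_bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, n_bins)
    nz = p_xy > 0                            # empty cells contribute nothing
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

def lagged_mutual_information(a, b, lag, n_bins=8):
    """M_ab(tau) of Eq. (3): pairs a(t + lag) with b(t), for lag >= 0 samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return mutual_information(a[lag:], b[:len(b) - lag], n_bins)

# Hypothetical example: a(t + 3) = b(t)^2 + noise, a purely nonlinear coupling.
rng = np.random.default_rng(1)
b = rng.normal(size=5000)
a = np.roll(b, 3) ** 2 + 0.1 * rng.normal(size=5000)
mi = [lagged_mutual_information(a, b, k) for k in range(10)]
peak_lag = int(np.argmax(mi))
r = np.corrcoef(a[3:], b[:-3])[0, 1]   # linear correlation at the same lag
```

In this constructed case the mutual information peaks at the true lag while the linear correlation at that lag stays near zero, which is the kind of dependence that Eq. (1) cannot detect.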

Similarly, examination of time-shifted cumulants could be used as an indicator of causality in a nonlinear system. In this case, we can define a discriminating statistic

$\begin{array}{}\text{(4)}& {D}^{C}=\sum _{q=\mathrm{1}}^{\mathrm{\infty }}\sum _{{i}_{\mathrm{1}},\mathrm{\dots },{i}_{q}\in {\mathrm{\Pi }}_{q}}{K}_{\mathrm{1}{i}_{\mathrm{2}}\mathrm{\dots }{i}_{q}}^{\mathrm{2}},\end{array}$

where

$\begin{array}{}\text{(5)}& \begin{array}{lcl}K_{i}& =& C_{i}=\langle z_{i}\rangle\\ K_{ij}& =& C_{ij}-C_{i}C_{j}=\langle z_{i}z_{j}\rangle-\langle z_{i}\rangle\langle z_{j}\rangle\\ K_{ijk}& =& C_{ijk}-C_{ij}C_{k}-C_{jk}C_{i}-C_{ik}C_{j}+2C_{i}C_{j}C_{k}\\ K_{ijkl}& =& C_{ijkl}-C_{ijk}C_{l}-C_{ijl}C_{k}-C_{ilk}C_{j}-C_{ljk}C_{i}\\ & & -C_{ij}C_{kl}-C_{il}C_{kj}-C_{ik}C_{jl}+2\,(C_{ij}C_{k}C_{l}\\ & & +C_{ik}C_{j}C_{l}+C_{il}C_{j}C_{k}+C_{jk}C_{i}C_{l}+C_{jl}C_{i}C_{k}\\ & & +C_{kl}C_{i}C_{j})-6C_{i}C_{j}C_{k}C_{l}\end{array}\end{array}$

are the cumulants, written in terms of the moments

$\begin{array}{}\text{(6)}& C_{i\dots j}=\int\mathrm{d}\boldsymbol{z}\,P(\boldsymbol{z})\,z_{i}\dots z_{j}\equiv\langle z_{i}\dots z_{j}\rangle\end{array}$

of the joint probability distribution for variables ${z}_{\mathrm{1}},\mathrm{\dots },{z}_{j}$.
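Equations (5) and (6) translate directly into sample estimates. The sketch below replaces the integral in Eq. (6) with a sample mean and writes out the cumulants up to fourth order term by term; the function names and the four-point test series are hypothetical and serve only as a hand-checkable example.

```python
import numpy as np

def moment(*z):
    """C_{i...j} of Eq. (6): sample mean of the product of the given series."""
    return float(np.mean(np.prod(np.vstack(z), axis=0)))

def cumulant2(zi, zj):
    return moment(zi, zj) - moment(zi) * moment(zj)

def cumulant3(zi, zj, zk):
    m = moment
    return (m(zi, zj, zk)
            - m(zi, zj) * m(zk) - m(zj, zk) * m(zi) - m(zi, zk) * m(zj)
            + 2.0 * m(zi) * m(zj) * m(zk))

def cumulant4(zi, zj, zk, zl):
    m = moment
    return (m(zi, zj, zk, zl)
            - m(zi, zj, zk) * m(zl) - m(zi, zj, zl) * m(zk)
            - m(zi, zl, zk) * m(zj) - m(zl, zj, zk) * m(zi)
            - m(zi, zj) * m(zk, zl) - m(zi, zl) * m(zk, zj) - m(zi, zk) * m(zj, zl)
            + 2.0 * (m(zi, zj) * m(zk) * m(zl) + m(zi, zk) * m(zj) * m(zl)
                     + m(zi, zl) * m(zj) * m(zk) + m(zj, zk) * m(zi) * m(zl)
                     + m(zj, zl) * m(zi) * m(zk) + m(zk, zl) * m(zi) * m(zj))
            - 6.0 * m(zi) * m(zj) * m(zk) * m(zl))

# Hand-checkable case: a single symmetric series used for every index.
x = np.array([1.0, 2.0, 3.0, 4.0])
k2 = cumulant2(x, x)          # population variance
k3 = cumulant3(x, x, x)       # vanishes for a symmetric sample
k4 = cumulant4(x, x, x, x)    # fourth central moment minus 3 * variance^2
```

For this series the second-order cumulant reduces to the variance and the third-order cumulant vanishes by symmetry, which gives a quick consistency check on the signs in Eq. (5).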

With only two variables, a and b, defined above, we can consider the cost function

$\begin{array}{}\text{(7)}& {D}_{a,b}^{C}\left(\mathit{\tau }\right)={D}^{C}\left(a\left(t\right),b\left(t+\mathit{\tau }\right)\right).\end{array}$

The presence of nonlinear dependence has been identified by comparing the cumulant cost for a time series with the cumulant-based cost of surrogate time series, which are constructed to have the same linear correlations as in . Significance measures the difference in the discriminating statistic from the mean of the discriminating statistic of the surrogates in terms of the spread of the surrogates, σ.
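In symbols, the significance described above can be sketched as follows. The notation is introduced here for illustration and is not taken from the text: $D_{0}^{C}(\tau)$ denotes the cost of the original series, and $\mu_{S}(\tau)$ and $\sigma_{S}(\tau)$ denote the mean and standard deviation of the cost over the surrogates. Taking the absolute value is an assumed convention.

$S(\tau)=\frac{\left|D_{0}^{C}(\tau)-\mu_{S}(\tau)\right|}{\sigma_{S}(\tau)}$

Under this reading, a significance well above 1 at a given lag means that the observed cost lies many surrogate standard deviations away from what linear correlations alone would produce.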

In Sect. 3, we will show an application of cumulant-based analysis to the disturbance storm time index (Dst). In principle, the cross-correlation, mutual information, and cumulant-based cost should be independent of the selection of measurement points if the system is stationary; therefore, time stationarity can be examined by comparing these discriminating statistics for groups of measurements drawn from different windows of time as in and .

## 2.2 Transfer entropy

Another method for determining causality is the one-sided transfer entropy, which is based upon the conditional mutual information

$\begin{array}{}\text{(8)}& \mathcal{M}_{C}(x,y|z)\triangleq\sum_{x\in\aleph_{1}}\sum_{y\in\aleph_{2}}\sum_{z\in\aleph_{3}}p(x,y,z)\log\left(\frac{p(x,y,z)\,p(z)}{p(x,z)\,p(y,z)}\right).\end{array}$

The conditional mutual information measures the dependence of two variables, x and y, given a conditioner variable, z. If either x or y is dependent on z, the mutual information between x and y is reduced, and this reduction of information provides a method to eliminate coincidental dependence, or conversely to identify causal dependence.

Transfer entropy considers the conditional mutual information between two variables using the past history of one of the variables as the conditioner.

$\begin{array}{}\text{(9)}& \mathcal{T}_{a\to b}(\tau)=\sum_{\hat{a}\in\aleph_{1}}\sum_{\hat{a}^{(k)}\in\aleph_{1}^{(k)}}\sum_{\hat{b}\in\aleph_{2}}p\left(\hat{a}(t+\tau),\hat{a}^{(k)}(t),\hat{b}(t)\right)\log\left(\frac{p\left(\hat{a}(t+\tau)\,|\,\hat{a}^{(k)}(t),\hat{b}(t)\right)}{p\left(\hat{a}(t+\tau)\,|\,\hat{a}^{(k)}(t)\right)}\right),\end{array}$

where $\hat{a}^{(k)}(t)=\left[\hat{a}(t),\hat{a}(t-\Delta),\dots,\hat{a}(t-(k-1)\Delta)\right]$. The standard definition of transfer entropy takes k=1 (no lag), but keeping a higher embedding dimension could in principle provide a more precise measure (for example, if a has periodicity, a dimension of 2 may provide better prediction of future values of a from its past time series and therefore lower the transfer entropy). Transfer entropy as a discriminating statistic has the following advantages. First, in the absence of information flow from a to b (i.e., $\hat{a}(t+\tau)$ has no additional dependence on $\hat{b}(t)$ beyond what is known from the past history $\hat{a}^{(k)}(t)$), $p\left(\hat{a}(t+\tau)\,|\,\hat{a}^{(k)}(t),\hat{b}(t)\right)=p\left(\hat{a}(t+\tau)\,|\,\hat{a}^{(k)}(t)\right)$ and the transfer entropy vanishes. Second, the transfer entropy is highly directional so that $\mathcal{T}_{a\to b}\ne\mathcal{T}_{b\to a}$. This advantage can be clearly seen for dynamical systems in which variables are forward differenced: the transfer entropy is clearly one-sided, while mutual information and correlation functions can even be symmetric. Third, this measure accounts for static internal correlations, which can be used to determine whether two variables are driven by a common driver or whether the variable b is causally driving the variable a.
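Eq. (9) with k=1 can be estimated from a three-dimensional histogram. The sketch below is an illustration under stated assumptions rather than the implementation used in the study: the names `discretize` and `transfer_entropy`, the equal-width binning with 4 bins, and the synthetic driven system are all hypothetical. To avoid ambiguity about direction, the arguments are named `target` (the series whose future is predicted, a in Eq. 9) and `source` (the candidate driver, b in Eq. 9).

```python
import numpy as np

def discretize(x, n_bins):
    """Map samples to integer bin labels 0..n_bins-1 using equal-width bins."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    return np.digitize(x, edges[1:-1])

def transfer_entropy(target, source, lag, n_bins=4):
    """Eq. (9) with k = 1 and lag >= 1: information (bits) that source(t) adds
    about target(t + lag) beyond what target(t) already provides."""
    t_hat = discretize(target, n_bins)
    s_hat = discretize(source, n_bins)
    fut, past, src = t_hat[lag:], t_hat[:-lag], s_hat[:-lag]
    # Joint distribution p(future, past, source) from counts
    p = np.zeros((n_bins, n_bins, n_bins))
    np.add.at(p, (fut, past, src), 1.0)
    p /= p.sum()
    p_fp = p.sum(axis=2, keepdims=True)       # p(future, past)
    p_ps = p.sum(axis=0, keepdims=True)       # p(past, source)
    p_p = p.sum(axis=(0, 2), keepdims=True)   # p(past)
    nz = p > 0
    ratio = (p * p_p)[nz] / (p_fp * p_ps)[nz]
    return float(np.sum(p[nz] * np.log2(ratio)))

# Hypothetical driven system: x drives y with a one-step delay, never the reverse.
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(n - 1):
    y[t + 1] = 0.5 * y[t] + x[t] + 0.1 * rng.normal()
te_forward = transfer_entropy(y, x, lag=1)    # x(t) -> y(t + 1)
te_backward = transfer_entropy(x, y, lag=1)   # y(t) -> x(t + 1)
```

In this toy system the forward estimate is large while the backward estimate stays at the level of the estimator bias, which mirrors the one-sidedness described above.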

Both mutual information and transfer entropy require binning of data. As mentioned in , the number of bins (nb) needs to be chosen properly and there are some guidelines that can be followed. In general, we would like to maximize the amount of information. Having too few bins would lump too many points into the same bin, leading to loss of information. Conversely, having too many bins would leave many bins with zero or only a few points, which also would lead to loss of information. Sturges (1926) proposed that for a normal distribution, the optimal number of bins is ${n}_{\mathrm{b}}={\mathrm{log}}_{\mathrm{2}}\left(n\right)+\mathrm{1}$ with bin width w = range∕nb, where n is the number of points in the dataset and range is the maximum value minus the minimum value of the points. In practice, there is usually a range of nb that would work.
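The binning guideline above, commonly known as Sturges' rule, is easy to evaluate for a given record length. In the sketch below the function names are hypothetical, and rounding log2(n) up to an integer is an assumption, since the text does not specify how a non-integer value is handled.

```python
import math

def sturges_bins(n_points):
    """n_b = log2(n) + 1, with log2(n) rounded up (the rounding is an assumption)."""
    return math.ceil(math.log2(n_points)) + 1

def bin_width(values, n_bins):
    """w = range / n_b, where range is the maximum minus the minimum value."""
    return (max(values) - min(values)) / n_bins

hours_per_year = 365 * 24             # one year of hourly samples
n_b = sturges_bins(hours_per_year)    # number of bins suggested by the rule
w = bin_width([-100.0, 50.0], n_b)    # width for a hypothetical value range
```

For one year of hourly data the rule suggests 15 bins; as noted above, neighboring values of nb usually work as well.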

3 Application to space weather: Dst analysis

Dst (disturbance storm time index) is an hourly index that gives a measure of the strength of the symmetric ring current that, in turn, provides a measure of the dynamics of geomagnetic storms. Because of its global nature, Dst is often used as one of the several indices that represent the state of the magnetosphere. For example, Balasis et al. (2011) used the cumulative square amplitude of the Dst time series as a proxy for energy dissipation rate in the magnetosphere and found that it fits a power law well with log-periodic oscillations, which was interpreted as evidence for discrete-scale invariance in the Dst dynamics.

When plasma sheet ions are injected into the Earth's inner magnetosphere, they drift westward around the Earth, forming the ring current. Studies have shown that the substorm occurrence rate increases with solar wind velocity (high speed streams) (Kissinger et al., 2011; Newell et al., 2016). An increase in the solar wind electric field, VBz, can increase the dawn–dusk electric field in the magnetotail, which in turn determines the number of plasma sheet particles that move to the inner magnetosphere (Friedel et al., 2001). Studies have shown that the electric field, VBs (Vsw × southward IMF Bz) or VBz, has a strong effect on the ring current dynamics.

For the present study, we examine the relationships between solar wind velocity (Vsw) and VBs with Dst. We use Dst records in the period 1974–2001 obtained from Kyoto University World Data Center for Geomagnetism (http://swdcwww.kugi.kyoto-u.ac.jp/index.html, last access: 18 January 2018). The corresponding solar wind data are obtained from IMP-8, ACE, WIND, ISEE1, and ISEE3 observations. The ACE SWEPAM and MAG data and the WIND MAG data are obtained from CDAWeb (http://cdaweb.gsfc.nasa.gov/, last access: 18 January 2018). The WIND 3DP data are obtained from the 3DP team directly. The ISEE1 and ISEE3 data are obtained from UCLA (these datasets are also available at NASA NSSDC; http://nssdc.gsfc.nasa.gov/space/, last access: 18 January 2018). The IMP-8 data come directly from the IMP teams. The solar wind is propagated with the minimum variance technique to GSM (X, Y, Z) = (17, 0, 0) $R_{\mathrm{E}}$ to produce 1 min files, from which hourly averaged solar wind parameters are constructed.

## 3.1 Cumulant-based analysis

Section 2.1 presents the method of cumulant-based cost. Here, we show an application of cumulant-based cost to detect nonlinear dynamics in Dst. We consider the forward coupling between a solar wind variable such as VBs and Dst, which characterizes the ring current response to the solar wind driver. We therefore consider the nonlinear cross-correlations of the vector

$\begin{array}{}\text{(10)}& \boldsymbol{c}(t,\tau)=\{\mathit{VB}_{\mathrm{s}}(t),D_{\mathrm{st}}(t+\tau)\}=\{z_{1},z_{2}\}.\end{array}$

The generalization of cost is based on realizations of $\{z_{1},z_{2}\}$. In this case, each variable is Gaussianized with unit variance to eliminate static nonlinearities (i.e., higher order self-correlations in VBs and Dst are eliminated so that the cost measures only cross-dependence between VBs and Dst). This procedure is explained in the next paragraph.

The distributions of Dst and VBs are generally non-Gaussian. As such, the raw distributions (e.g., distribution of values of Dst) may have nonzero higher order cumulants (e.g., they can have a skew and kurtosis). This property makes it more difficult to interpret whether the higher order cumulants in the time evolution arise from the overall shape of the distribution of data points or from the time-ordering of the data. To eliminate the inherent nonzero cumulants in the overall distribution of data, we construct a rank-ordered map from the original dataset to a proxy dataset of the same length drawn from a Gaussian distribution. The distribution of the proxy dataset ensures that all cumulants of the distribution beyond second order should in principle vanish. However, the time-ordering of the data can still lead to nonzero cumulants because the joint probability distribution of Dst(t+τ) and Dst(t) may be non-Gaussian even if the distribution of Dst is Gaussian. Moreover, it is simple to construct surrogate data from the Gaussianized data that share the same autocorrelation as the original data by using the same power spectrum but randomly shifting the phases of the Fourier coefficients. Any deviation from the linear statistic is apparent from comparison with the surrogate data, and we interpret these deviations as evidence of nonlinear dependence because we have falsified the hypothesis that the data can be adequately described by linear statistics. This method has been successfully employed in , in which the Kp record was analyzed with mutual information and cumulants.
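The two steps described above, the rank-ordered Gaussian map and the phase-randomized surrogate, can be sketched as follows. The function names and the exponential test series are assumptions made for illustration, and the sketch treats a single series only; it does not reproduce how the study handles the pair of variables.

```python
import numpy as np

def gaussianize(x, rng):
    """Rank-ordered map of x onto a same-length sample drawn from a Gaussian."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))          # rank of each sample, 0..N-1
    gauss = np.sort(rng.normal(size=len(x)))   # sorted Gaussian proxy values
    return gauss[ranks]                        # same time-ordering of ranks as x

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum (hence autocorrelation) as x."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    shifts = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    shifts[0] = 0.0                            # leave the zero-frequency term real
    if len(x) % 2 == 0:
        shifts[-1] = 0.0                       # leave the Nyquist term real
    return np.fft.irfft(spectrum * np.exp(1j * shifts), n=len(x))

# Hypothetical skewed record standing in for a non-Gaussian index.
rng = np.random.default_rng(3)
raw = rng.exponential(size=1024)
g = gaussianize(raw, rng)
s = phase_randomized_surrogate(g, rng)
```

The Gaussianized series keeps the rank ordering of the original record, and the surrogate keeps the Fourier amplitudes of the Gaussianized series while scrambling its phases, so any statistic that differs between the two reflects structure beyond the linear autocorrelation.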

In Fig. 1 we plot the significance obtained from the year 1999 as a function of time delay, τ. Panels (a) and (b) show the significance extracted from $\{\mathit{VB}_{\mathrm{s}}(t),D_{\mathrm{st}}(t+\tau)\}$ and $\{\mathit{VB}_{\mathrm{s}}(t),\mathit{VB}_{\mathrm{s}}(t+\tau)\}$, respectively. It should be noted that there is a strong linear response at around 3 h time delay. As shown in Fig. 1a, there is a clear nonlinear response with peaks around 3–10, 25, 50, and 90 h, lasting for approximately 1 week. In contrast, in Fig. 1b, the nonlinearity has only one broad peak around 3–12 h in the self-significance for VBs, suggesting that the nonlinear and linear peaks at τ=3–12 h in Fig. 1a may be associated with VBs. We will revisit the solar wind causal relationship with Dst using transfer entropy in Sect. 3.2.

The absence of the nonlinear peaks at τ= 25, 50, and 90 h in the self-significance for VBs (Fig. 1b) suggests that these nonlinearities in $\{\mathit{VB}_{\mathrm{s}}(t),D_{\mathrm{st}}(t+\tau)\}$ are related to internal magnetospheric dynamics. As the Dst index is thought to reflect storm activity, it is reasonable that nonlinear significance would decay on the order of 1 week as storms commonly last around that time. The strong nonlinear responses at τ= 25, 50, and 90 h are likely related to multiple modes of relaxation of the ring current following the commencement of storms. It should also be noted that other nonlinearities detected by even higher order cumulants may also be present; however, the calculation demonstrates the nonlinear nature of the underlying dynamics.

Figure 1. Significance extracted from (a) $\{\mathit{VB}_{\mathrm{s}}(t),D_{\mathrm{st}}(t-\tau)\}$ and (b) $\{\mathit{VB}_{\mathrm{s}}(t),\mathit{VB}_{\mathrm{s}}(t-\tau)\}$ for 1999. It should be noted that there is a strong linear response at around 3 h time delay. There is a clear nonlinear response with a strong peak around 50 h lasting for approximately 1 week. The long-term nonlinear response is absent in the solar wind data, indicating that the long-term nonlinear correlations between VBs and Dst are the result of internal magnetospheric dynamics.

A common scenario for storm–ring current interaction is the following. A storm compresses the magnetosphere, intensifies the magnetic field in the magnetosphere, and injects energetic particles into the ring current region. The ring current intensifies during the main phase of the storm, which can last 6 h. Once the injection stops, the ring current begins to decay and the storm enters the recovery phase. Conservation of the magnetic moment implies that anisotropies develop in the ring current and plasma sheet. Anisotropy drives the ring current plasma unstable to ion cyclotron waves. The ion cyclotron waves scatter energetic ions into the loss cone so that they are lost from the ring current. Nonlinear interaction between waves and particles keeps the plasma near marginal stability with a steady loss of energetic particles due to wave–particle scattering. Other loss mechanisms include charge exchange, Coulomb scattering, and convection of ions to the front of the magnetopause. The ring current decay can have two stages. In the first stage, the ring current decays rapidly and the loss mechanisms can be attributed to convective outflow, pitch-angle scattering in the ring current, and O+ charge exchange (Hamilton et al., 1988; Weygand and McPherron, 2006). The second stage may typically begin about 1 day from the commencement of the storm (see, for example, Fig. 7 of ). In the second stage, the decay rate is slower and is attributed mainly to H+ charge exchange, and it can take several days to deplete the ring current to the baseline level. We can speculate that the multiple nonlinear response lag times that are detected with the cumulant-based approach likely reflect the relaxation of the ring current due to the complex interplay of multiple loss processes.

Figure 2. Comparison of mutual information and transfer entropy measures to determine causal driving of the magnetosphere as characterized by Dst. Note that causal driving appears to peak somewhat later (11 h) than indicated by mutual information (2 h), indicating that internal dynamics likely are very important initially. The backward transfer entropy is below the noise level for all values, indicating that Dst in no way influences the upstream solar wind velocity. Such a conclusion could not be inferred from the mutual information measure.

## 3.2 Transfer entropy

As mentioned in Sect. 2.2, transfer entropy gives a measure of how much information is transferred from one variable to another. We have applied transfer entropy and mutual information to the relationship between Vsw and Dst for the period 1974–2001. The result is shown in Fig. 2. Note that the mutual information measure suggests strong correlations between prior values of Dst and Vsw. This finding suggests that Dst could be a driver of Vsw, which is counterintuitive. On the other hand, the transfer entropy clearly shows that this information transfer in the backward direction (Dst → Vsw) does not rise above the noise level (the horizontal blue lines indicate the mean and standard deviation of 100 surrogate datasets for which the data were randomly reordered). This result is expected because it is the solar wind that drives the magnetosphere, not the other way around. The transfer of information from Vsw to Dst peaks at τ=8–11 h. The cumulant-based analysis in Sect. 3.1 shows that the response of Dst to VBs has a similar timescale. This timescale is consistent with the 4 to 15 h transport time for the solar wind to reach the midnight and noon regions of the geosynchronous orbit, respectively, from the dayside magnetopause. The analysis presented here illustrates the power of the transfer entropy for assessing causality.
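The noise level obtained from randomly reordered surrogates can be sketched as follows. This is a generic illustration: the function name `shuffle_noise_level`, the choice to shuffle only the second series, and the absolute correlation used as a stand-in statistic are assumptions, not details given in the text.

```python
import numpy as np

def shuffle_noise_level(statistic, a, b, n_surrogates=100, seed=0):
    """Mean and standard deviation of statistic(a, shuffled b) over surrogates.

    Random reordering destroys any temporal relation between a and b, so the
    resulting values estimate what the statistic returns by chance alone.
    """
    rng = np.random.default_rng(seed)
    values = np.array([statistic(a, rng.permutation(b))
                       for _ in range(n_surrogates)])
    return float(values.mean()), float(values.std())

def abs_correlation(a, b):
    """Stand-in statistic; a transfer entropy estimator could be passed instead."""
    return abs(np.corrcoef(a, b)[0, 1])

# Hypothetical coupled pair of series.
rng = np.random.default_rng(4)
a = rng.normal(size=1000)
b = a + 0.5 * rng.normal(size=1000)
noise_mean, noise_std = shuffle_noise_level(abs_correlation, a, b)
observed = abs_correlation(a, b)
```

An observed value far above `noise_mean + noise_std` indicates a relation that the reordered surrogates cannot reproduce; a value inside that band, as for the backward transfer entropy in Fig. 2, is indistinguishable from noise.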

4 Summary

We recently used mutual information, transfer entropy, and conditional mutual information to discover the solar wind drivers of the outer radiation belt electrons. Because Vsw anticorrelates with solar wind density (nsw), it is hard to isolate the effects of Vsw on radiation belt electrons, given nsw and vice versa. However, using conditional mutual information, we were able to determine the information transfer from nsw or any other solar wind parameters to radiation belt electrons, given Vsw (or any other solar wind parameters). We also showed that the triangle distribution in the radiation belt electron vs. solar wind velocity plot can be understood better when we consider that Vsw and nsw transfer information to radiation belt electrons with lags of 2 and 0 days (< 24 h), respectively. Also recently, we used transfer entropy to better understand the causal parameters in the solar cycle dynamo and their response lag times.

As a follow-up to , the present study demonstrates further how information theoretical tools can be useful for space physics and space weather studies. Cumulant-based analysis can be used to distinguish internal vs. external driving of the system. Both mutual information and transfer entropy give a measure of shared information between two variables (or vectors). However, unlike mutual information, transfer entropy is highly directional. To illustrate, we apply mutual information, transfer entropy, and cumulant-based analysis to investigate the dynamics of the Dst index.

Our analysis with mutual information and transfer entropy indicates that there are strong linear and nonlinear correlations and transfer of information, respectively, in the forward direction between Vsw and Dst (Vsw → Dst). However, mutual information indicates that there is also a strong correlation in the backward direction (Dst → Vsw), which is puzzling and counterintuitive. In contrast, the transfer entropy indicates that there is no information transfer in the backward direction (Dst → Vsw), as expected because it is the solar wind that drives the magnetosphere, not the other way around. The transfer of information from Vsw to Dst peaks at τ=8–11 h.

Using the cumulant-based significance, we have established that the underlying dynamics of Dst is in general nonlinear, exhibiting a quasiperiodicity which is detectable only if nonlinear correlations are taken into account. The strong nonlinear responses of Dst to VBs at τ=25, 50, and 90 h are likely related to multiple modes of relaxation of the ring current from multiple loss mechanisms following the commencement of storms. It is, of course, possible that these nonlinearities are caused by solar wind drivers other than VBs. However, the timing of these nonlinearities would put them well in the recovery phase of a storm, and previous studies suggested that the ring current decays in the recovery phase are strongly influenced by VBs. The nonlinearities at τ=3–12 h are not caused by internal dynamics but rather by the solar wind driver, which is similar to the timescale for the solar wind transport time from the dayside magnetopause to the inner magnetosphere. This timescale is consistent with the timescale for the information transfer from the solar wind to Dst obtained from transfer entropy analysis.

Although linear models are useful, our results indicate that these models have to be used with caution because the solar wind–magnetosphere system is inherently nonlinear. Hence, nonlinearities generally need to be taken into account in order to describe the system accurately. Local linear models (which include slow evolution of parameters) may be able to handle some nonlinearities, but it is expected that these local linear models would have difficulties if the dynamics suddenly and rapidly change.

Data availability

All the derived data products in this paper are available upon request by email (simon.wing@jhuapl.edu).

Competing interests

The authors declare that they have no conflict of interest.

Acknowledgements

Simon Wing acknowledges support from JHU/APL Janney Fellowship, NSF grant AGS-1058456, and NASA grants (NNX13AE12G, NNX15AJ01G, NNX16AR10G, and NNX16AQ87G). Jay R. Johnson acknowledges support from NASA grants (NNH11AR07I, NNX14AM27G, NNH14AY20I, NNX16AC39G), NSF grants (ATM0902730, AGS-1203299, AGS-1405225), and DOE contract DE-AC02-09CH11466. Enrico Camporeale is partially funded by the NWO Vidi grant no. 639.072.716. We thank James M. Weygand for the solar wind data processing. The raw solar wind data from ACE, Wind, ISEE1, and ISEE3 were obtained from NASA CDAW and NSSDC.
The topical editor, Georgios Balasis, thanks one anonymous referee for help in evaluating this paper.

References

Baker, D. N., Zwickl, R. D., Bame, S. J., Hones, E. W., Tsurutani, B. T., Smith, E. J., and Akasofu, S.-I.: An ISEE 3 high time resolution study of interplanetary parameter correlations with magnetospheric activity, J. Geophys. Res., 88, 6230, https://doi.org/10.1029/ja088ia08p06230, 1983.

Balasis, G., Papadimitriou, C., Daglis, I. A., Anastasiadis, A., Athanasopoulou, L., and Eftaxias, K.: Signatures of discrete scale invariance in Dst time series, Geophys. Res. Lett., 38, L13103, https://doi.org/10.1029/2011GL048019, 2011.

Balikhin, M. A., Boynton, R. J., Walker, S. N., Borovsky, J. E., Billings, S. A., and Wei, H. L.: Using the NARMAX approach to model the evolution of energetic electrons fluxes at geostationary orbit, Geophys. Res. Lett., 38, L18105, https://doi.org/10.1029/2011GL048980, 2011.

Bargatze, L. F., Baker, D. N., Hones, E. W., and McPherron, R. L.: Magnetospheric impulse response for many levels of geomagnetic activity, J. Geophys. Res., 90, 6387–6394, 1985.

Borovsky, J. E., Thomsen, M. F., and Elphic, R. C.: The driving of the plasma sheet by the solar wind, J. Geophys. Res., 103, 17617–17640, https://doi.org/10.1029/97JA02986, 1998.

Burton, R. K., McPherron, R. L., and Russell, C. T.: An Empirical Relationship Between Interplanetary Conditions and Dst, J. Geophys. Res., 80, 4204–4214, 1975.

Clauer, C. R., McPherron, R. L., Searls, C., and Kivelson, M. G.: Solar wind control of auroral zone geomagnetic activity, Geophys. Res. Lett., 8, 915–918, https://doi.org/10.1029/gl008i008p00915, 1981.

Crooker, N. U. and Gringauz, K. I.: On the low correlation between long-term averages of solar wind speed and geomagnetic activity after 1976, J. Geophys. Res., 98, 59–62, https://doi.org/10.1029/92ja01978, 1993.

Deco, G. and Schürmann, B.: Information Dynamics, Springer-Verlag, New York, 2000.

De Michelis, P., Consolini, G., Materassi, M., and Tozzi, R.: An information theory approach to the storm-substorm relationship, J. Geophys. Res.-Space, 116, A08225, https://doi.org/10.1029/2011JA016535, 2011.

Dessler, A. J. and Parker, E. N.: Hydromagnetic theory of geomagnetic storms, J. Geophys. Res., 64, 2239–2252, https://doi.org/10.1029/JZ064i012p02239, 1959.

Friedel, R. H. W., Korth, H., Henderson, M. G., Thomsen, M. F., and Scudder, J. D.: Plasma sheet access to the inner magnetosphere, J. Geophys. Res.-Space, 106, 5845–5858, https://doi.org/10.1029/2000ja003011, 2001.

Gershenfeld, N.: The Nature of Mathematical Modeling, Cambridge University Press, Cambridge, 1998.

Hamilton, D., Gloeckler, G., Ipavich, F., Stüdemann, W., Wilken, B., and Kremser, G.: Ring current development during the great geomagnetic storm of February 1986, J. Geophys. Res.-Space, 93, 14343–14355, 1988.

Johnson, J. R. and Wing, S.: A solar cycle dependence of nonlinearity in magnetospheric activity, J. Geophys. Res., 110, A04211, https://doi.org/10.1029/2004ja010638, 2005.

Johnson, J. R. and Wing, S.: External versus internal triggering of substorms: An information-theoretical approach, Geophys. Res. Lett., 41, 5748–5754, https://doi.org/10.1002/2014gl060928, 2014.

Johnson, J. R. and Wing, S.: The dependence of the strength and thickness of field-aligned currents on solar wind and ionospheric parameters, J. Geophys. Res.-Space, 120, 3987–4008, https://doi.org/10.1002/2014ja020312, 2015.

Kennel, M. B. and Isabelle, S.: Method to Distinguish Possible Chaos from Colored Noise and to Determine Embedding Parameters, Phys. Rev. A, 46, 3111–3118, 1992.

Kissinger, J., McPherron, R. L., Hsu, T.-S., and Angelopoulos, V.: Steady magnetospheric convection and stream interfaces: Relationship over a solar cycle, J. Geophys. Res.-Space, 116, A00I19, https://doi.org/10.1029/2010ja015763, 2011.

Klimas, A. J., Vassiliadis, D., and Baker, D. N.: Dst index prediction using data-derived analogues of the magnetospheric dynamics, J. Geophys. Res., 103, 20435–20448, 1998.

Kozyra, J., Liemohn, M., Clauer, C., Ridley, A., Thomsen, M., Borovsky, J., Roeder, J., Jordanova, V., and Gonzalez, W.: Multistep Dst development and ring current composition changes during the 4–6 June 1991 magnetic storm, J. Geophys. Res.-Space, 107, SMP 33-1-SMP 33-22, https://doi.org/10.1029/2001JA000023, 2002.

Li, W.: Mutual information functions versus correlation functions, J. Stat. Phys., 60, 823, https://doi.org/10.1007/BF01025996, 1990.

Materassi, M., Ciraolo, L., Consolini, G., and Smith, N.: Predictive Space Weather: An information theory approach, Adv. Space Res., 47, 877–885, https://doi.org/10.1016/j.asr.2010.10.026, 2011.

Materassi, M., Consolini, G., Smith, N., and De Marco, R.: Information theory analysis of cascading process in a synthetic model of fluid turbulence, Entropy, 16, 1272–1286, 2014.

McPherron, R. L. and O'Brien, P.: Predicting Geomagnetic Activity: The Dst Index, in: Space Weather, edited by: Song, P., Singer, H. J., and Siscoe, G. L., https://doi.org/10.1029/GM125p0339, 2001.

Newell, P., Liou, K., Gjerloev, J., Sotirelis, T., Wing, S., and Mitchell, E.: Substorm probabilities are best predicted from solar wind speed, J. Atmos. Sol.-Terr. Phy., 146, 28–37, https://doi.org/10.1016/j.jastp.2016.04.019, 2016.

O'Brien, T. P. and McPherron, R. L.: An empirical phase space analysis of ring current dynamics: Solar wind control of injection and decay, J. Geophys. Res., 105, 7707–7720, 2000.

Papitashvili, V. O., Papitashvili, N. E., and King, J. H.: Solar cycle effects in planetary geomagnetic activity: Analysis of 36-year long OMNI dataset, Geophys. Res. Lett., 27, 2797–2800, https://doi.org/10.1029/2000gl000064, 2000.

Prichard, D. and Theiler, J.: Generalized redundancies for time series analysis, Phys. D, 84, 476–493, https://doi.org/10.1016/0167-2789(95)00041-2, 1995.

Reeves, G. D., Morley, S. K., Friedel, R. H. W., Henderson, M. G., Cayton, T. E., Cunningham, G., Blake, J. B., Christensen, R. A., and Thomsen, D.: On the relationship between relativistic electron flux and solar wind velocity: Paulikas and Blake revisited, J. Geophys. Res.-Space, 116, A02213, https://doi.org/10.1029/2010ja015735, 2011.

Schreiber, T.: Measuring Information Transfer, Phys. Rev. Lett., 85, 461–464, https://doi.org/10.1103/PhysRevLett.85.461, 2000.

Schreiber, T. and Schmitz, A.: Improved Surrogate Data for Nonlinearity Tests, Phys. Rev. Lett., 77, 635–639, 1996.

Smith, P. H., Hoffman, R. A., and Fritz, T. A.: Ring current proton decay by charge exchange, J. Geophys. Res., 81, 2701–2708, https://doi.org/10.1029/JA081i016p02701, 1976.

Strangeway, R. J., Ergun, R. E., Su, Y.-J., Carlson, C. W., and Elphic, R. C.: Factors controlling ionospheric outflows as observed at intermediate altitudes, J. Geophys. Res., 110, A03221, https://doi.org/10.1029/2004ja010829, 2005.

Sturges, H. A.: The choice of class interval, J. Am. Stat. Assoc., 21, 65–66, https://doi.org/10.1080/01621459.1926.10502161, 1926.

Tsurutani, B. T., Sugiura, M., Iyemori, T., Goldstein, B. E., Gonzalez, W. D., Akasofu, S. I., and Smith, E. J.: The nonlinear response of AE to the IMF Bs driver: A spectral break at 5 hours, Geophys. Res. Lett., 17, 279–282, 1990.

Valdivia, J. A., Rogan, J., Muñoz, V., Toledo, B. A., and Stepanova, M.: The magnetosphere as a complex system, Adv. Space Res., 51, 1934–1941, https://doi.org/10.1016/j.asr.2012.04.004, 2013.

Vassiliadis, D. V., Sharma, A. S., Eastman, T. E., and Papadopoulos, K.: Low-dimensional chaos in magnetospheric activity from AE time series, Geophys. Res. Lett., 17, 1841–1844, 1990.

Weimer, D. R., Ober, D. M., Maynard, N. C., Collier, M. R., McComas, D. J., Ness, N. F., Smith, C. W., and Watermann, J.: Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique, J. Geophys. Res., 108, 1026, https://doi.org/10.1029/2002ja009405, 2003.

Weygand, J. M. and McPherron, R. L.: Dependence of ring current asymmetry on storm phase, J. Geophys. Res.-Space, 111, A11221, https://doi.org/10.1029/2006JA011808, 2006.

Wing, S. and Johnson, J. R.: Theory and observations of upward field-aligned currents at the magnetopause boundary layer, Geophys. Res. Lett., 42, 9149–9155, https://doi.org/10.1002/2015gl065464, 2015.

Wing, S., Johnson, J. R., Jen, J., Meng, C.-I., Sibeck, D. G., Bechtold, K., Freeman, J., Costello, K., Balikhin, M., and Takahashi, K.: Kp forecast models, J. Geophys. Res., 110, A04203, https://doi.org/10.1029/2004ja010500, 2005.

Wing, S., Johnson, J. R., Camporeale, E., and Reeves, G. D.: Information theoretical approach to discovering solar wind drivers of the outer radiation belt, J. Geophys. Res.-Space, 121, 9378–9399, https://doi.org/10.1002/2016ja022711, 2016.

Wing, S., Johnson, J. R., and Vourlidas, A.: Information Theoretic Approach to Discovering Causalities in the Solar Cycle, Astrophys. J., 854, 2, https://doi.org/10.3847/1538-4357/aaa8e7, 2018.