Relationships between time series are often studied on the basis of cross-correlation coefficients and regression equations. This approach is generally incorrect for time series, irrespective of the cross-correlation coefficient value, because relations between time series are frequency-dependent. Multivariate time series should be analyzed in both time and frequency domains, including fitting a parametric (preferably, autoregressive) stochastic difference equation to the time series and then calculating functions of frequency such as spectra and coherent spectra, coherences, and frequency response functions. The example with a bivariate time series “Atlantic Multidecadal Oscillation (AMO) – sea surface temperature in Niño area 3.4 (SST3.4)” shows that even when the cross-correlation is low, the time series' components can be closely related to each other. A full time and frequency domain description of this bivariate time series is given. The AMO–SST3.4 time series is shown to form a closed-feedback loop system with a 2-year memory. The coherence between AMO and SST3.4 is statistically significant at intermediate frequencies where the coherent spectra amount to as much as 55 % of the total spectral densities. The gain factors are also described. Some recommendations are offered regarding time series analysis in climatology.

Studying relations between time series on the basis of observations is a
common task in all branches of Earth sciences. Normally, it requires getting
quantitative answers to the following questions:

What is the optimal time-domain stochastic model for a given multivariate time series?

Which components of the time series could be regarded as inputs and outputs of a respective hydrometeorological system?

Is there any interaction between the inputs and the outputs (are there any closed-feedback loops within the system)?

What are the statistical properties of the multivariate time series in the time and frequency domains?

Note first that the linear correlation/regression approach as a means of studying relations between scalar time series, including teleconnections within the climatic system, is generally inapplicable to time series analysis. The simplest example given in Privalsky and Jensen (1995) and repeated in Emery and Thomson (2004) is a zero cross-correlation coefficient between two white noise sequences, one of which is obtained by applying a shift operator to the other. A low correlation coefficient may occur between any time series related to each other through more complicated but still strictly linear transformations. In particular, it can be a time series and its first difference, or any autoregressive–moving-average (ARMA) time series and its innovation sequence, or the time series at the input and output of a linear filter. The general statement is that if a time series is obtained from another time series through a linear inertial transformation, the correlation coefficient between them will not be equal to 1 in spite of the strictly linear dependence between them.
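The shifted white noise case can be checked numerically. The following is a minimal sketch, assuming simulated Gaussian white noise with a fixed seed; the helper `pearson` and the variable names are illustrative and do not come from the cited works.

```python
import random


def pearson(x, y):
    """Sample correlation coefficient between two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)            # fixed seed, for reproducibility only
n = 5000
noise = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]

x = noise[1:]                     # x_t
y = noise[:-1]                    # y_t = x_{t-1}: a pure one-step shift of x

r_lag0 = pearson(x, y)            # ordinary cross-correlation coefficient
r_lag1 = pearson(x[:-1], y[1:])   # correlation after undoing the shift
```

Here `r_lag0` is close to zero although `y` is an exact linear function of the past of `x`, while `r_lag1` equals 1 up to rounding.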

Relations between two time series (say,

The linear regression equation

The problem is solved if one uses methods of time series analysis, including a simultaneous description of multivariate time series in the time and frequency domains. This means fitting a stochastic difference equation to the time series, analyzing its properties in the time domain, and then calculating and analyzing functions that describe the time series in the frequency domain. For a number of reasons (see below), the approach used here will be limited to the autoregressive (AR) case. Also, we will consider only the bivariate case. The extension to higher dimensions is rather simple (e.g., Bendat and Piersol, 1966; Robinson, 1967) and will be described briefly at the end of this section.
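The time-domain step can be illustrated as follows. This is only a sketch under stated assumptions: the data are simulated, the order is fixed at 1, the series are zero-mean, and the coefficients are estimated by ordinary least squares. The functions `simulate_ar1` and `fit_ar1` and the coefficient values in `A_true` are hypothetical and are not the estimation procedure or the model used in this paper.

```python
import random


def simulate_ar1(A, n, rng, burn=200):
    """Simulate x_t = A x_{t-1} + a_t with unit-variance Gaussian innovations."""
    x1, x2, out = 0.0, 0.0, []
    for t in range(n + burn):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # The right-hand side uses the previous values of both components.
        x1, x2 = (A[0][0] * x1 + A[0][1] * x2 + e1,
                  A[1][0] * x1 + A[1][1] * x2 + e2)
        if t >= burn:             # discard the start-up transient
            out.append((x1, x2))
    return out


def fit_ar1(series):
    """Least-squares estimate of the 2x2 coefficient matrix of a bivariate AR(1)."""
    past, present = series[:-1], series[1:]
    # Elements of the normal equations: sums of products of lagged values.
    s11 = sum(p[0] * p[0] for p in past)
    s12 = sum(p[0] * p[1] for p in past)
    s22 = sum(p[1] * p[1] for p in past)
    det = s11 * s22 - s12 * s12
    A_hat = []
    for k in (0, 1):              # one regression per component
        b1 = sum(c[k] * p[0] for c, p in zip(present, past))
        b2 = sum(c[k] * p[1] for c, p in zip(present, past))
        A_hat.append([(b1 * s22 - b2 * s12) / det,
                      (b2 * s11 - b1 * s12) / det])
    return A_hat


# Hypothetical coefficients; nonzero off-diagonal terms mean that each
# component depends on the past of the other (a closed feedback loop).
A_true = [[0.8, 0.1], [-0.3, 0.3]]
data = simulate_ar1(A_true, 4000, random.Random(1))
A_hat = fit_ar1(data)
```

In practice the order would be chosen with order selection criteria and the innovation covariance matrix would be estimated from the residuals; both are omitted here for brevity.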

Let the bivariate time series

In the time domain, the time series is described with a stochastic
difference equation

Properties of the time series

The importance of the coherence function in time series analysis and modeling is illustrated by the following property. If the components of an ergodic bivariate time series represent processes at the input and output of any linear time-invariant system, the coherence between them will be equal to 1 at all frequencies where the spectral density is not too close to zero.
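This property can be verified for a simple case. The sketch below assumes the standard expression for the spectral matrix of an autoregressive model, S(f) = H(f) P H(f)^H with H(f) = [I - sum_j A_j exp(-i 2 pi f j)]^{-1}, where the A_j are the AR coefficient matrices and P is the innovation covariance matrix; a constant scale factor is omitted because it cancels in the coherence. The function names and the two test systems are illustrative assumptions, and the notation may differ from the numbered equations.

```python
import cmath


def spectral_matrix(ar_coeffs, innov_cov, f):
    """Spectral matrix (up to a constant factor) of a bivariate AR model.

    ar_coeffs: list of 2x2 matrices A_1..A_p; innov_cov: 2x2 covariance matrix.
    f is the frequency in cycles per sampling interval (0 <= f <= 0.5).
    """
    # B(f) = I - sum over lags of A_lag * exp(-i 2 pi f lag)
    b = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for lag, a in enumerate(ar_coeffs, start=1):
        z = cmath.exp(-2j * cmath.pi * f * lag)
        for r in range(2):
            for c in range(2):
                b[r][c] -= a[r][c] * z
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    # H = B^{-1}: frequency response of the filter from innovations to series
    h = [[b[1][1] / det, -b[0][1] / det],
         [-b[1][0] / det, b[0][0] / det]]
    # S = H * innov_cov * H^H, where H^H is the conjugate transpose of H
    hs = [[sum(h[r][k] * innov_cov[k][c] for k in range(2)) for c in range(2)]
          for r in range(2)]
    return [[sum(hs[r][k] * h[c][k].conjugate() for k in range(2))
             for c in range(2)] for r in range(2)]


def squared_coherence(s):
    """Squared coherence from a 2x2 spectral matrix."""
    return abs(s[0][1]) ** 2 / (s[0][0].real * s[1][1].real)


freqs = [0.05 * k for k in range(11)]          # 0, 0.05, ..., 0.5

# Output is the input delayed by one step: x2_t = x1_{t-1}, x1_t is white noise.
shift_A, shift_P = [[[0.0, 0.0], [1.0, 0.0]]], [[1.0, 0.0], [0.0, 0.0]]
coh_shift = [squared_coherence(spectral_matrix(shift_A, shift_P, f)) for f in freqs]

# Two unrelated AR(1) series with independent innovations.
indep_A, indep_P = [[[0.8, 0.0], [0.0, 0.3]]], [[1.0, 0.0], [0.0, 1.0]]
coh_indep = [squared_coherence(spectral_matrix(indep_A, indep_P, f)) for f in freqs]
```

For the delayed series the squared coherence is 1 at every frequency even though the lag-zero correlation coefficient is zero, whereas for the two unrelated series it is 0 everywhere.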

The spectral matrix (Eq. 4) describes a linear stochastic system with the time
series

In the general case of an

It should be noted that if the multivariate time series is long (by orders of magnitude longer than the largest timescale of interest) and if the spectra of its components are intricate, the above-described approach may not be the best, especially in the time domain – because of the high order of the optimal stochastic difference equation. In such cases the analysis may have to be limited to a frequency domain description of the time series by using a parametric or nonparametric (e.g., Percival and Walden, 1993; Bendat and Piersol, 2010) approach. However, this can hardly happen if one is interested in properties of time series at climatic timescales.

The example with actual climatic data given below shows that the components of a bivariate time series can be connected to each other even though the cross-correlation coefficient between them is low. It also provides a simultaneous description of a climatic system in both time and frequency domains. The case with a high cross-correlation coefficient between the components of the ENSO system (Southern Oscillation Index and SST variations in the Niño area 3.4) has been treated in detail in Privalsky and Muzylev (2013), where it was shown, in particular, that both time series are close to white noise, that they interact with each other mostly through the innovation sequence, and that the coherence function, coherent spectra, and the frequency response functions between SOI and SST are frequency-dependent.

The El Niño–Southern Oscillation (ENSO) system is believed to affect
many phenomena in the Earth's climate (e.g., Philander, 1989). We will construct
an autoregressive model of the bivariate time series

As seen from the figure, the two components behave differently: AMO contains much stronger low-frequency variations than SST3.4. The correlation between AMO and SST3.4 (Fig. 1b) is very low, with a cross-correlation coefficient of 0.06 and maximum absolute values of the cross-correlation function below 0.26. With the cross-correlation-based approach that prevails in climatology, one would have to conclude that the two scalar time series are either unrelated or only very weakly connected. That conclusion would be incorrect.

Observed AMO and SST3.4, 1876–2014

Consider first the time-domain properties. All four above-mentioned order
selection criteria selected the same order

Obviously, Eq. (9) describes a closed-feedback loop system: AMO
(

The stochastic difference equation (Eq. 9) and the innovation sequence
covariance matrix (Eq. 10) allow one to understand how much of the variances of
AMO and SST3.4 can be explained by the “deterministic” components of the
model (Eq. 9) that describes the dependence of

The SST3.4 variance

Coherent spectra: contribution of SST3.4 to the AMO spectrum

Both AMO and SST3.4 time series can be regarded as Gaussian so that their
autoregressive spectral estimates satisfy the requirements of the maximum
entropy spectral analysis. The AMO spectrum

Though the cross-correlation coefficient between AMO and SST3.4 is very low,
the maximum entropy estimate of coherence

The coherence between AMO and SST3.4 is weak at the low-frequency end, where AMO's spectral energy is much higher than elsewhere. The high coherence occurs at intermediate frequencies where the spectral density of AMO is much lower. The strong dependence of AMO on its past values and the relative closeness of the SST3.4 spectrum to a constant seem to be the reasons why the stochastic model (Eq. 9) can explain so much of the total AMO variance and less of the total SST3.4 variance.

Gain factors SST3.4–AMO

The contribution of SST3.4 to the AMO spectrum is

This example also shows that using proper methods of analysis allows one to avoid filtering of time series in order to suppress “noise”. Indeed, though the low-frequency variations dominate the spectrum of AMO, the coherence function has revealed the “signal” – a teleconnection between AMO and SST3.4 at intermediate frequencies where the AMO spectrum is low. This is another useful property of autoregressive time- and frequency-domain models.

Our frequency-domain results generally agree with the earlier results by
Park and Dusek (2013) regarding the connection between AMO and the
Multivariate ENSO Index (MEI) at intermediate frequencies. The authors used
a nonparametric spectral estimation–singular spectrum analysis, keeping
the first 10 empirical orthogonal functions, which cover slightly over 75 % of the
time series total variances. At frequencies above 0.15 yr

The phase factors in this case cannot give explicit information about the
AMO–SST3.4 system because its feedback loop is closed (interaction
between AMO and SST3.4). We cannot compare our spectra with those shown in
Park and Dusek (2013) because their spectral estimates are given without
confidence bounds but generally the shapes of the spectra at frequencies
below 0.5 yr

Relations between time series should not be studied on the basis of
cross-correlation coefficients and regression equations. An efficient approach
within the framework of time series analysis includes two stages, both involving
parametric (preferably, autoregressive) modeling:

fitting a stochastic difference equation to the time series and analyzing the selected model (time domain), and

using the fitted equation to calculate and analyze frequency-domain characteristics (spectra and coherent spectra, coherence function(s), gain and phase factor(s)).

Methods of multivariate time series analysis should be used in all cases, irrespective of the value of the cross-correlation coefficient. Cross-correlation coefficients and regression equations do not generally describe relations between time series. In particular, a low cross-correlation coefficient does not necessarily mean that there is no dependence, even a strictly linear one, between the time series.

The stochastic difference equation (Eq. 9) with the innovation sequence covariance matrix (Eq. 10) shows quantitatively that AMO and SST3.4 interact with each other, so that each of them can be regarded as either an input or an output of the AMO–SST3.4 system. It also reveals that the system's memory extends for 2 years. The dependence of AMO and SST3.4 upon their own past and upon the past of the other time series explains about 60 and 25 % of the AMO and SST3.4 variances, respectively.

The frequency-domain analysis of the system shows that the spectra of AMO and SST3.4 behave in a different manner, with the AMO spectrum decreasing fast with frequency and with a relatively flat SST3.4 spectrum.

In spite of the very low cross-correlation coefficient between the time
series of AMO and SST3.4, a close linear dependence between them was shown to
exist at intermediate frequencies corresponding to timescales from 3 to 5 years.
The coherence between AMO and SST3.4 is statistically significant in a wide
frequency band centered at 0.27 yr

The coherent spectra AMO–SST3.4 and SST3.4–AMO are statistically
significant at frequencies from 0.18 to 0.38 yr

Climatic time series can often be treated as Gaussian. The ability to use a Gaussian
approximation is important because for such time series the nonlinear
approach cannot give better results than what is obtained within the linear
approximation. This latter statement holds, in particular, for time series
extrapolation within the framework of the Kolmogorov–Wiener theory. Also, as
shown by Choi and Cover (1984), the random process that maximizes the
entropy rate under constraints on

Many time series, especially at climatic timescales, are short; that is, their lengths do not exceed the timescales of interest by orders of magnitude. Therefore, nonparametric analysis in the frequency domain may not be efficient because, with short time series, it would produce less reliable results. Besides, the nonparametric approach does not allow one to obtain explicit stochastic models in the time domain. (These are two more reasons to prefer parametric modeling.)

A parametric (first of all, autoregressive) analysis in time and frequency
domains is effective because it results in relatively accurate estimates due
to the postulation of a stochastic model for the time series. In particular,
the frequency-domain estimates obtained with the properly selected order
satisfy, in the Gaussian case, the requirements of the maximum entropy
spectral analysis. However, it is not correct to say that any autoregressive
spectral estimate has this important property. The number of parameters to
be estimated should always be much smaller than the time series length. If,
for example, the length