The ScaLIng Macroweather Model ( SLIMM ) : using scaling to forecast global-scale macroweather from months to decades

On scales of ≈ 10 days (the lifetime of planetary-scale structures), there is a drastic transition from high-frequency weather to low-frequency macroweather. This scale is close to the predictability limits of deterministic atmospheric models; thus, in GCM (general circulation model) macroweather forecasts, the weather is a high-frequency noise. However, neither the GCM noise nor the GCM climate is fully realistic. In this paper we show how simple stochastic models can be developed that use empirical data to force the statistics and climate to be realistic so that even a two-parameter model can perform as well as GCMs for annual global temperature forecasts. The key is to exploit the scaling of the dynamics and the large stochastic memories that we quantify. Since macroweather temporal (but not spatial) intermittency is low, we propose using the simplest model based on fractional Gaussian noise (fGn): the ScaLIng Macroweather Model (SLIMM). SLIMM is based on a stochastic ordinary differential equation, differing from usual linear stochastic models (such as the linear inverse modelling – LIM) in that it is of fractional rather than integer order. Whereas LIM implicitly assumes that there is no lowfrequency memory, SLIMM has a huge memory that can be exploited. Although the basic mathematical forecast problem for fGn has been solved, we approach the problem in an original manner, notably using the method of innovations to obtain simpler results on forecast skill and on the size of the effective system memory. A key to successful stochastic forecasts of natural macroweather variability is to first remove the lowfrequency anthropogenic component. A previous attempt to use fGn for forecasts had disappointing results because this was not done. We validate our theory using hindcasts of global and Northern Hemisphere temperatures at monthly and annual resolutions. Several nondimensional measures of forecast skill – with no adjustable parameters – show excellent agreement with hindcasts, and these show some skill even on decadal scales. We also compare our forecast errors with those of several GCM experiments (with and without initialization) and with other stochastic forecasts, showing that even this simplest two parameter SLIMM is somewhat superior. In future, using a space–time (regionalized) generalization of SLIMM, we expect to be able to exploit the system memory more extensively and obtain even more realistic forecasts.


Introduction
Due to their sensitive dependence on initial conditions, the classical deterministic prediction limit of GCMs (general circulation models) is about 10 days -the lifetime of planetarysized structures (τ w ).Beyond this, the forecast weather rapidly loses any relationship with the real weather.The anal-ogous scale (τ ow ) for near-surface ocean gyres is about 1 year (Lovejoy and Schertzer, 2012b), so that even the ocean component -important in fully coupled climate models (referred to simply as GCMs below) -is poorly forecast beyond this.When using long GCM runs for making climate forecasts, we are therefore really considering a boundary value problem rather than an initial value problem (Bryson, 1997).
Published by Copernicus Publications on behalf of the European Geosciences Union.
For these longer scales, following Hasselmann (1976), the high-frequency weather can be considered as a noise driving an effectively stochastic low-frequency system; the separation of scales needed to justify such modelling is provided by the drastic transitions at τ w , τ ow .In the atmosphere, the basic phenomenology behind this has been known since the earliest atmospheric spectra (Panofsky and Van der Hoven, 1955) and was variously theorized as the "scale of migratory pressure systems of synoptic weather map scale" ( Van der Hoven, 1957) and as the "synoptic maximum" (Kolesnikov and Monin, 1965).Later, it was argued to be a transition scale of the order of the lifetime of planetary structures that separated different high-frequency and low-frequency scaling regimes (Lovejoy and Schertzer, 1986).More recently, based on the solar-induced energy rate density, the atmospheric scale τ w was deduced theoretically from turbulence theory (Lovejoy and Schertzer, 2010), and τ ow was derived in Lovejoy and Schertzer (2012b).The same basic picture was also confirmed in the Martian atmosphere in Lovejoy et al. (2014), including a correct prediction of the lowand high-frequency spectral exponents and Martian transition scale τ M w (= 1.8 sols).Although it is only plausible at midlatitudes, the competing theory from dynamical meteorology postulates that the transition scale τ w is the typical scale of baroclinic instabilities (Vallis, 2010); see the critique in Lovejoy and Schertzer (2013, ch. 8).
Independent of its origin, the transition justifies the idea that the weather is essentially a high-frequency noise driving a lower-frequency climate system, and the idea is exploited in GCMs with long integrations as well as in Hasselmanntype stochastic modelling, now often referred to as linear inverse modelling (LIM; sometimes also called the "stochastic linear forcing" paradigm), e.g.Penland and Sardeshmuhk (1995), Newman et al. (2003), Sardeshmukh and Sura (2009); analogous modelling is also possible on much longer timescales using energy balance models.For a review, see Dijkstra (2013); for a somewhat different Hasselmanninspired approach, see Livina et al. (2013).
In these phenomenological models, the system is regarded as a multivariate Ohrenstein-Uhlenbeck (OU) process.The basic LIM paradigm is based on the stochastic differential equation where T is the temperature, ω w = τ −1 w is the "weather frequency", σ γ is the amplitude of the forcing and γ (t) is a "δcorrelated" Gaussian white noise forcing with γ (t)γ (s) = δ(t − s); γ (t) = 0. (2) Angle brackets indicate ensemble averaging and δ(t − s) is the Dirac function; t and s are two different times.This uses the convenient physics notation for the generalized function γ (t); alternatively one may take γ (t) dt = dW , where W is a Wiener process.
Fourier transforming Eq. ( 1) and using the rule F. T.[ d n f dt n ] = (iω) n F. T.[f ], where F. T. indicates Fourier transform, the temperature spectrum is thus where ω is the frequency, the tilde indicates Fourier transform, and, at respectively low and high frequencies, E T (ω) ≈ ω −β with β l = 0, β h = 2.A spatial LIM model (for regional forecasting) is obtained by considering a vector each of whose components is the temperature (or other atmospheric field) at different (spatially distributed) "pixels", yielding a system of linear stochastic ordinary differential equations of integer order.A system with 20 degrees of freedom (involving > 100 empirical parameters) currently somewhat outperforms GCMs for global-scale annual temperature forecasts (Newman (2013); Table 2, Fig. 2).
The basic problem with the LIM approach is that although we are interested in the low-frequency behaviour, for LIM models it is simply white noise and this has no memory (put d/dt = 0 in Eq. 1); by hypothesis, LIM models therefore assume a priori that there is no long-term predictability.However, ever since Lovejoy and Schertzer (1986), there has been a growing literature (Koscielny-Bunde et al., 1998;Huybers and Curry, 2006;Blender et al., 2006;Franzke, 2012;Rypdal et al., 2013;Yuan et al., 2014, and see the extensive review in Lovejoy and Schertzer, 2013) showing that the temperature (and other atmospheric fields) are scaling at low frequencies, with spectra significantly different than those of Orenstein-Uhlenbeck processes, notably with β l in the range of 0.2-0.8 with the corresponding low-frequency weather regime (at scales longer than τ w ≈ 10 days) now being referred to as "macroweather" (Lovejoy, 2013).At a theoretical level, for regional forecasting, a further shortcoming of the LIM approach is that it does not respect the property of space-time statistical factorization (Lovejoy and Schertzer, 2013, ch. 10;Lovejoy and de Lima, 2015).
While the difference in the value of β l might not seem significant, the LIM white noise value β l = 0 has no low-frequency predictability, whereas the actual values 0.2 < β l < 0.8 (depending mostly on the land or ocean location) correspond to potentially enormous predictability (see, e.g., Fig. 1a-e).Although this basic feature of "long-range statistical dependency" has been regularly pointed out in the scaling literature and an attempt was already made to exploit it (Baillie and Chung, 2002b; see below), the actual extent of this enhanced predictability has not been quantified before now (see, however, Yuan et al., 2014).This justifies the development of the new ScaLIng Macroweather Model (SLIMM) that we present below.We argue that, even in its simplest two parameter version, it already is comparable toor better than -GCMs.(c) This illustrates the potentially huge memory in the climate system (especially the ocean).It gives the λ mem value such that S k,λ mem (1)/S k,∞ (1) = 0.9.When H = −1/2, there is no memory and λ mem is not defined; it diverges when H → 0. (d) The theoretical skills for hindcasts with infinite (Eq.46) and finite memory (Eq.49) for the empirically relevant parameter range (H = −0.23,brown, bottom; H = −0.17,red, top).The flat (constant skill) at the top are the curves for the anomaly forecasts (i.e. with forecast horizon, t is equal to the resolution τ so that λ = 1); the bottom curves are for constant resolution τ with forecast horizon.In each case a triplet of curves is shown corresponding to varying lengths of memories used in the forecast: infinite, 180 and 20 (the latter two corresponding to those used for the monthly and global forecasts analysed here).(e) The skill of λ = t/τ = 1 forecasts using the full memory (black, Eq. 46, from a), the corresponding classical persistence forecast (red), S k = 1-4 (1-2 2H ) and the corresponding "enhanced persistence" result (blue; for this λ = 1 case, this is the same as the AR(1) (autoregressive order 1) model forecast) with S k = (2 2H +1 − 1) 2 .With classical persistence the skill becomes negative for H < ≈ −0.2, so it is not shown over the whole range.  2 Stochastic models and fractional Gaussian noise

Linear and nonlinear stochastic atmospheric models
We have discussed the phenomenological linear stochastic models introduced in atmospheric science by Hasselmann and others from 1976 onwards.Yet there is an older tradition of stochastic atmospheric modelling that can be traced back to the 1960s: stochastic cascade models for turbulent intermittency (Novikov and Stewart, 1964;Yaglom, 1966;Mandelbrot, 1974;Schertzer and Lovejoy, 1987).Significantly, these models are nonlinear rather than linear, and the nonlinearity plays a fundamental role in their ability to realistically model intermittency.By the early 1980s it was realized that these multiplicative cascades were the generic multifractal processes, and they were expected to be generally relevant in high-dimensional nonlinear dynamical systems that were scale invariant over some range.By 2010, there was a considerable body of work showing that atmospheric cascades were anisotropic -notably with different scaling in the horizontal and vertical directions (leading to anisotropic, stratified cascades) -and that this enabled cascades to operate up to planetary sizes (see the reviews Lovejoy andSchertzer, 2010, 2013).While the driving turbulent fluxes were modelled by pure cascades, the observables (temperature, wind, etc.) were modelled by fractional integrals of the latter (see below): the Fractionally Integrated Flux (FIF) model.The analysis of in situ (aircraft, dropsonde) and remotely sensed data, reanalyses as well as weather forecasting models showed that at least up to 5000 km, the cascade processes were remarkably accurate, with statistics (up to second order) typically showing deviations of less than ≈ ±0.5 % with respect to the theoretical predictions (see Lovejoy and Schertzer (2013, ch. 4) for an empirical review).
The success of the cascade model up to planetary scales (L w ) showed that the horizontal dynamics were dominated by the solar-induced energy flux (ε ≈ 10 −3 W kg −1 sometimes called the "energy rate density"), and it implies a break in the space-time cascades at about τ w = ε −1/3 L 2/3 w ≈ 10 days discussed above.The logical next question was therefore what happens if the model is extended in time and the cascade starts on an outer timescale much longer than τ w ?In Lovejoy and Schertzer (2013, Appendix 10A), some of the mathematical details of this Extended Fractionally Integrated Flux (EFIF) model were worked out, and it was shown that at frequencies below τ −1 w there would a nonintermittent (near-) Gaussian, (near-) scaling regime with generic exponents β l in the observed range.
Although this (temporally) extended space-time cascade model well reproduces the basic space-time weather statistics (for scales < τ w ) and the temporal macroweather statistics (for scales > τ w ), by itself it was not able to reproduce the spatial macroweather statistics that characterize climate zones and that were strongly intermittent, so that another, even lower-frequency climate process was necessary.(In quantitative terms, empirically, the basic intermittency parameter C 1 that characterizes the intermittency near the mean is typically low -around 0.01-0.02-in time, whereas it is typically high -around 0.15-0.2-in space.)It was proposed that -following the basic mathematical structure of the rest of the model -the new climate process was also multiplicative in nature.This factorization hypothesis was empirically verified on macroweather temperature and precipitation data (Lovejoy and Schertzer (2013, ch. 10) and Lovejoy and de Lima (2015) respectively).
To summarize, there are three key empirically observed macroweather characteristics that models should respect: low temporal intermittency, high spatial intermittency and statistical space-time factorization.According to the analysis in Lovejoy and de Lima (2015), the CEFIF (Climate EFIF model) approximately satisfies these properties but has some disadvantages.A practical difficulty is that it -much like GCMs -requires the explicit modelling of fine temporal (weather-scale) resolution.This is computationally wasteful since for macroweather modelling, the high frequencies are subsequently averaged out in order to model the lowerfrequency macroweather.An arguably more significant disadvantage is that CEFIFs theoretical properties -including its predictability -are nontrivial and are largely unknown.
SLIMM is an attempt to directly model space-time macroweather while respecting the factorization property and while using the comparatively simple, nonintermittent scaling process -fractional Gaussian noise (fGn) -to repro-duce the low-intermittency temporal behaviour.In the temporal domain, it is thus based on a linear stochastic model (fGn) with reasonably well-understood predictability properties and predictability limits.The strong spatial macroweather variability can be modelled either by using multifractal spatial variability (representing very low-frequency climate processes), or alternatively -in the spirit of LIM modellingit can be modelled as a system of (fractional-order) ordinary differential equations.In the former case, developed in Lovejoy and de Lima (2015), it turns out to be sufficient to take the product of a spatially nonlinear (multifractal) stochastic model, with a space-time fGn process.The result is a model that is well defined at arbitrary spatial resolutions and with temporal scaling exponents that are the same at every spatial location (this restriction is somewhat unrealistic).In the latter LIM-like case, one fixes the grid scale (the spatial resolution) and then treats each grid point as a component of an N component system of (fractional) ordinary differential equations.In this version of space-time SLIMM, each grid point can have a different temporal scaling exponent corresponding to a different fractional order of differentiation.Although the result is formally closer to the LIM model (albeit with radically different predictability properties), it has the disadvantage that the model properties are not well defined under changes in spatial resolution -they potentially depend strongly on the grid that is used for the spatial discretization.As a final comment, we note that empirically, it is found that macroweather temperature probability distributions have "fat tails" so that statistical moments of the order of ≈ 5 diverge (Lovejoy and Schertzer, 2013, ch. 5;Lovejoy, 2014bLovejoy, , 2015b; see also Lovejoy and Schertzer (1986).However, for the (low-order) statistics (e.g.near the mean and variancefirst and second order), the deviations from Gaussianity are small enough that fGn can be used as an approximation.

From LIM to SLIMM
In this paper, we concentrate on the simplest scalar SLIMM, and we illustrate this by hindcasting global-scale temperature series.The key change to the LIM model is thus a modification of the low-frequency scaling: rather than β l = 0 (white noise), the SLIMM has 1 > β l > 0. This can be effected by a simple extension of Eq. (1) to yield the fractional differential equation where H + 1/2 is a fractional order of differentiation.Using [f ]; this yields the temperature spectrum: Hence, the low-and high-frequency SLIMM exponents are β l = 2H + 1, β h = 2H + 3. Note that for the global temper-ature series analysed below, we have β l ≈ 0.6 and H ≈ −0.2 (see Fig. 4a and b).Alternatively, Eq. ( 4) can be solved in real space directly.First, operate on both sides of the above by (ω w + d dt ) −1 to obtain Since the autocorrelation of γ s is we see that, for lags t ω −1 w , γ s is essentially an uncorrelated white noise: γ s is simply γ smoothed over timescales shorter than τ w = ω −1 w .If we are only interested in frequencies lower than ω w , we can introduce the white noise smoothed at resolution τ and simply solve the following: The LIM paradigm is recovered as the special case with H = −1/2.Although physically, the weather scales are responsible for the smoothing at τ = τ w , in practice, we typically have climate data averaged at even lower resolutions: for example monthly or annually.Therefore, it is simpler to consider a "pure" process (with pure white noise forcing γ rather than the smoothed γ τ ) and then introduce the resolution and/or smoothing simply as an averaging procedure.Formally, the solution to Eq. ( 8) with τ = 0 is obtained by Riemann-Liouville fractional integration of both sides of the equation by order H + 1/2: ( is the gamma function.)T (t) is a "fractional Gaussian noise" (fGn) process.By inspection, the statistics are invariant under translations in time (t → t + t) so that this process is stationary.Although basic processes of this type were first introduced by Kolmogorov (1940), since Mandelbrot and Van Ness (1968), the usual order-1 integral of Eq. ( 9) has received most of the mathematical attention and is known as "fractional Brownian motion" (fBm).An interesting mathematical feature of fBm and fGn is that they are not semi-Martingales and hence the standard stochastic Itô and Stratatovitch calculi do not apply (see Biagini et al. (2008) for a recent mathematical review).In the present case, this is not important since we only deal with Wiener integrals (i.e.integrals of deterministic functions with respect to fGn).The FIF While below we use simple averaging to obtain smallscale convergence of fGn, for many purposes, the details of the smoothing at resolution τ are unimportant and it can be useful to define the particularly simple "truncated fGn" process: where the singular kernel is truncated at scale τ .It can be shown that for large enough lags t, the fluctuation and autocorrelation statistics for truncated fGn are the same as for the averaged fGn, although, when H approaches 0 (from below), the convergence of the former to the latter becomes increasingly slow.In practice, the truncated model is often a convenient approximation to the slightly more complex averaged fGn process.

Definition and links to fBm
Fractional Brownian motion has received far more attention than fractional Gaussian noise, and it is possible to deduce the properties of fGn from fBm.However, since we are exclusively interested in fGn, it is more straightforward to first define fGn and then -if needed -define fBm from its integral.
The canonical fractional Gaussian noise process G H (t) with parameter H , can be defined as where c H is a constant chosen so as to make the expression for the statistics particularly simple; see below.First, taking ensemble averages of both sides of Eq. ( 11), we find that the mean vanishes: G H,τ (t) = 0. Now, take the average of G H over a resolution τ , and define the function F H , which will be useful below, as (u is a dummy variable) with the particular value and the asymptotic expression for λ 1: If c H is now chosen such that then we have This shows that a fundamental property is that in the smallscale limit (τ → 0), the variance diverges and H is the scaling exponent of the root mean square (RMS) value.This singular small-scale behaviour is responsible for the strong power law resolution effects in fGn.Since < G H,τ (t) > = 0, we see that sample functions G H,τ (t) fluctuate about 0 with successive fluctuations tending to cancel each other out; this is the hallmark of the macroweather regime.
It is more common to treat fBm whose differential dB H (t) is given by with the property While this defines the increments of B H (t) and shows that they are stationary, it does not completely define the process.For this, one conventionally imposes B H (0) = 0, leading to the usual definition due to (Mandelbrot and Van Ness (1968) Whereas fGn has a small-scale divergence that can be eliminated by averaging over a finite resolution τ , the fGn inte- G H (t )dt on the contrary has a low-frequency divergence.This is the reason for the introduction of the second Earth Syst.Dynam., 6, 637-658, 2015 www.earth-syst-dynam.net/6/637/2015/term in the first integral in Eq. ( 21): it eliminates this divergence at the price of imposing B H (0) = 0 so that fBm is nonstationary (although its increments are stationary; Eq. 19).
A comment on the parameter H is now in order.In treatments of fBm, it is usual to use the parameter H confined to the unit interval, i.e. to characterize the scaling of the increments of fBm.However, fBm (and fGn) are very special scaling processes, and even in low-intermittency regimes such as macroweather, they are at best approximate models of reality.Therefore, it is better to define H more generally as the fluctuation exponent (see below); with this definition H is also useful for more general (multifractal) scaling processes although the interpretation of H as the "Hurst exponent" is only valid for fBm.When −1 < H < 0, the mean at resolution τ (Eq.12) defines the anomaly fluctuation (see below), so that H is equal to the fluctuation exponent for fGn; in contrast, for processes with 0 < H < 1, the fluctuations scale as the mean differences and Eq. ( 20) shows that H ' is the fluctuation exponent for fBm.In other words, as long as an appropriate definition of fluctuation is used, H and H = 1 + H are fluctuation exponents of fGn and fBm respectively.The relation H = H + 1 follows because fBm is an integral order 1 of fGn.Therefore, since the macroweather fields of interest have fluctuations with mean scaling exponent −1/2 < H < 0, we use H for the fGn exponent and 1/2 < H < 1 for the corresponding integrated fBm process.
Some useful relations are and valid for 0 < H < 1 and t 1 < t 2 ≤ t 3 < t 4 (e.g.Gripenberg and Norros, 1996).The relationship Eq. ( 23) can be used to obtain several useful relations for a finite resolution fGn.For example, A convenient expression for the special case at the fixed res- The nondimensional lag, i.e. measured in integer resolution units, is λ = t/τ .This is convenient since real data are discretized in time, and this shows that as long as we correct for the overall resolution factor (τ 2H ), that the autocorrelation only depends on the nondimensional lag.Since H < 0, the larget limit is The autocorrelation falls off algebraically with exponent 2H .

Spectrum and fluctuations
Since fGn is stationary, its spectrum is given by the Fourier transform of the autocorrelation function.The autocorrelation is symmetric (R H,τ ( t) = R H,τ (− t)), so that for the Fourier transform, we use the absolute value of t.Also, we must take the limit of the autocorrelation of small resolution, which is the same as using the large-λ formula (Eq.26).In this case, we obtain The relation between β and H is the standard monofractal one.It is valid as long as intermittency effects are negligible, i.e. if we ignore the multifractal "corrections".However, sometimes -as here for high-order statistical moments or in the case of precipitation even for low-order moments -these can give the dominant contribution to the scaling.The spectrum is one way of characterizing the variability as a function of scale (frequency); however, it is often important to have real space characterizations.These are useful not only for understanding the effects of changing resolution, but also on a given timescale t for studying the full range of variability (i.e.statistical moments other than second order, probability distributions, etc.).Wavelets provide a general framework for defining fluctuations; we now give some simple and useful special cases.

Anomalies
An anomaly is the average deviation from the long-term average, and since G H = 0, the anomaly fluctuation over time scales t > τ is simply G H at resolution t rather than τ : Hence, using Eq. ( 25), Anomaly fluctuations were referred to with the less intuitive term "tendency" fluctuation in Lovejoy and Schertzer (2012a).While this definition of fluctuation is fine for fGn, it is not appropriate for processes with H > 0 since these "wander", i.e they do not tend to return to any longterm value.We therefore need other definitions of fluctuation.
S. Lovejoy et al.: SLIMM: using scaling to forecast global-scale macroweather from months to decades

Differences
The classical fluctuation is simply the difference (the "poor man's wavelet"): Hence, In the larget limit we have Since H < 0, the differences asymptote to the value 2τ 2H (double the variance, Eq. 17).Notice that since H < 0, the differences do not scale with t.

Haar fluctuations
As pointed out in Lovejoy and Schertzer (2012a), the preceding fluctuations only have variances proportional to τ 2H over restricted ranges of H , specifically −1 ≤ H ≤ 0 (anomalies) and 0 ≤ H ≤ 1 (differences).A more generally useful fluctuation (used below) is the Haar fluctuation (from the Haar wavelet, Haar, 1910).This is defined as the differences between the average of the first and second halves of the interval t: Using Eq. ( 23), we obtain This indeed scales as t 2H and does not depend on the resolution τ .

Using fGn to model and forecast the temperature
Using the definition (Eq.11) of fGn, we can define the temperature as (i.e.σ T = σ γ /c H ). Let us now introduce the integral S(t): Since T is a fractional integral of the order 1/2 + H with respect to white noise, S(t) is a fractional integral of the order 3/2 + H = 1/2 + H '. Strictly speaking, the above integral diverges at −∞; however, this is not important since we will always take differences over finite intervals (equivalent to integrating T (t) over a finite interval) and the differences will converge.
We can therefore define the resolution τ temperature as Notice that because of the divergence of S(t) at −∞, we did not define S(t) = σ T B H (t).However, the differences do respect Using Eq. ( 35), the τ resolution temperature variance is thus From this and the relation T τ (t) = σ T G H,τ (t), we can trivially obtain the statistics of T τ (t) from those of G H,τ (t).

Forecasts
Since an fGn process at resolution τ is the average of the increments of an fBm, process, it suffices to forecast fBm.There are four important related problems in the prediction of fBm: to find the best forecast using (a) finite past data and (b) infinite past data.The cases (1) 0 < H < 1/2 and (2) 1/2 < H < 1 (with H = 1 + H ) must be considered separately.Since −1/2 < H < 0, our problem corresponds to cases 2a and b.Yaglom solved problem 1b in 1955 (Yaglom, 1955), Gripenburg and Norros (1996) solved 2a and b in 1996 and problem 1a was solved by Nuzman and Poor (2000).Hirchoren and Arantes (1998) used the Gripenburg and Norros results for 1/2 < H < 1 to numerically test the method adapted to fGn; see also Hirchoren and D'attellis (1998).Although the Gripenberg and Norros (1996) result conveniently expresses the fBm predictions at time t (the "forecast horizon") directly in terms of the past series for t ≤ 0, the corresponding formulae are not simple.
The standard approach that they followed yields nontrivial integral equations (which they solved) in both the finite-and infinite-data cases.In what follows, we use a more straightforward method -the general method of innovations (see, e.g., Papoulis, 1965, ch. 13) -and we obtain relatively simple results for the case with infinite past data (which is equivalent to the corresponding Gripenberg and Norros (1996) result).In a future publication we improve on this by adapting it to the finite-data case.The main new aspect of the forecasting problem with only finite data is that it turns out that not only do the most recent values (close to t = 0) have strong (singular) weighting, but the data in the oldest available data also have singular weightings.In the words of Gripenberg and Norros (1996), this is because they are the "closest witnesses" of the distant past.
We now derive the forecast result for resolution τ fGn using innovations assuming that data are available over the entire past (i.e. from t = −∞ to 0).Recall that the resolution τ temperature at time t is given by (t ≥ τ ≥ 0).We have used the fact that S(t) is a fractional integral of the order H + 3/2 of γ .Since the γ s are effectively independent random variables, they are called "innovations".If T τ (t) is known for t ≤ 0, then the above relation can be inverted to obtain γ (t) for t ≤ 0. If γ (t) is known for t ≤ 0, then the minimum square (MS) estimator (circonflex) at time t ≥ τ is given by which depends only on γ (t) for t ≤ 0. That this is indeed the MS estimator follows since the error E T in this estimator is orthogonal to the estimator.To see this, note that E T only depends on γ (t) for t ≥ 0: Since the range of integration for Tτ (t) in Eq. ( 40) is t < 0, whereas the range for the error E T (Eq.41) is t > 0, Tτ (t) and E T are clearly orthogonal: We can use this to obtain Using the substitution u = −(t − τ − t )/τ in the integral Eq. ( 41) and the function F H (λ) introduced in Eq. ( 13) and using Eq. ( 16) for c H , we obtain (F H (∞) is given in Eq. ( 14)).Using Eqs. ( 43) and ( 44), the error variance is Hence, if we define the "skill" (S k ) in terms of the error variance S k = 1 − E T (t, τ ) 2 / T τ (t) 2 , then for minimum square forecasts, it is also equal to the fraction of the variance that it explains: Figure 1a shows the theoretical skill as a function of H for different forecast horizons, and Fig. 1b shows it for different forecast horizons.In Fig. 1a, dashed reference lines indicate the three empirically significant values: land (H ≈ −0.3), global, (H ≈ −0.2), ocean (H ≈ −0.1).In Fig. 1b, the estimated global value (H = −0.20 ± 0.03; see below) is indicated in red.This definition of skill is slightly different from the root mean square skill score (RMSSS) that is sometimes used to evaluate GCMs (see, e.g., Doblas-Reyes et al., 2013).The RMSSS is defined as 1 minus the ratio of the RMS error of the ensemble-mean prediction divided by the RMS temperature variation: In our case, the forecast is orthogonal to the prediction so that (T − T ) 2 = T 2 − T 2 , and we obtain This shows that S k and RMSSS are more or less equivalent skill measures, both being in the range of 0 to 1.However, GCM forecasts are generally not orthogonal to the data, and for them, the RMSSS can be negative.Recall that zero skill corresponds to the unconditional forecast after removing low-frequency (anthropogenic) effects and the annual cycle.
If the process scales over an infinite range in the data but we only have access to the innovations over a duration λ mem (in pixels), then To illustrate the potentially huge amount of memory in the climate system (especially in the ocean), we can (somewhat arbitrarily) define the memory in the system by the λ mem value such that S k,λ mem ,∞ (1)/S k,∞,∞ (1) = 0.9; the result is shown in Fig. 1c.We see that, over land (using H = −0.3), the memory estimated this way typically only goes back 15 time units (nondimensional time steps), whereas over the ocean (using H = −0.1), it is 600 time units.This means that the annual temperatures over the ocean typically have information from over 600 years in the past, whereas over land, it is only 15 years.Note that these indicate the memory associated with 90 % of the skill (see Fig. 1a) and these skill levels fall off rapidly as H approaches the white noise value H = −1/2.We may also note that this calculation does not imply that if we only had a short length of ocean data, the forecast would be terrible.This is because the calculation is true for the innovations (γ 's) not the temperatures (T 's) themselves (i.e. the data).Even if we only have 10 years of ocean temperatures, the past from 10 years ago implicitly contains significant information from the distant past and can be exploited (see the numerical experiments in Hirchoren and Arantes (1998)).
In the real world, after the removal of the anthropogenic component (see Lovejoy and Schertzer (2013) and Fig. 4c), the scaling regime has a finite length (estimated as ≈ 100 years here), so that the memory in the process is finite.In addition, the monthly and annual resolution series that we hindcast below used memories of λ = 180 and 20 units (months and years) respectively.The finite memory is easy to take into account; if the process memory extends over an interval of λ mem units at resolution τ (i.e. over a time interval t = λ mem τ ), it suffices to integrate to λ mem instead of infinity, i.e. to replace infinity by λ mem in Eq. ( 49): In Fig. 1d we show that the effect of finite memory increases strongly as H moves closer to 0 and is nonnegligible, even for λ mem = 180, the largest used here (for the monthly series, when H = −0.17, the skill is reduced by 3-5 % up to λ = 60; see the bottom curves in Fig. 1d).
It is instructive to compare the skill obtained with the full memory with the skill obtained if only the most recent variable T τ (0) is used.The latter can be used either as classical persistence whereby the forecast at time t = λτ is equal to the present value (no change) (i.e.Tτ (t) = T τ (0)) or as "enhanced" persistence in which T τ (0) is used as a linear estimator of T τ (t).Since the mean of the process is 0, for a single time step t = τ in the future; this is the same as the minimum square forecast made of an order-1 autoregressive model with nondimensional time step = 1 (AR(1)).Note, however, that this equivalence is only for a single time step in the future; for forecasts further in the future, the AR(1) skill decays exponentially, not in a power law manner.
In persistence, Tτ (t) = T τ (0); the error in the forecast is simply the difference In enhanced persistence, the value T τ (0) is simply considered as an estimator and the minimum square error linear estimator T (t) is only proportional to T τ (0).A standard calculation (e.g.following Papoulis, 1965, ch. 13) yields ]T τ (0) so that the term in the square brackets enhances the persistence value T τ (0). Figure 1e compares the skill of the three estimators as functions of H for λ = 1 (i.e. using Eq. ( 25) for the autocorrelation): − 1)T τ (0).Whereas for H ≈ < −0.1, classical persistence is quite poor, we see that the enhanced persistence forecast is much better.

Forecasting the Northern Hemisphere and global temperatures
3.1 The data and the removal of anthropogenic effects In order to test the method, we chose the NASA GISS Northern Hemisphere and global temperature anomaly data sets, both at monthly and at annually averaged resolutions.A significant issue in the development of such global scale series is the treatment of the air temperature over the oceans, which is estimated from sea surface temperatures; NASA provides two sets, the Land-Ocean Temperature Index (LOTI) and Land-Surface Air Temperature Anomalies only (Meteorological Station Data; dT s series).According to the site (http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt),LOTI provides a more realistic representation of the global mean trends than dT s ; it slightly underestimates warming or cooling trends, since the much larger heat capacity of water compared to air causes a slower and diminished reaction to changes.dT s on the other hand overestimates trends, since it disregards most of the dampening effects of the oceans that cover about two thirds of the earth's surface.In order to compare the two, we used LOTI for the annual series and dT s for the monthly series.
The prediction formulae assume that the series has the power law dependencies indicated above, with RMS anomaly or Haar fluctuations following t H (Eq. 34) and spectra with ω −β , with β = (1 + 2H ) (Eq. 27).However, this scaling only holds over the macroweather regime, and in the industrial epoch, anthropogenic forcing begins to dominate the low-frequency variability on scales τ c ≈ 10-20 years, whereas it occurs on scales τ c ≈ 100 years in the pre-industrial epoch; see Lovejoy et al. (2013b) and Fig. 4d.However, Lovejoy (2014a, b) showed that if the radiative forcing due to the observed global annually averaged CO 2 concentrations (ρ CO 2 ) is used (proportional to log 2 ρ CO 2 ), the "effective climate sensitivity" λ 2×CO 2 ,eff is quite close to the more usual "transient" and "equilibrium" climate sensitivities estimated by GCMs and the residues had statistics over the scale range 1 to ≈ 125 years that were very close to preindustrial multiproxy statistics (see Table 1).
Therefore, as a first step, using the Frank et al. ( 2010) data (extended to 2013 as described in Lovejoy, 2014a), we removed the anthropogenic contribution, using where ρ CO 2 ,pre is the pre-industrial concentration.The monthly data are shown as a function of date (Fig. 3a) and of CO 2 forcing (Fig. 3b) with residues shown in Fig. 3c.The effective sensitivities are shown in Table 1a.We may note that if, alternatively, the equivalent CO 2 since 1880 was used ("CO 2 eq" as estimated in the IPCC AR5 report), the sensitivities need only be divided by a factor of 1.12, and the residues are essentially unchanged.This is because of the nearly linear relation between the actual CO 2 concentration and the estimated equivalent concentration (correlation coefficient > 0.993; see Table 3 for the standard deviations of the residues, T nat ).By using the observed CO 2 forcing as a linear surrogate for all anthropogenic effects, we avoid various uncertain radiative assumptions needed to estimate CO 2 eq especially those concerning the cooling effects of aerosols which are still unsettled.As explained in Lovejoy (2014b), since the anthropogenic effects are linked via global economic activity, the observed CO 2 forcing is a plausible linear surrogate for all of them.
From Table 2 we see that the sensitivities do not depend on the exact range over which they are estimated (columns 2-4).As we move to the present (column 4 to column 2), the sensitivities stay within the uncertainty range of the earlier estimates, with the uncertainties constantly diminishing, consistent with the convergence of the sensitivities as the record lengthens.As a consequence, if we determine T anth using the data only up to 1998 or up to 2013, there is very little difference: for the global data at a monthly resolution, the difference in the standard deviations (SDs) of T nat estimated with the different sensitivities is 0.005 K, whereas at annual resolutions, it is 0.0035 K (for this period, log 2 ρ CO 2 = 0.05).These differences are larger than the estimated error in the global-scale temperatures (estimated as ±0.03K for boththe errors have very little scale dependence; Lovejoy et al., 2013a).From Table 2, we see that there is a ≈ 30 % difference between the global and monthly sensitivities.Due to the change from the LOTI (global) to dT s (monthly) series the sensitivities are virtually independent of whether the data is at a 1-month or 1-year resolution.We also see that the Northern Hemisphere has systematically higher sensitivities than the entire globe; this is consistent with the larger land mass in the north and the larger sensitivity of land with respect to the ocean.
An obvious criticism of the method of effective climate sensitivities is that anthropogenic forcing primarily warms the oceans and, only with some lag, the atmosphere.Systematic cross-correlation analysis in Lovejoy (2014a, b) shows that while the residues are barely affected (see rows 2 and 3 in Table 2 and Lovejoy (2014b) for more on this), the values of the sensitivities are affected (see, e.g., column 4 in Table 2).We may note that using Eq. ( 52) (no lag), or the same relation but with a lag, are equivalent to assuming a linear climate with Green's function given by a Dirac delta function.This and more sophisticated (power law) Green's functions will be discussed in a future publication.
Finally, we can note that the difference between LOTI and dT s temperature is primarily the sensitivities (Table 2) and that the remaining differences in the residues are mostly due to their different resolutions.From Eq. (39) we see that the ratio of RMS fluctuations in these should be λ H , where λ is the resolution ratio, here 12 months yr −1 .Table 1 shows that the H estimated from the RMS values is indeed close to the H estimated more directly in the next subsection.This shows that the main difference between the LOTI and dT s series is indeed their climate sensitivities.
In order to judge how close the residues from the CO 2 forcing (Eq.51) are to the true natural variability, we can make various comparisons (Table 3).Starting at the top (row 1), we see that, as shown in Lovejoy (2014b), the statistics of the resulting residues are very close to those of preindustrial multiproxies (see also Fig. 4c).In row 3, we see that if we take the residues of the 20-year lagged temperatures, there is virtually no difference (although the sensitivities are significantly higher; see Table 2).As a further reference (row 4), we see that it is substantially smaller than the standard deviation of the linearly detrended series (i.e. when the residues are calculated from a linear regression with time rather than the forcing).
As further evidence that residues provide a good estimate of the true natural variability, in rows 5-10 we also show the annual RMS errors of various GCM global temperature hindcasts.For example, in rows 5-6 we compare hindcasts of CMIP 3 (Coupled Model Intercomparison Project, phase 3) GCMs, both with and without annual data initialization Table 1.A comparison of root mean square (RMS) variances (data residues) and hindcast errors (from deterministic and stochastic models) of global-scale, annual temperatures.See also Fig. 2. Note that the GCM hindcasts are all "optimistic" in the sense that they use the observed volcanic and solar forcings, and these would not be available for a true forecast.In comparison, the stochastic models forecast the responses to these (unknown) future forcings.a The average of the three multiproxies from Huang (2004), Moberg et al. (2005) and Ammann and Wahl (2007).These analyses were discussed in Lovejoy (2014b).b The empirical 5-year and 9-year anomaly values are close to the theoretical values 0.109 −0.2 = 0.079 and 0.109 × 9 −0.2 = 0.070.c The results here are for a subset of the CMIP5 simulations that were run with and without annual data assimilation (initialization).d Linear inverse modelling using 20 eigenmodes and > 100 parameters.The errors in brackets are for the temperatures; they are not anomalies.Note that the 9-year LIM value is almost identical to the standard deviation of the residues of the linear regression (fourth row of the table).e ARFIMA: autogressive fractionally integrated moving average process; this is close to the SLIMM used here.However, the data and the data treatment were somewhat different.The annually, globally averaged temperatures from 1880 with a linear trend removed were used to make hindcasts over horizons of 1 to 10 years for the decades 1930-1940, 1940-1950, 1950-1960, and 1960-1970.For each decade all the forecast errors were averaged.The value indicated here is the mean of the decade to decade mean error and the standard deviation of that error; the errors cannot therefore be directly compared with each other.The data were from a series compiled in 1986.f AR(1): autoregressive order 1; this is equivalent to enhanced persistence.The variance reduction when using ARFIMA instead of AR( 1) is 29 %.g The values in parentheses are for 1-year resolution temperatures.
Earth Syst.Dynam., 6, 637-658, 2015 www.earth-syst-dynam.net/6/637/2015/(rows 5 and 6).Without initialization (row 5), the results are half way between the CO 2 forcing residues (i.e.T nat , row 2) and the standard deviation of the linearly detrended series (row 4), i.e. the forecast is poor even for the anthropogenic part.Unsurprisingly, with annual data initialization and assimilation (row 6), it is much better, but it is apparently still unable to do better than simply estimate the anthropogenic component.We can deduce this since the resulting RMS errors are virtually identical to the standard deviation of the estimated T nat (row 3).This conclusion is reinforced in row 7, where CMIP 3 GCMs (without data initialization) were analysed.However, in place of annual data initialization, a complex empirical bias and variance correction scheme was implemented in order to keep the statistics of uninitialized hind-  spectra, at monthly (solid) and annual (dashed) resolutions using the NASA GISS surface temperature anomaly series from 1880-2013.For frequencies higher than the lowest factor of 10, averages have been made over 10 frequency bins per order of magnitude in scale.In addition, the spectra have been "compensated for" by multiplying by ω 0.54 so that spectra with H = −0.23 (β = 0.54) appear flat.The range −0.17 < H < −0.23, corresponding to one standard deviation limits (β = 1 + 2H , i.e. ignoring small multifractal intermittency corrections), corresponds to 0.54 < β < 0. casts close to the data.We see that the resulting RMS error is virtually identical to GCMs with data initialization (row 6) as well as the standard deviation of T nat (row 3).It is also very close to other GCM estimates of natural variability.These conclusions are reinforced in the 5-and 9-year "anomaly" columns.As expected -due to the averaging of the temperature in the definition of the anomalies (as far as the forecast horizon) -the RMS error decreases.However, it is still only barely better than the T nat estimates from the residues.Very similar results are indicated in rows 8-10 for other GCM hindcast experiments.These are shown graphically in Fig. 2, which is adapted from a multimodel ENSEMBLES experiment.The hindcasts are discussed in García-Serrano and Doblas-Reyes (2012).The multimodel mean is consistently close to -but generally a little above -T nat (bottom horizontal line), while remaining better than the standard deviation of the linearly detrended temperature (top horizontal line).Also shown in Table 1 and Fig. 2 are the results of LIM, SLIMM and other stochastic models.These will be discussed further in Sect. 4. For now, suffice it to indicate that the SLIMM error is bounded above by the standard deviation of T nat .By using the long-range memory to forecast T nat , SLIMM can only do better.SLIMM thus generally upon the GCMs, and -for 2-year horizons and beyond -it is better than the > 100 parameter LIM model, whose 9-year forecast is essentially equivalent to a linear detrending.

Estimating H from the residues
Having estimated T nat by removing the anthropogenic contribution, we may now test the quality of the scaling and estimate H . Figure 4a shows the raw spectra of the residues showing the scaling but with large fluctuations (as expected) with β ≈ 0.60.We have already mentioned that the intermittency is low in this macroweather regime.Indeed, using exponents estimated in Lovejoy and Schertzer (2013), the resulting multifractal corrections to the variance exponent are ≈ 0.01-0.02so that we may use the monofractal relation β = 1 + 2H , which yields H ≈ −0.20.Slightly more accurate estimates can be obtained by averaging the spectrum over logarithmically spaced bins (Fig. 4b), and by com-pensating for the spectrum by dividing it by the theoretical spectrum with β = 0.54 (H = −0.17).This figure makes the estimate β = 0.20 ± 0.06 (H = −0.20 ± 0.03) plausible.Finally, the corresponding RMS Haar fluctuations are shown in Fig. 4c.We see that they plausibly follow H = −0.20 out to about 100 years (the sharp drop at the largest lag is not significant: it corresponds to a single long fluctuation that is somewhat biased since some of the low-frequency natural variability is also removed when T nat is estimated by the method of residuals).
Also shown for reference in Fig. 4c is the GISS-E2-R millennium control run (with fixed forcings) as well as the RMS fluctuations for three pre-industrial multiproxies.We see that up to about 100-year scales, all the fluctuations have nearly the same amplitudes as functions of scale, giving support to the idea that T nat as estimated by residuals is indeed a good estimate of the natural variability and also confirming the estimate of the global-scale exponent value H = −0.20 ± 0.03.
As a final comparison, Fig. 4d shows RMS Haar fluctuations for the global averages (from Fig. 4c), land only averages and from the oceans -the Pacific decadal oscillation (PDO).The PDO is the amplitude of the largest eigenvalue of the Pacific sea surface temperature autocorrelation matrix (i.e. the amplitude of the most important empirical orthogonal function: EOF).For the land-only curve, notice the sharp rise for scales > ≈ 10 years; this is the effect of the anthropogenic signal that was not removed in this series.Overall we see that (roughly) for land H ≈ −0.3, for the globe H = −0.2, and for the oceans H = −0.1.Figure 1a and c shows the drastic differences in memory implied by these apparently small changes in H (we recently discovered that other global temperature series had global H closer to −0.1 so that the GISS series analysed here is the least predictable).

The numerical approach
The theory for predicting fGn leads to the general equation for the variance in forecast error (E T ) at forecast horizon t and resolution τ (Eq.( 45)).In order to test the equation on the temperature residues, we can use the global and Northern Hemisphere series analysed in the previous section and systematically make hindcasts.In this first study, we took a simple, straightforward approach based on the method of innovations.We discretized Eq. ( 9), which was then written as 652 S. Lovejoy et al.: SLIMM: using scaling to forecast global-scale macroweather from months to decades a matrix equation of the form T t = t <t M t,t γ t , where the indices refer to the discrete time nondimensionalized by the series resolution and M t,t is the (singular) kernel from the fractional integration.The sum was over a finite past of length t mem = λ mem τ units (see below), and the matrix was then inverted to yield the corresponding innovations γ t .To make the forecast at time t + t (i.e.t units in the future), the equation was used with an augmented kernel M t+ t,t , with the innovation vector lengthened by appending t zeroes (the expectation values of the unknown future innovations) to the t mem innovations that were determined in the previous step.
While our approach has the advantage of being straightforward (and it was tested on numerical simulations of fGn), in future applications improvements could be made.For example, by using a Girsanov formula, we could rewrite fGn in terms of a finite integral (see Biagini et al., 2008), and the discretized numerics would then be more accurate (this is especially important for H near the limiting values 0 and −1/2).Alternatively, we could use the Gripenberg and Norros (1996) integral equation method discretized with a variant of the Hirchoren and Arantes (1998) approach, which notably has the advantage of requiring less past data.

The hindcasts
In order to obtain good hindcast error statistics, it is important to make and validate as many hindcasts as possible, i.e. one for each discretized time that is available.However, due to the long-range correlations, we want to use a reasonable number of past time steps in the hindcast for memory, so that the earliest possible hindcast will be later than the earliest available data by the corresponding amount.The compromise used here consisted of dividing the 134-year series into 30 annual blocks (annual resolution) and 20-year blocks (monthly resolution).In each block in the annual series, the first 20 years were used as "memory" to develop the hindcast over the next 10 years so that for estimating the hindcast errors a total of 134 − 30 = 104 forecasts were made.For the monthly series, the same procedure involved blocks of 240 months: 180 months for the memory and 60 months for the hindcast for a total of 1608 − 240 = 1368 hindcasts.
The hindcasts can be evaluated at various resolutions and forecast horizons.Eqs. ( 46), ( 49) and ( 50) give the general theoretical results.The cases of special interest are the temperature hindcasts and the anomaly hindcasts with resolutions and horizons of (τ , λτ ) and (λτ , λτ ) respectively.The error variance ratios (R) are and = λ 2H ).The red lines are global series; the blue lines are Northern Hemisphere series.The thick, shorter curves are at annual resolution (τ = 1 year) and the thin, longer lines are at a monthly resolution (τ = 1 month).Also shown (dashed) are the theory curves for H = −0.17,−0.23 (top (black) and bottom (brown) of each dashed pair respectively).The data closely follow the H = −0.17curves.The standard deviations at the highest-resolution E T (τ, τ ) 2 1/2 are given in Table 4.This plot has no adjustable parameters.
Both ratios are shown in Fig. 5, along with the exact theory curves, and Table 3 gives the corresponding highestresolution standard deviations (for both lagged and unlagged estimates of T nat , there is virtually no difference), Table 4.It can be seen that all the forecast error variances (global, northern, annual, monthly resolution) collapse quite well between the theory curves corresponding to H = −0.17 and H = −0.23 (corresponding to H ≈ −0.20 ± 0.03) although they are closer to the H = −0.17curves.It is important to stress that Fig. 5 is completely nondimensional.It depends on a single parameter (H ), and this parameter was estimated earlier using a quite different technique (Haar fluctuations and spectra) that had no direct relation to the property being measured (forecast skill).We have effectively used spectral and Haar analysis of scaling to determine the accuracy of forecasts using no extra information.Figure 5 has no adjustable parameters so that the agreement of the hindcast errors with theory is a particularly strong confirmation of the theory.We could add that the fact that the errors depend only on the dimensionless forecast horizon is also a consequence of the scaling, i.e. of the lack of strong characteristic timescale in the macroweather regime.
Since the anomaly errors are power laws (Eq.54), they can be conveniently evaluated on a log-log plot (see Fig. 6).Note that the RMS anomaly errors decrease with forecast horizon.The reason is that while forecasts further and further in the future lose accuracy, this loss is more than compensated for by the decrease in the variance due to the lower resolution, so that the anomaly variance decreases.Finally, we may note that the method has been applied to explaining the "pause" or "hiatus" in the global warming since 1998 as well as to make a forecast to 2023 (Lovejoy, 2015b).

Hindcast skill
Another way to evaluate the hindcasts is to determine their nondimensional skills, i.e. the fraction of the variance that they explain (see the general formula Eq. 46).From the formula, we can see that the skill depends only on the nondimensional forecast horizon λ = t/τ .Therefore, the skill for forecast anomalies -i.e. the average of the forecast up to the horizon, i.e. t = τ and hence λ = 1 -has the remarkable property of being constant, independent of the horizon.The reason is that while forecasts further and further in the future lose accuracy, this loss is exactly compensated for by the decrease in the variance due to the lower resolution, so that the anomaly skill does not change.Figure 7 is another example of a nondimensional plot where the theory involves no adjustable parameters.It shows that the theoretical prediction is well respected by the global and Northern Hemisphere annual and monthly resolution series.Since we estimated H = −0.20 ± 0.03, it can be seen that the skill for the monthly series is nearly as high as theoretically predicted: up to 1 year or so for the global series but up to several years for the Northern Hemisphere series.The global series has slightly lower forecast skill than theoretically predicted, but it is still of the order of 15 % at 10 years.Also shown in Fig. 7 is the effect of using only a finite part of the memory.
The skill in usual temperature forecasts (i.e. with fixed resolution τ and increasing horizon t = λτ ) is shown in Fig. 8.We see that monthly series can be predicted to nearly the theoretical limit of up to about 2-3 years (≈ 5 % skill).For the annual series, this is up to about 5 years (≈ 10 % skill).Again the results are close to the H = −0.17theory.

Hindcast correlations
A final way to evaluate the hindcasts is to calculate the correlation coefficient between the hindcast and the temperature: Since < T > = 0, the cross term vanishes; using Eq. ( 44) we obtain the simple result .The anomaly forecast skill as a function of forecast horizon (horizontal axis) on a log-linear plot for both series (annual thin, monthly thick; global red, Northern Hemisphere blue).Also shown are pairs of theoretical predictions (constant skill independent of the forecast horizon) for various values of H .The top (dashed) member of the pair is for an infinite memory; the bottom solid line is for the finite memory used here: the monthly series has a memory of 180; the annual series has a memory of 20.This plot has no adjustable parameters.
i.e. ρ T ,T (t, τ ) = S k (t, τ ) 1/2 , a result which depends on the consequences of orthogonality: T τ (t) Tτ (t) = Tτ (t) 2 (Eq.Northern Hemisphere (blue) series.Also shown are the exact theoretical curves (for H = −0.17)that take into account the finite memories of the forecasts (20 and 15 years for the annual and monthly series respectively).The raw curves were shifted a little upward so that their long-time parts were close to the theory; this is equivalent to using the theory to improve the estimate of the ensemble-average skill from the single series that were available.

42). Asymptotically for
In the special cases of anomalies t = τ , λ = 1, and we obtain so that the correlations are constant at all forecast horizons.Over the range −1/2 < H < 0, the constant U is conveniently close to unity.As in the previous hindcast error analyses, the series were broken into blocks and the forecasts were repeated as often as possible; each forecast was correlated with the observed sequence and averages were performed over all the forecasts and verifying sequences (the mean correlation shown by the solid lines in Fig. 9).The uncertainty in the hindcast correlation coefficients was estimated by breaking the hindcasts into thirds: three equal sized groups of blocks with the error being given by the standard deviation of the three about the mean (dashed lines).Also shown in Fig. 9 are the theoretical curves (Eq.54) for H = −0.20.In this case the dashed lines indicate the theory for 1 standard deviation in H , i.e. for H = −0.17 and H = −0.23.
As predicted by Eq. ( 57), the anomaly correlations are relatively constant up to about 5 years for the annual data (top row) and nearly the same for the monthly data (bottom row).In addition, the Northern Hemisphere series (blue) are somewhat better forecast than the global series (red).It can be seen  that temperature forecasts (i.e. with fixed resolutions) have statistically significant correlations for up to 8-9 years for the annual forecasts, for up to about 2 years for the monthly global and nearly 5 years for the monthly Northern Hemisphere forecasts (bottom dashed lines).The anomaly forecasts are statistically significantly correlated at all forecast horizons.Figure 9 provides more examples of nondimensional plots with no free parameters, and again the agreement with the hindcasts validation is remarkable.
Although the results for the anomaly correlations are quite close to those of hindcasts in García-Serrano and Doblas-Reyes (2012), the latter are for the entire temperature forecast, not just the natural variability as here.This means that the GCM correlations will be augmented with respect to ours due to the existence of long-term anthropogenic trends in both the data and the forecasts that are absent in ours (but even with this advantage, their correlations are not higher).

Comparison with GCMs, LIM, AR(1) and ARFIMA hindcasts
In Table 1 and Fig. 2, we have already compared GCM hindcast errors with estimates of the natural variability (T nat ) from the residues of a linear regression on the CO 2 radiative forcing since 1880.We found that the annual, global GCM hindcasts had errors that were close to, but generally larger than, the standard deviation of T nat ( T 2 nat 1/2 ) but smaller than the standard deviation of the linearly detrended temperature series (the horizontal lines in Fig. 2).T 2 nat 1/2 is the RMS error of an unconditional forecast (i.e. with no knowledge of Earth Syst.Dynam., 6, 637-658, 2015 www.earth-syst-dynam.net/6/637/2015/ the past): T 2 nat,τ = E 2 T (τ, ∞) (see Eq. 45).It is the upper bound to the hindcast errors, a lower bound on skill (= 0).In Fig. 2, we see that the one-parameter stochastic hindcast (with H = −0.2) is somewhat better than the GCMs up to about 6 years, after which it is about the same.This bolsters the hypothesis that GCMs primarily model the anthropogenic temperature change, not the natural variability, whereas SLIMM has some skill in forecasting the latter.
Table 2 and Fig. 2 also compare SLIMM RMS errors to those of LIM hindcasts modelled with 20 degrees of freedom (involving > 100 parameters).We see that LIM is slightly better than SLIMM for horizons of up to about 2 years, beyond which SLIMM is better.According to the analysis in Newman (2013), for periods beyond about 1 year, the forecasts are mostly determined by the two most important EOFs, and their skill decays exponentially, not as a power law.From Fig. 2, their main effect seems to be to remove the long-term linear trend allowing LIM to have an asymptotic RMS error roughly equal to the standard deviation of the linearly detrended series (the upper horizontal line).
Finally, in Table 1, rows 12 and 13, we have compared the errors with those of an early attempt at scaling temperature forecasts using the autoregressive fractionally integrated moving average process (ARFIMA) (Baillie and Chung, 2002b) along with the corresponding order-1 autoregressive (AR(1)) process.Unfortunately, the forecasts were made by taking 10-year segments and, in each, removing a separate linear trend so that the low frequencies were not well accounted for (see the footnote to the table for more details).The AR(1) results were not so good as they were close to the standard deviations of the detrended temperatures.As expected -because they assume a basic scaling frameworkthe ARFIMA results were somewhat better.Yet they are substantially worse than those of the other methods, probably because they did not remove the anthropogenic component first.

Conclusions
GCMs are basically weather models whose forecast horizons are well beyond the deterministic predictability limits, corresponding to many lifetimes of planetary-scale structures: the macroweather regime.In this regime -that extends from about 10 days to ≈ 100 years (pre-industrial), the weather patterns that are generated are essentially random noise.With fixed boundary conditions, GCMs therefore converge asymptotically (in a power law manner; Fig. 4c) to their (model) climates.In order to model the low-frequency variations associated with the climate proper, the GCMs must be externally forced; if the forcing is strong enough, in principle it can reverse the trend of macroweather fluctuations decreasing with increasing timescale and initiate a new climate regime where fluctuations instead increase with scale (qualitatively similar to their behaviour in the higher-frequency weather regime; see Lovejoy et al., 2013b).In the real world (pre-industrial), this occurs somewhere around 100 years and fluctuations increase in scaling manner (but now with H > 0) as far as iceage timescales (≈ 50-100 kyr; note that this 100-year preindustrial transition scale apparently has large geographical variability; see Lovejoy and Schertzer (2013), Sect. 11.1.4).On these scales, in addition to solar and volcanic forcings, the real world may involve new, slow internal processes that become important.
In this regard, the problem with the GCM approach is that in spite of massive improvements over the last 40 years, the weather noise that they generate is not totally realistic nor does their climate coincide exactly with the real climate.In an effort to overcome these limitations, stochastic models have been developed that directly and more realistically model the noise and use real-world data to exploit the system's memory so as to force the forecasts to be more realistic.
The main approaches that could potentially overcome the GCM limitations are the stochastic ones.However, going back to Hasselmann (1976), these have only used integerordered differential equations.They have implicitly assumed that the low frequencies are white noises and hence cannot be forecast with any skill.Modern versions -the LIM -add sophistication and a large number of (usually, but not necessarily) spatial parameters, but they still impose a short (exponentially correlated) memory and they focus on periods of up to a few years at most.This contrasts with turbulencebased nonlinear stochastic models which assume that the system scales over wide ranges.When they are extended to the macroweather regime (the Extended Fractionally Integrated Flux -EFIF model), these scaling models have low intermittency, scaling fluctuations with temporal exponents close to those that are observed by a growing macroweather scaling literature.Contrary to their behaviour in the weather regime, in macroweather they are only weakly nonlinear.However, empirically, the spatial macroweather variability is very high so that Lovejoy and Schertzer (2013) already proposed that the EFIF model be spatially modulated by a multifractal climate process (yielding the CEFIF) whose temporal variability was at such low frequencies so as to be essentially constant in time over the macroweather regime.
The CEFIF model is complex both numerically and mathematically and its prediction properties are not known.In this paper, we therefore make a simplified model, the ScaLIng Macroweather Model (SLIMM) that can be strongly variable (intermittent) in space and Gaussian (nonintermittent) in time (see Lovejoy and de Lima (2015) for this regional SLIMM).The simplest relevant model of the temporal behaviour is thus fractional Gaussian noise (fGn), whose integral is the better-known fractional Brownian motion (fBm) process.A somewhat different way of introducing the spatial variability is to follow the LIM approach and treat each (spatial) grid point as a component of a system vector.In this case, SLIMM can be obtained as a solution of a fractionalwww.earth-syst-dynam.net/6/637/2015/Earth Syst.Dynam., 6, 637-658, 2015 order generalization of the usual LIM differential equations.
Although in future publications we will show how to make regional SLIMM forecasts, in this paper, we only discuss the scalar version for single time series (here, global-scale temperatures).In Sect.2, we situate the process in the mathematical literature and derive basic results for forecasts and forecast skill.These results show that a remarkably high level of skill is available in the climate system; for example, for forecast horizons of one nondimensional time unit in the future (i.e.horizons equal to the resolution), the forecast skillsdefined as the fraction of the variance explained by the forecast -are 15, 35 and 64 % for land, the whole globe and oceans respectively (Fig. 1b; taking rough exponent values H = −0.3,−0.2, −0.1 respectively; Fig. 4c).To quantify the size of the memory, it can be defined as the number of nondimensional units needed to supply 90 % of the full memory of the system.Using the same empirical exponents, we found that the memory is 15, 50 and 600 time units for typical land, the globe and typical ocean regions respectively.
The SLIMM forecasts the natural variability.While the responses to solar and volcanic forcings are implicitly included in the forecast, the responses to the anthropogenic forcings are not; we must therefore remove the anthropogenic component, which becomes dominant on scales of 10-30 years.For this, we follow Lovejoy (2014b), who showed that the CO 2 radiative forcing is a good linear proxy for all the anthropogenic effects (including the cooling due to aerosols, which is difficult to estimate) so that the natural variability is the residue with respect to a regression against the forcing.In Table 1 and in Fig. 2, we showed that the resulting standard deviation (±0.109K) is very close to the RMS errors in annual, globally averaged GCM temperature hindcasts that use annual data initialization and assimilation.Indeed, to a good approximation, all the models have errors bounded between this estimate of the natural variability and the slightly higher standard deviation of the linearly detrended temperature series (±0.163K).This is true in spite of the fact that they are "optimistic" since they assume that the future volcanic and solar forcings are known in advance.The only partial exception is the stochastic LIM model (with > 100 parameters), which is only marginally better (±0.085K) than SLIMM for forecast horizons of 1-2 years, after which it asymptotes to the linearly detrended standard deviation.
Using the method of innovations, we developed a new way of forecasting fGn that allows SLIMM hindcasts to be made; the long-time forecast horizon RMS error is thus ±0.109 K; the exploitation of the memory with the single parameter -the exponent H ≈ − 0.20 ± 0.03 -reduces this to ≈ ±0.093K for 1-year global hindcasts so that SLIMM remains better than or comparable to the multimodel GCM mean (Fig. 2).
This paper only deals with single time series (global-scale temperatures), but it is nevertheless ideal for revisiting the problem of the pause, "slow down" or hiatus in the warm-ing since 1998, which is a global-scale phenomenon.Lovejoy (2015b) shows how SLIMM hindcasts nearly perfectly predict this hiatus.However, most applications involve predicting the natural variability on regional scales.A future publication will show how this can be done and will quantify the improvement that the additional information (from the regional memory) makes to the forecasts.As forecasts from months to a decade or so, the SLIMM forecast are potentially better than alternatives.

Figure 1 .
Figure 1.(a) Forecast skill for nondimensional forecast horizons λ = t/τ = 1, 2, 4, 8, . . .64 (left to right) as functions of H .For reference, the rough empirical values for land, ocean and the entire globe (the value used here; see below) are indicated by dashed vertical lines.The horizontal lines show the fraction of the variance explained (the skill, S k ; Eq. 46) in the case of a forecast of resolution τ data at a forecast horizon t = τ (λ = 1; corresponding to forecasting the anomaly fluctuation one time unit ahead).(b) The theoretical skill with infinite memory for various ratios of nondimensional forecast horizons λ over the range 0 ≥ H ≥ −0.35 (top to bottom in steps of 0.05).The limiting value H = −1/2 corresponds to Gaussian white noise with zero skill.The empirically relevant range for the whole earth (H ≈ −0.20 ± 0.03) is in red; the thick line represents the best estimated parameter (H = −0.20).(c) This illustrates the potentially huge memory in the climate system (especially the ocean).It gives the λ mem value such that S k,λ mem (1)/S k,∞(1) = 0.9.When H = −1/2, there is no memory and λ mem is not defined; it diverges when H → 0. (d) The theoretical skills for hindcasts with infinite (Eq.46) and finite memory (Eq.49) for the empirically relevant parameter range (H = −0.23,brown, bottom; H = −0.17,red, top).The flat (constant skill) at the top are the curves for the anomaly forecasts (i.e. with forecast horizon, t is equal to the resolution τ so that λ = 1); the bottom curves are for constant resolution τ with forecast horizon.In each case a triplet of curves is shown corresponding to varying lengths of memories used in the forecast: infinite, 180 and 20 (the latter two corresponding to those used for the monthly and global forecasts analysed here).(e) The skill of λ = t/τ = 1 forecasts using the full memory (black, Eq. 46, from a), the corresponding classical persistence forecast (red), S k = 1-4 (1-2 2H ) and the corresponding "enhanced persistence" result (blue; for this λ = 1 case, this is the same as the AR(1) (autoregressive order 1) model forecast) with S k = (2 2H +1 − 1) 2 .With classical persistence the skill becomes negative for H < ≈ −0.2, so it is not shown over the whole range.

Figure 2 .
Figure 2. ENSEMBLES experiment, LIM and SLIMM hindcasts for global annual temperatures for horizons of 1 to 9 years.The light lines are from individual members of the ENSEMBLE experiment; the heavy line is the multimodel ensemble adapted from Fig. 4 in García-Serrano and Doblas-Reyes (2012).This shows the RMSE comparisons for the global mean surface temperatures compared to NCEP/NCAR (2 m air temperatures).Horizontal reference lines indicate the standard deviations of T nat (bottom panel) and of the linearly detrended temperatures (top panel).Also shown are the RMSE for the LIM model (fromTable 1 in Newman, 2013) and the SLIMM.

Figure 3 .
Figure 3. (a) Upper left: the monthly surface temperature anomaly series from NASA GISS data (the monthly dT s series).(b) Top (red) is the global average, displaced upward by 2 K for clarity; the bottom (blue) is the Northern Hemisphere series displaced upward by 1 K. Upper right: the same as the upper left but for the temperatures as functions of the logarithm of the CO 2 concentration ρ CO 2 normalized by the pre-industrial value ρ CO 2 ,pre = 277 ppm (global values are displaced upward by 2 K and Northern Hemisphere values are displaced by 1 K for clarity).The regressions have slopes indicated in Table 2; they are the effective climate sensitivities to CO 2 doubling.(c) Lower left: the residues of the linear regressions of the upper right; the estimate of the natural variability (global -red, top; Northern Hemisphere -blue, bottom) has been shifted upward by 1 K for clarity.

Figure 4 .
Figure 4. Upper left: the spectrum of the monthly residues for northern (blue) and global (red) data.The slope β = 0.6 is shown corresponding to the best overall estimate (H = −0.20).(b) Upper right: the Northern Hemisphere (top, blue) and global (bottom, red) spectra, at monthly (solid) and annual (dashed) resolutions using the NASA GISS surface temperature anomaly series from 1880-2013.For frequencies higher than the lowest factor of 10, averages have been made over 10 frequency bins per order of magnitude in scale.In addition, the spectra have been "compensated for" by multiplying by ω 0.54 so that spectra with H = −0.23 (β = 0.54) appear flat.The range 66.The lower and upper bounding reference lines are shown dashed.(c) Lower left: the RMS Haar fluctuations for the northern (blue) and global (red) monthly series.Reference lines with slopes H = −0.2 are shown.We see that the scaling is fairly well respected up to ≈ 100 years.The raw Haar fluctuations have been multiplied by 2 (the "canonical calibration"; see Lovejoy and Schertzer, 2012a) in order to bring them closer to the anomaly fluctuations.Also shown is the NASA control run and the pre-industrial multiproxies.They all agree quantitatively very well up to about 100 years, where the pre-industrial natural climate change starts to become important.This shows that the monthly-scale residuals are almost exactly as simulated by the GISS model without any anthropogenic effects, supporting the idea that T nat is a good estimate of the natural variability.(d) Lower right: comparisons of the RMS Haar fluctuations of global-scale natural variability (T nat ) from the lower left with those from land only (HADCRUT3, black) and from the Pacific decadal oscillation (PDO, top, purple; from Lovejoy and Schertzer (2013), Fig. 10.14).Reference lines of slopes H = −0.1,−0.2, −0.3 are shown close to the curves for ocean, globe and land respectively.

Figure 6 .
Figure 6.A log-log plot of the standard deviations of the anomaly hindcasts, with the theoretical reference line corresponding to H = −0.20.The solid lines are for the monthly data, the dashed lines for annual data.Red indicates global data; blue indicates Northern Hemisphere data.

Figure 9 .
Figure9.The empirical correlations of the forecast temperatures (left column) and anomalies (right column).The same hindcasts but with different empirical comparisons and also with comparisons with theory for H = −0.2(thick black line), H = −0.17(top dashed black line), and −0.23 (bottom dashed black line).Now note that in all cases the one standard deviation bounds (dashed) on the empirical and theoretical curves overlap virtually throughout.The theory curves have no adjustable parameters.

Table 2 .
The climate sensitivities estimated by linear regression of log 2 ρ CO 2 against the temperature anomalies at monthly and annual resolutions from global and Northern Hemisphere series.The far right column shows the 20-year lagged sensitivity over 1900-2013, i.e. using T anth t (t) = λ 2×CO 2 ,eff, t log 2 (ρ CO 2 (t − t)/ρ CO 2 ,pre ), where t = 20 years.

Table 3 .
The various standard deviations of the temperature residues (T nat ) after removing T anth at monthly and annual resolution and the estimate of H obtained assuming perfect scaling over a factor of 12 in timescale (units: K).

Table 4 .
The hindcast standard deviations (in K) at the finest resolutions (1 month, 1 year) for natural variability temperatures obtained from the unlagged and 20-year lagged climate sensitivities.Note that the lag makes very little difference to the hindcast error variance.
).The same hindcasts but with different empirical comparisons and also with comparisons with theory for H = −0.2(thick black line), H = −0.17(top dashed black line), and −0.23 (bottom dashed black line).Now note that in all cases the one standard deviation bounds (dashed) on the empirical and theoretical curves overlap virtually throughout.The theory curves have no adjustable parameters.