At scales much longer than the deterministic predictability limits (about 10 days), the statistics of the atmosphere undergo a drastic transition: the high-frequency weather acts as a random forcing on the lower-frequency macroweather. In addition, up to decadal and centennial scales, the equivalent radiative forcings of solar, volcanic and anthropogenic perturbations are small compared to the mean incoming solar flux. This justifies the common practice of reducing forcings to radiative equivalents (which are assumed to combine linearly), as well as the development of linear stochastic models, including for forecasting at monthly to decadal scales.

In order to clarify the validity of the linearity assumption and determine
its scale range, we use last millennium simulations, with both the
simplified Zebiak–Cane (ZC) model and the NASA GISS E2-R fully coupled GCM.
We systematically compare the statistical properties of solar-only, volcanic-only and combined solar and volcanic forcings over the range of timescales
from 1 to 1000 years. We also compare the statistics to multiproxy
temperature reconstructions. The main findings are (a) that the variability
in the ZC and GCM models is too weak at centennial and longer scales;
(b) for longer than

The general circulation model (GCM) approach to climate modelling is based on the idea that whereas weather is an initial value problem, the climate is a boundary value problem (Bryson, 1997; Pielke, 1998). This means that although the weather's sensitive dependence on initial conditions (chaos, the “butterfly effect”) leads to a loss of predictability at timescales of about 10 days, averaging over enough “weather” nevertheless leads to a convergence to the model's “climate”. This climate is thus the state to which averages of model outputs converge for fixed atmospheric compositions and boundary conditions (i.e. control runs).

The question then arises as to the response of the system to small changes
in the boundary conditions: for example, anthropogenic forcings are less than
2 W m

It is therefore important to establish the timescales over which linear responses are a reasonable assumption. However, clearly even over scales where typical responses to small forcings are relatively linear, the response may be nonlinear if the forcing is volcanic or volcanic-like, i.e. if it is sufficiently “spikey” or intermittent.

Before turning our attention to models, what can we learn empirically?
Certainly, at high enough frequencies (the weather regime), the atmosphere
is highly nonlinear. However, at about 10 days, the atmosphere undergoes a
drastic transition to a lower-frequency regime, and this “macroweather”
regime is potentially quasi-linear in its responses. Indeed, the basic
atmospheric scaling regimes were identified some time ago – primarily using
spectral analysis (Lovejoy and Schertzer, 1986; Pelletier, 1998;
Shackleton and Imbrie, 1990; Huybers and Curry, 2006). However, the use
of real space fluctuations provided a clearer picture and a simpler
interpretation. It also showed that the usual view of atmospheric
variability, as a sequence of narrow-scale range processes (e.g. nonlinear
oscillators), has seriously neglected the main source of variability, namely
the scaling “background spectrum” (Lovejoy, 2014b). What was found is
that, for virtually all atmospheric fields, there was a transition from the
behaviour of the mean temperature fluctuations scaling

The explanation for the “macroweather” to climate transition (at scale

There have been several studies on the low-frequency control run responses of GCMs (Vyushin et al., 2004; Zhu et al., 2006; Fraedrich et al., 2009; Lovejoy et al., 2013; Fredriksen and Rypdal, 2016); the responses were found to be scaling down to their lowest frequencies. This scaling is a consequence of the absence of a characteristic timescale for the long-time model convergence; it turns out that the relevant scaling exponents are very small: empirically the GCM convergence is “ultra-slow” (Lovejoy et al., 2013) (Sect. 3.4). Most earlier studies have focused on the implications of the long-range statistical dependencies implicit in the scaling statistics. Unfortunately, due to this rather technical focus, the broader implications of the scaling have not been widely appreciated.
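To make the “ultra-slow” convergence mentioned above concrete, the following minimal sketch compares how quickly typical fluctuations shrink under averaging for two scaling exponents. It assumes that the RMS fluctuation varies as a power law of the timescale with exponent `H`; the function name and the value `H = -0.2` are illustrative assumptions and are not quoted from the text, whereas `H = -0.5` is the standard result for averages of uncorrelated (white) noise.

```python
def reduction_factor(scale_ratio, H):
    """Factor by which typical fluctuations change when the averaging
    timescale is multiplied by scale_ratio, assuming S(dt) ~ dt**H."""
    return scale_ratio ** H

# Averaging over a 100 times longer period:
white_noise = reduction_factor(100, -0.5)  # uncorrelated noise: tenfold reduction
slow = reduction_factor(100, -0.2)         # illustrative small exponent (assumed)

print(f"H = -0.5: fluctuations multiplied by {white_noise:.3f}")
print(f"H = -0.2: fluctuations multiplied by {slow:.3f}")
```

Under these assumptions, a hundredfold increase in averaging time reduces the fluctuations by a factor of 10 for white noise but only by a factor of about 2.5 for the smaller exponent, which is the sense in which convergence with small exponents is slow.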

More recently, using scaling fluctuation analysis, this behaviour has been put
into the general theoretical framework of GCM climate modelling
(Lovejoy et al., 2013). From the scaling point of view, it
appears that the climate arises as a consequence of slow internal climate
processes combined with external forcings (especially volcanic and solar, as well as – in the recent period – anthropogenic forcings). From the point of view
of the GCMs, the low-frequency (multicentennial) variability arises
exclusively as a response to external forcings, although potentially – with
the addition of (known or currently unknown) slow processes such as land–ice
or biogeochemical processes – new internal sources of low-frequency
variability could be included. Ignoring the recent (industrial) period, and
confining ourselves to the last millennium, the key question for GCMs
is whether or not they can reproduce the climate regime where the decline of
the “macroweather” fluctuations (

The weakness of the responses to solar and volcanic forcings at multicentennial scales raises a question of linearity: is the response to the combined (solar plus volcanic) forcing roughly the sum of the individual responses? Additivity is often implicitly assumed when climate forcings are reduced to their equivalent radiative forcings, and Mann et al. (2005) have already pointed out that, in the Zebiak–Cane (ZC) model discussed below, they are not additive. Here we analyze this question more precisely and quantify the degree of sub-additivity as a function of temporal scale (Sect. 3.4). A related linear/nonlinear issue pointed out by Clement et al. (1996) is that, due to the nonlinear model response, there is a high sensitivity to a small forcing and a low sensitivity to a large forcing. Systems in which strong and weak events have different statistical behaviours display stronger or weaker “clustering” and are often termed “intermittent” (from turbulence). When they are also scaling, the weak and strong events are characterized by different scaling exponents that quantify how the respective clustering changes with timescale. In Sect. 4, we investigate this quantitatively and confirm that the intermittency is particularly strong for the volcanic forcing, and that the ZC model response (like that of a GCM) is much less intermittent, implying that the model strongly (and nonlinearly) smooths the forcing.

In this paper, we establish analysis methodologies that can address these issues and apply them to model outputs that cover the required range of timescales: last millennium model outputs. Unfortunately – although we consider the NASA GISS E2-R last millennium simulations – there seem to be no full last millennium GCM simulations that have the entire suite of volcanic-only, solar-only and solar plus volcanic forcings and responses; therefore we have used the simplified ZC model outputs published by Mann et al. (2005) (and even this lacked control runs to directly quantify the internal variability).

Although the ZC model lacks several important mechanisms – notably, for our purposes, deep ocean dynamics – there are clearly sources of low-frequency variability present in the model. For example, Goswami and Shukla (1991), using 360-year control runs, found multidecadal and multicentennial nonlinear variability due to the feedbacks between Sea Surface Temperature (SST) anomalies, low-level convergence and atmospheric heating. In addition, in justifying their millennium ZC simulations, Mann et al. (2005) specifically cited model centennial-scale variability as a factor motivating their study.

During the pre-industrial part of the last millennium, the atmospheric composition was roughly constant, and the Earth's orbital parameters varied by only a small amount. The main forcings used in GCM climate models over this period are thus solar and volcanic (in the GISS-E2-R simulations discussed below, reconstructed land use changes are also simulated but the corresponding forcings are comparatively weak and will not be discussed further). In particular, the importance of volcanic forcings was demonstrated by Minnis et al. (1993), who investigated the volcanic radiative forcing caused by the 1991 eruption of Mount Pinatubo, and found that volcanic aerosols produced a strong cooling effect. Later, Shindell et al. (2003) used a stratosphere-resolving GCM to examine the effect of the volcanic aerosols and solar irradiance variability on pre-industrial climate change. They found that the best agreement with historical and proxy data was obtained using both forcings. However, solar and volcanic forcings induce different responses because the stratospheric and surface influences in the solar case reinforce one another, but in the volcanic case they are opposed. In addition, there are important differences in solar and volcanic temporal variabilities (including seasonality) that statistically link volcanic eruptions with the onset of El Niño–Southern Oscillation events (Mann et al., 2005). Decreased solar irradiance cools the surface and stratosphere (Cracknell and Varotsos, 2007, 2011; Kondratyev and Varotsos, 1995a, b). In contrast, volcanic eruptions cool the surface, but aerosol heating warms the sunlit lower stratosphere (Shindell et al., 2003; Miller et al., 2012). This leads to an increased meridional gradient in the lower stratosphere but a reduced gradient in the tropopause region (Chandra et al., 1996; Varotsos et al., 1994, 2009).

Vyushin et al. (2004) suggested that volcanic forcings improve the low-frequency variability scaling performance of atmosphere–ocean models compared to all other forcings (see, however, the comment by Blender and Fraedrich, 2004, which also discusses earlier papers in the field). Weber (2005) used a set of simulations with a climate model, driven by reconstructed forcings, in order to study the Northern Hemisphere temperature response to volcanic and solar forcing during 1000–1850. It was concluded that the response to solar forcing equilibrates at interdecadal timescales, while the response to volcanic forcing never equilibrates because the time interval between volcanic eruptions is typically shorter than the dissipation timescale of the climate system (in fact, they are scaling, so that eruptions occur over all observed timescales; see below).

At the same time, Mann et al. (2005) investigated the response of El Niño to natural radiative forcing changes during 1000–1999 by employing the ZC model for the coupled ocean–atmosphere system in the tropical Pacific. They found that the composite feedback of the volcanic and solar radiative forcing to past changes reproduces the fluctuations in the variability of the historic El Niño records (e.g. Efstathiou et al., 2011; Varotsos, 2013; Varotsos et al., 2015a, b).

Finally, as discussed below, Lovejoy and Schertzer (2012a)
analyzed the timescale dependence of several solar reconstructions
(Lean, 2000; Wang et al., 2005; Krivova et al., 2007; Steinhilber et al., 2009; Shapiro et
al., 2011) and the two main volcanic reconstructions (Crowley, 2000, and Gao et al., 2008; referred to as “Crowley” and “Gao” in
the following). The solar forcings were found to be qualitatively quite
different depending on whether the reconstructions were based on sunspots or

Mann et al. (2005) used the ZC model of the tropical
Pacific coupled ocean–atmosphere system (Zebiak and
Cane, 1987) to produce a 100-realization ensemble for solar forcing only,
volcanic forcing only and combined forcings over the last millennium. Figure 1a
shows the forcings and mean responses of the model which were obtained
from

The ultimate goal of weather and climate modelling (including forecasting)
is to make simulations

Starting in the 1990s, with the advent of ensemble forecasting systems, the
rank histogram (RH) method was proposed (Anderson, 1996) as a simple
non-parametric test of

A straightforward solution is to use the same basic idea – i.e. to change
the sense of equality from deterministic to probabilistic (“

In order to isolate the variability as a function of timescale

Fluctuations defined as differences are adequate for fluctuations increasing
with scale (

The Haar fluctuation, which is useful for

Once estimated, the variation in the fluctuations with timescale can be
quantified by using the fluctuations' statistics; the

Figure 2a shows the result of estimating the Haar fluctuations for the solar
and volcanic forcings. The solar reconstruction that was used is a hybrid
obtained by “splicing” the annual resolution sunspot-based reconstruction
(Fig. 2b, top; back to 1610, although only the more recent part was used by
Mann et al., 2005) with a

The reference lines in Fig. 2a have slopes

There is no question that – at least in the usual deterministic sense – the
atmosphere is turbulent and nonlinear. Indeed, the ratio of the nonlinear to
the linear terms in the dynamical equations – the Reynolds number – is
typically about 10

However, ever since Hasselmann (1976), it has been proposed that
sufficiently space–time-averaged variables may respond linearly to
sufficiently space–time-averaged forcings. In the resulting (low-frequency)
phenomenological models, the nonlinear deterministic (high-frequency)
dynamics act as a source of random perturbations; the resulting stochastic
model is usually taken as being linear. Such models are only justified if
there is a physical-scale separation between the high- and low-frequency processes. The existence of a relevant break (at 2–10-day scales)
has been known since Panofsky and Van der Hoven (1955) and was
variously theorized as the “scale of migratory pressure systems of synoptic
weather map scale” (Van der Hoven, 1957) and later as the “synoptic
maximum” (Kolesnikov and Monin, 1965). From the point of view of
Hasselmann-type linear stochastic modelling (now often referred to as
“linear inverse modelling” (LIM), e.g. Penland and Sardeshmukh,
1995; Newman et al., 2003; Sardeshmukh and Sura, 2009),
the system is regarded as a multivariate Ornstein–Uhlenbeck (OU)
process. At high frequencies, an OU process is essentially the integral of a
white noise (with spectrum

In the more general scaling picture going back to Lovejoy and Schertzer (1986),
the transition corresponds to the lifetime of planetary structures.
This interpretation was quantitatively justified in Lovejoy and
Schertzer (2010) by using the turbulent energy rate density. The low- and
high-frequency regimes were scaling and had spectra significantly different
than those of OU processes (notably with
0.2

These linear stochastic models (whether LIM or SLIMM) explicitly exploit the weather/macroweather transition and may have some skill up to macroweather scales perhaps as large as decades. However, at long enough timescales, another class of phenomenological model is often used, wherein the dynamics are determined by radiative energy balances. Energy balance models focus on slower (true) climate scale processes such as sea ice–albedo feedbacks and are generally quite nonlinear, being associated with nonlinear features such as tipping points and bifurcations (Budyko, 1969). These models are typically zero- or one-dimensional in space (i.e. they are averaged over the whole Earth or over latitude bands) and may be deterministic or stochastic (see Nicolis, 1988, for an early comparison of the two approaches). See Dijkstra (2013) for a survey of the classical deterministic dynamical systems approach as well as the more recent stochastic “random dynamical systems” approach (see also Ragone et al., 2014). Although energy balance models are almost always nonlinear, there have been several suggestions that linear energy balance models are in fact valid up to millennial and even multimillennial scales.

Finally, we could mention the existence of empirical evidence of stochastic
linearity between forcings and responses in the macroweather regime. Such
evidence comes, for example, from the apparent ability of linear regressions
to “remove” the effects of volcanic, solar and anthropogenic forcings
(Lean and Rind, 2008). This has perhaps been quantitatively
demonstrated in the case of anthropogenic forcing, where use is made of the
globally, annually averaged CO

We can now test the linearity of the model responses to solar and volcanic
forcings. First, consider the model responses (Fig. 3a). Compare the response
to the volcanic-only forcing (green) curve with the response from the solar-only forcing (black). As expected from Fig. 2a, the former is stronger than
the latter up until centennial scales, reflecting the stronger volcanic
forcing. At scales

In order to quantify this, we can easily determine the expected response to the combined solar and volcanic forcing if the two combined additively (linearly). In that case, the solar and volcanic fluctuations would not interfere with each other and, since these forcings are statistically independent, the responses would also be statistically independent, so that the response variances would add.
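The additivity check described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`haar_fluctuation`, `rms`, `additivity_ratio`) are hypothetical, the Haar fluctuation is taken as the difference between the means of the second and first halves of each interval, non-overlapping intervals are used for simplicity, and the input series are synthetic Gaussian noise rather than model output.

```python
import numpy as np

def haar_fluctuation(series, lag):
    """Haar fluctuations at an even lag: mean of the second half of each
    interval minus mean of the first half (non-overlapping intervals)."""
    series = np.asarray(series, dtype=float)
    half = lag // 2
    n_windows = len(series) // lag
    windows = series[: n_windows * lag].reshape(n_windows, lag)
    return windows[:, half:].mean(axis=1) - windows[:, :half].mean(axis=1)

def rms(x):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(np.square(x))))

def additivity_ratio(solar, volcanic, combined, lag):
    """Ratio of the observed combined RMS fluctuation to the value expected
    if independent responses add linearly (variances add)."""
    s = rms(haar_fluctuation(solar, lag))
    v = rms(haar_fluctuation(volcanic, lag))
    c = rms(haar_fluctuation(combined, lag))
    return c / np.sqrt(s**2 + v**2)

# Synthetic, statistically independent "responses" (assumed for illustration).
rng = np.random.default_rng(0)
n = 100_000
solar = rng.normal(0.0, 0.1, n)
volcanic = rng.normal(0.0, 0.3, n)

# A perfectly linear combination versus a damped (sub-additive) one.
ratio_linear = additivity_ratio(solar, volcanic, solar + volcanic, lag=10)
ratio_damped = additivity_ratio(solar, volcanic, 0.5 * (solar + volcanic), lag=10)
print(ratio_linear, ratio_damped)
```

A ratio close to 1 is what linear, independent responses would give; a ratio clearly below 1 at a given timescale indicates sub-additivity at that scale. Repeating the calculation over a range of lags would give the ratio as a function of timescale.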

A linear response means that temperature fluctuations due to only solar
forcing (

The calculations above ignored the model's internal variability; this was
considered small due to the averaging over 100 realizations of the ZC model
with the same forcings: the internal variability is expected to largely cancel out.
While it is true that a definitive answer to this requires running the model
in “control mode” so as to capture only the internal variability (as was
done for the GISS model; see Fig. 4), there are nevertheless several
reasons why the internal variability is almost certainly smaller than the
response due to the forcings:

We can get a typical order of magnitude of the internal variability from the
GISS model, Fig. 4; we see that for a single realization – without averaging
over 100 realizations as in Fig. 3a – the typical centennial
variability is

We can use the fact that (a) the observed responses are upper bounds on the
internal variability and (b) that the internal variability must decrease with
scale (otherwise the model's climate diverges rather than converges for long
times). Exponents near the GISS value

A comparison of the Zebiak–Cane (ZC) model combined (volcanic and solar forcing) response (bold, brown) with GISS-E2-R simulations with solar-only forcing (red) and a control run (no forcings, black). The GISS structure functions are for land, Northern Hemisphere, and are reproduced from Lovejoy et al. (2013).

In the ZC model, all forcings are input at the surface so that here the subadditivity is due to the differing seasonality, fluctuation intensities and spatial distributions of the solar and volcanic forcings. In the GISS-E2-R GCM simulations, the response to the solar forcing is too small to allow us to determine whether it involves a similar solar–volcanic negative feedback (Fig. 4). In vertically stratified atmospheres, i.e. in GCMs or in the real atmosphere, non-additivity is perhaps not surprising given the difference between the solar and volcanic vertical heating profiles. If such negative feedbacks are substantiated in further simulations, it would enhance the credibility of the idea that current GCMs are missing critical slow (multi-centennial, multi-millennial) climate processes. No matter what the exact explanation, non-additivity underlines the limitations of the convenient reduction of climate forcings to radiative forcing equivalents. It also indicates that, at scales longer than about 50–200 years, energy budget models must nonlinearly account for albedo–temperature interactions (i.e. that linear energy budget models are inadequate at these timescales, and that albedo–temperature interactions must at least be correctly parametrized).

Also shown for reference in Fig. 3a are the fluctuations for three multiproxy estimates of annual Northern Hemisphere temperatures (1500–1900, pre-industrial; Moberg et al., 2005; Huang, 2004; Ljungqvist, 2010; analysis taken from Lovejoy and Schertzer, 2012c). Although it should be borne in mind that the ZC model region (the Pacific) does not coincide with the proxy region (the Northern Hemisphere), the latter is the best model validation available. In addition, since we compare model and proxy fluctuation statistics as functions of timescale, the fact that the spatial regions are somewhat different is less important than if we had attempted a direct year-by-year comparison of model outputs with the multiproxy reconstructions.

In Fig. 3a, we see that the responses of the volcanic-only and the combined
volcanic and solar forcings fairly well reproduce the RMS multiproxy
statistics until

In Fig. 4, we compare the RMS Haar fluctuations from the ZC model combined
(volcanic and solar forcing) response with those from simulations from the
GISS-E2-R GCM with solar-only forcing and a control run (no forcings, black;
see Lovejoy et al., 2013, for details; the GISS-E2-R solar
forcing was the same as the spliced series used in the ZC simulations). We
see that the three are remarkably close over the entire range; for the GISS
model, this indicates that the solar-only forcing is so small that the
response is nearly the same as for the unforced (control) run. The ZC
combined solar and volcanic forcing is clearly much weaker than the
pre-industrial multiproxies (dashed blue, same as in Fig. 3a). The reference
line with slope

Finally, in Fig. 5, we compare the responses to the volcanic forcings for
the ZC model and for the GISS-E2-R GCM for two different volcanic
reconstructions (Gao et al., 2008; Crowley, 2000; the latter reconstruction was used in the ZC simulation). For reference, we again show
the combined ZC response and the pre-industrial multiproxies. We see that the
GISS GCM is much more sensitive to the volcanic forcing than the ZC
model; indeed, it is too sensitive at scales

Note that the spatial regions covered by the ZC simulation, the GISS outputs and the multiproxy reconstructions are not the same. For the latter, the reason is that there is no perfectly appropriate (regionally defined) multiproxy series, whereas for the GISS outputs we reproduced the structure function analysis from a published source. Yet, the differences in the regions may not be so important since we are only making statistical comparisons. This is especially true since all the series are for planetary-scale temperatures (even if they are not identical global-sized regions) and, in addition, we are mostly interested in the 50-year (and longer) statistics, and these may be quite similar.

A comparison of the responses to the volcanic forcings for the ZC model (bottom, green) and for the GISS-E2-R GCM for two different volcanic reconstructions (Gao et al., 2008; Crowley, 2000) (top green curves, reproduced from Lovejoy et al., 2013). Also shown is the combined response (ZC, brown) and the pre-industrial multiproxies (dashed blue).

In the previous sections we considered the implications of linearity when climate models were forced separately with two different forcings compared with the response to the combined forcing; we showed that the ZC model was subadditive. However, linearity also constrains the relation between the fluctuations in the forcings and the responses. For example, at least since the work of Clement et al. (1996), in the context of volcanic eruptions, it has been recognized that the models are typically sensitive to weak forcing events but insensitive to strong ones (i.e. they are nonlinear), and Mann et al. (2005) noticed this in their ZC simulations.

In a scaling regime, both forcings and responses will be characterized by a
hierarchy of exponents (i.e. the function

In order to quantify this, recall that if the system is linear, the response
is a convolution of the system Green's function with the forcing; in
spectral terms it acts as a filter. If it is also scaling, then the filter
is a power law:

Let us investigate the nonlinearity of the exponents by returning to
Eqs. (1)–(3) in more detail. Up until now we have studied the
statistical properties of the forcings and responses using the RMS
fluctuations; for example, we have used the following equation but only for the value

If the driving flux

A drawback of the above fluctuation method for using

We now test Eq. (7); for convenience, we use the symbol

The scaling exponent estimates for the forcings and ZC model responses.

Table 1 shows the scaling exponent estimates for the forcings and ZC model
responses. For solar (forcing and response), only the recent 400-year
(sunspot-based) series were used; for the others, the entire 1000-year range
was used (see Fig. 6a). The RMS exponent was estimated from Eqs. (6) and (9):

Starting with Eq. (7), the basic prediction of multiplicative cascades is that
the normalized moments

It is interesting at this stage to compare the intermittency of the ZC
outputs with those of the GISS-E2-R GCM (Fig. 6b) and with multiproxy
temperature reconstructions (Fig. 6c). In Fig. 6b, we see that the GISS-E2-R
trace moments rapidly die off at large scales (small

This difference in the model responses to the forcing intermittency is
already interesting, but it does not settle the question as to which model
is more realistic. To attempt to answer this question, we turn to Fig. 6c,
which shows the trace moment analysis for six multiproxy temperature
reconstructions over the same (pre-industrial) period as the GISS-E2-R model
(1500–1900; unlike the ZC model, the GISS-E2-R included anthropogenic
forcings, so that the period since 1900 was not used in the GISS-E2-R
analysis). Statistical comparisons of nine multiproxies were made in Ch. 11
of Lovejoy and Schertzer (2013) (for reasons of space, only six of
these are shown in Fig. 6c), where it was found that the pre-2003
multiproxies had significantly smaller multicentennial and lower-frequency
variability than the more recent multiproxies used as reference in Figs. 4
and 5. However, Fig. 6c shows that the intermittencies are all quite low
(with the partial exception of the Mann series; see the upper right plot).
This conclusion is supported by the comparison with the red curves. These
curves indicate the generic envelope of trace moments of quasi-Gaussian processes;
for

The comparison of the GISS-E2-R outputs (Fig. 6b) with the multiproxies
(Fig. 6c) indicates that they are both of low intermittency and are more
similar to each other than to the ZC multiproxy statistics. One is therefore
tempted to conclude that the GISS-E2-R model is more realistic than the ZC
model with its much stronger intermittency. However, this conclusion may be
premature since the low multiproxy and GISS intermittencies may be due to
limitations of both the multiproxies and the GISS-E2-R model.
Multicentennial- and multimillennial-scale ice core analyses display
significant palaeotemperature intermittency (

From the point of view of GCMs, climate change is a consequence of changing boundary conditions (including composition); the latter are the climate forcings. Since forcings of interest (such as anthropogenic forcings) are typically of the order of 1 % of the mean solar input, the responses are plausibly linear. This justifies the reduction of the forcings to a convenient common denominator: the “equivalent radiative forcing” – a concept which is useful only if different forcings add linearly, if they are “additive”. An additional consequence of linearity is that the climate sensitivities are independent of whether the fluctuations in the forcings are weak or strong. Both consequences of linearity clearly have their limits. For example, at millennial and longer scales, energy balance models commonly discard linearity altogether and assume that nonlinear albedo responses to orbital changes are dominant. Similarly, at monthly and annual scales, the linearity of the climate sensitivity has been questioned in the context of sharp, strong volcanic forcings.

In view of the widespread use of the linearity assumption, it is important to quantitatively establish its limits, and this can best be done using numerical climate models. A particularly convenient context is provided by the last millennium simulations, which (in the pre-industrial epoch) are primarily driven by the physically distinct solar and volcanic forcings (forcings due to land use changes are very weak). The ideal case would be to have a suite of the responses of fully coupled GCMs which include solar-only, volcanic-only and combined solar and volcanic forcings and control runs (for the internal variability) so that the responses could be evaluated both individually and when combined. Unfortunately, the optimal set of GCM products consists of the GISS E2-R millennium simulations with solar-only and solar plus volcanic forcing and a control run (this suite is missing the volcanic-only responses). We therefore also considered the outputs of a simplified climate model, the ZC model (Mann et al., 2005), for which the full suite of external forcing response was available.

Following a previous study, we first quantified the variability in the
forcings as a function of timescale by considering fluctuations. These were
estimated by using the difference between the averages of the first and
second halves of intervals

In order to investigate possible nonlinear responses to sharp, strong events
(such as volcanic eruptions), we used the fact that if the system is linear
and scaling, then the difference between the structure function exponents
(

By examining model outputs, we have found evidence that the response of the
climate system is reasonably linear with respect to the forcing up to timescales of 50 years at least for weak (i.e. not sharp, intermittent) events.
But the sharp, intermittent events such as volcanic eruptions that
occasionally disrupt the linearity at shorter timescales become rapidly
weaker at longer and longer timescales (with scaling exponent

The ZC simulation outputs and corresponding solar and volcanic forcings were
taken from