Uncertainty in the simulation of the carbon cycle contributes significantly to uncertainty in the projections of future climate change. We use observations of forest fraction to constrain carbon cycle and land surface input parameters of the global climate model FAMOUS, in the presence of an uncertain structural error.

Using an ensemble of climate model runs to build a computationally cheap statistical proxy (emulator) of the climate model, we use history matching to rule out input parameter settings where the corresponding climate model output is judged sufficiently different from observations, even allowing for uncertainty.

Regions of parameter space where FAMOUS best simulates the Amazon forest fraction are incompatible with the regions where FAMOUS best simulates other forests, indicating a structural error in the model. We use the emulator to simulate the forest fraction at the best set of parameters implied by matching the model to the Amazon, Central African, South East Asian, and North American forests in turn. We can find parameters that lead to a realistic forest fraction in the Amazon, but using the Amazon alone to tune the simulator would result in a significant overestimate of forest fraction in the other forests. Conversely, using the other forests to tune the simulator leads to a large underestimate of the Amazon forest fraction.

We use sensitivity analysis to find the parameters which have the most impact on simulator output and perform a history-matching exercise using credible estimates for simulator discrepancy and observational uncertainty terms. We are unable to constrain the parameters individually, but we rule out just under half of joint parameter space as being incompatible with forest observations. We discuss the possible sources of the discrepancy in the simulated Amazon, including missing processes in the land surface component and a bias in the climatology of the Amazon.

The works published in this journal are distributed under the Creative
Commons Attribution 3.0 License. This license does not affect the Crown
copyright work, which is re-usable under the Open Government Licence (OGL).
The Creative Commons Attribution 3.0 License and the OGL are interoperable
and do not conflict with, reduce or limit each other. ^{©}Crown copyright 2016

Earth system processes that are too complex, or occur at too fine a
resolution, to model explicitly are often simplified or parameterised.

Throughout the paper we
often use

Choosing parameterisation coefficients is a major research effort
encompassing domain specific, statistical and computational literature.
Coefficients can be tuned by comparing the simulator with observations of the
system, by direct measurement, or using information from theory. There is a
long history of using observations to constrain parameterisation coefficients
within general circulation models (GCMs), particularly within atmospheric
components. Where this is done in a formal probabilistic setting it can
provide probability distributions for the parameters of the simulator; this is
known as

Simulator discrepancy is the systematic difference between a climate model,
or simulator, and the system that is represented by that model. It is also
known as model (or simulator) bias, model error, or structural error. A “best
input” approach typically defines discrepancy as the difference between the
modelled system and the simulator when run at an input where output from the
simulator conveys all it can about the system (see, e.g.,

Simulator discrepancy might be known ahead of time: perhaps a parameterisation of a process occurring at too high a resolution to simulate has a predictable effect on simulator behaviour. Alternatively, the discrepancy might be due to some missing and unknown process in the simulator, or to unknown parameterisation values. This might appear as a bias, only becoming apparent when output from the simulator is compared with observations of the real system. In both cases, the modeller must have a strategy for dealing with the discrepancy when using the simulator to make judgements about the system.

Simulator discrepancy is a major challenge during calibration.

Parametric uncertainty in the land surface and carbon cycle component of
models is expected to represent a large fraction of current uncertainty in
future climate projections

Using statistical and data assimilation approaches to constrain land surface
simulator process parameters extends back at least to

We aim to identify parameter sets of the land surface module of the climate simulator FAMOUS where simulator output and observations of forest fraction are consistent to an acceptable degree. An initial attempt using history matching suggests that FAMOUS is unable to simulate the Amazon forest and other forests simultaneously at any set of parameters within the experiment design. We argue that this is due to a fundamental simulator discrepancy, which has implications for constraining the input parameters of FAMOUS. We use a number of techniques to characterise and find the drivers of this structural error, before performing a second history match with an appropriate discrepancy function.

In Sect.

We use a pre-existing ensemble of the climate simulator FAMOUS throughout
this study. The Fast Met Office UK Universities Simulator, FAMOUS

The inclusion of vegetation in FAMOUS is documented in

FAMOUS shows a Northern Hemisphere-winter surface air temperature cold bias
with respect to HadCM3 and also the overestimation of the fractions of
needleleaf trees in North America and C

Land surface input parameters for FAMOUS. PFT: plant functional type; LAI: leaf area index.

We use an ensemble of 100 simulations of FAMOUS detailed in

This design builds upon a previous ensemble run by

Ranges for the land surface parameters follow those used in the study by

The ensemble simulates the pre-industrial climate, with ensemble members spun
up over a 200-year period to ensure that the vegetation is in equilibrium
with the climate at 290 ppm of CO2.

Observations of broadleaf forest fraction (top left panel). Mean (top right panel) and standard deviation (bottom left panel) of broadleaf forest fraction across the 100-member ensemble of FAMOUS.

FAMOUS input parameters and forest fraction parameters, plotted against each other. Default inputs (not run) are marked in red.

We compare simulated forest fraction against observations adapted from

South East Asian and Central African forests vary together very strongly across the ensemble, whereas the Central African and North American forests show a weaker relationship. The latter might be expected, given the different structure of the North American forests compared with the tropical forests. The scatter plot also identifies NL0 (leaf nitrogen) and V_CRIT_ALPHA (soil moisture control on photosynthesis) as important controls on forest fraction, as the output varies most with these parameters.

FAMOUS is not fast enough to run at every point within input space required
for our analyses. We therefore use a computationally cheap statistical proxy
to the simulator, called an emulator. The emulator is a non-parametric
regression model conditioned on the ensemble, providing a prediction of
simulator output and corresponding uncertainty orders of magnitude faster
than the original simulator. Once trained, any analysis that might have been
done with the simulator can be done with the emulator, provided we include
the extra uncertainty term to account for the fact that the emulator is not a
perfect prediction of the simulator output. A useful introduction to
emulators and their uses can be found in

We use a Gaussian process emulator that assumes zero uncertainty at points
where the simulator has already been run, growing larger away from those
points. We treat the output
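The defining property described above, predictive uncertainty that is zero at design points and grows away from them, can be sketched with a toy Gaussian process. This is an illustration only, assuming a squared-exponential covariance and made-up inputs and outputs, not the emulator actually fitted to the FAMOUS ensemble:

```python
import numpy as np

def rbf_kernel(A, B, length=0.3, var=1.0):
    """Squared-exponential covariance between two sets of input points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X_train, y_train, X_new, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP conditioned on an ensemble."""
    K = rbf_kernel(X_train, X_train) + jitter * np.eye(len(X_train))
    Ks = rbf_kernel(X_new, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf_kernel(X_new, X_new) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.clip(np.diag(cov), 0.0, None)

rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))        # stand-in for the 100 FAMOUS design points
y = np.sin(3 * X[:, 0]) * X[:, 1]    # stand-in for a forest fraction output
mean, var = gp_predict(X, y, X[:3])  # predict at points already run
# Predictive variance collapses to ~0 where the simulator has been run.
```

Away from the design points, the variance term grows and must be carried through any downstream analysis as emulator uncertainty.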

After

What distinguishes history matching from simulator calibration, where a probability
distribution over the parameters is described, is that it rejects inputs inconsistent with
observations, or otherwise classifies them as “not ruled out yet” (NROY). We regard NROY inputs as conditionally accepted, contingent on
new observations or information. History matching was developed by

Observations of the system are denoted

Each candidate point is assigned an implausibility,
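The implausibility takes the standard history-matching form: the distance between the observation and the emulator mean, scaled by the total uncertainty. A minimal sketch, with illustrative numbers rather than values from this study:

```python
import numpy as np

def implausibility(z, em_mean, var_obs, var_disc, var_em):
    """|observation - emulator mean|, scaled by the square root of the total
    variance: observational + simulator discrepancy + emulator uncertainty."""
    return np.abs(z - em_mean) / np.sqrt(var_obs + var_disc + var_em)

# Illustrative numbers only: observed forest fraction 0.6, emulated output
# 0.45, observational uncertainty 0.05 (1 SD), a small emulator
# uncertainty, and (for now) zero discrepancy uncertainty.
I = implausibility(z=0.6, em_mean=0.45,
                   var_obs=0.05**2, var_disc=0.0, var_em=0.01**2)
nroy = I < 3.0  # retained as "not ruled out yet" when below threshold 3
```

A candidate with implausibility above the threshold (3 here, as used later in the paper) is ruled out; everything else remains NROY.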

In this section we find regions of land surface parameter space in FAMOUS
that remain NROY given some defensible assumptions about observational
uncertainty. Figure

Histograms representing the number of ensemble members of a
particular forest fraction in each region, as well as globally. Points plotted below
the histograms represent the observed forest fraction (colours) and the
forest fraction simulated at the “standard” parameters

The simulator run at the standard inputs significantly underestimates the
forest fraction in the Amazon region, with a best estimate of

We aim to find regions of parameter space where simulator error is removed, or minimised to a level consistent with observational uncertainty. In practice, this requires finding a region where the large negative bias in Amazon forest fraction is minimised while keeping the other forests well represented.

On the advice of domain experts, we assume observational uncertainty of 0.05
(1 SD) in the Amazon, Central African, South East Asian, and
North American forests as broadly representative, or at least usefully
illustrative. This corresponds to an expectation that the true 95 %
confidence interval is contained within the interval of

We sample uniformly across input parameter space and run the emulator at
these locations. We history-match the samples using all four individual
forest observations and visualise the space where max[

Does this region represent a viable set of inputs, perhaps to replace the
default set of parameters, or should we include a non-zero discrepancy term
(

Implausibility

In the remainder of this section, we use a number of analysis techniques to investigate why a region on the edge of parameter space, one that does not contain the default parameter settings, is identified as NROY.

We perform a sensitivity analysis to identify the active subspace of
simulator inputs and quantify relationships between inputs and outputs. In a
descriptive sensitivity analysis, we show emulated mean regional and global
forest fraction with inputs sampled from across input parameter space in a
one-factor-at-a-time fashion, holding all but one parameter at their standard
values while varying the remaining parameter (Fig.
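The one-factor-at-a-time procedure can be sketched as follows; the emulator is replaced by a made-up toy function, and the parameter names and "standard" values are illustrative assumptions:

```python
import numpy as np

def toy_emulator(x):
    """Stand-in for the emulated mean forest fraction (not the real emulator)."""
    nl0, v_crit_alpha, q10 = x
    return float(np.clip(0.5 + 0.4 * nl0 - 0.35 * v_crit_alpha + 0.01 * q10, 0, 1))

names = ["NL0", "V_CRIT_ALPHA", "Q10"]
standard = np.array([0.5, 0.5, 0.5])  # all inputs normalised to [0, 1]

oat_range = {}
for i, name in enumerate(names):
    out = []
    for v in np.linspace(0, 1, 21):
        x = standard.copy()
        x[i] = v                       # vary one input, hold the rest at standard
        out.append(toy_emulator(x))
    oat_range[name] = max(out) - min(out)  # crude one-at-a-time sensitivity index
```

The range of the output as each parameter sweeps its interval gives a simple descriptive index of marginal sensitivity at the standard settings.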

A density pairs plot of two-dimensional projections of parameter space. The blue areas represent the density of NROY points, using all of the data, with an assumed observational uncertainty of 0.05 (1 SD).

Best-estimate draws of forest fraction output from the emulator, at
the set of points not ruled out yet when assuming a credible observational
uncertainty. The value of the observed forest fractions is plotted as a
single point on the corresponding

Marginal sensitivity of mean forest fraction to each input parameter
in turn, with all other parameters held at standard values. Central lines
represent the emulator mean, and shaded areas represent the estimate of
emulator uncertainty, at the

Sensitivity analysis of forest fraction via the FAST algorithm of

V_CRIT_ALPHA and NL0 are the most influential individual parameters, and they counteract each other when both are increased. The Q10 parameter has little or no influence on forest fraction. The TUPP parameter is important only to the Central African (termed “Congo” here, for brevity) and South East Asian forest fractions, much less important to the Amazon, and not important at all to the North American forests.

The relationships change across parameter space and are therefore dependent on the somewhat arbitrary range of the initial input parameters of the ensemble design. Sensitivity can change in importance as parts of input space are ruled out. For example, the forests are most sensitive to NL0 in the lower part of the ensemble range, and most sensitive to V_CRIT_ALPHA in the upper part of the ensemble range.

Following

Parameter Q10 has almost no influence on forest fraction, in line with the expectations of land surface modellers. This non-zero estimate of sensitivity is likely due to the fact that the emulator is not a perfect representation of the simulator, and a zero sensitivity is well within the uncertainty bounds of the sensitivity analysis. Parameters TUPP and R_GROW have very little impact on forest fraction. Parameter F0 has virtually no influence away from the tropics; conversely, LAI_MIN is only important in the North American forest.

In this section, we examine the ability of the simulator to reproduce the observed forest fraction, as well as how that ability varies across input parameter space, and assess the region of parameter space which is consistent with each of the forest fraction observations.

We show a map of simulator error in the two-dimensional space of the most
important parameters identified in Sect.

We assess the potential of the history-matching technique to rule out parameter
space under a number of scenarios of tolerance to observational and simulator
structural error. The denominator of Eq. (

Different observations rule out different parts of parameter space, while
combining observations can be a powerful method of ruling out large parts of
parameter space. A number of approaches to combining data in history matching
are discussed in

A conservative approach is to reject a candidate point only if it is judged
implausible using a number of measures. This will be more robust to a poorly
specified simulator discrepancy term.
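One common conservative variant thresholds the second-highest, rather than the maximum, implausibility across observations, so a point is rejected only when at least two measures judge it implausible. A sketch with illustrative values:

```python
import numpy as np

# Rows are candidate inputs; columns are implausibilities against four
# observations (illustrative values, not results from this study).
I = np.array([[3.5, 1.0, 0.8, 1.2],   # implausible for one observation only
              [3.4, 3.2, 2.9, 3.1]])  # implausible for several observations

ruled_out_max = I.max(axis=1) > 3.0                 # strict: any single measure
ruled_out_second = np.sort(I, axis=1)[:, -2] > 3.0  # conservative: 2nd highest
```

The strict rule rejects both candidates; the conservative rule retains the first, since only one observation flags it, which guards against a single poorly specified discrepancy term dominating the match.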

Maps of simulator error, in units of forest fraction, when projected into the two-dimensional space of the most active parameters, NL0 and V_CRIT_ALPHA.

Proportion of NROY (not ruled out yet) input space plotted against “tolerance to error” – the total error budget including emulator, observational, and simulator discrepancy uncertainty.

To understand the value of individual observations, we ask the following questions: what is our tolerance to
error? And what level of uncertainty in observations or simulator
discrepancy can we tolerate before our observations become ineffective for
history matching? Figure

North American, South East Asian, and Central African forest observations each constrain the NROY region to between 40 and 50 % of parameter space, even when our tolerance to error is very low. The proportion of NROY space increases quickly as the tolerance grows, particularly when using North American forest fraction, which offers no constraint at all once our error tolerance is above 0.07 (1 SD). The other forests offer some constraint up to about 0.1 (1 SD), and the Amazon is a stronger constraint, only losing power when the standard deviation of our tolerance to error is above 0.15 (1 SD).
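A curve of this kind can be generated by sweeping the total error budget and recording the NROY fraction at each level; the sketch below uses a toy stand-in for the emulator and illustrative numbers throughout:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(5000, 2))                # uniform samples of input space
em_mean = 0.2 + 0.6 * X[:, 0] * (1 - X[:, 1])  # toy stand-in for emulated output
z = 0.6                                        # illustrative observed fraction

def nroy_fraction(sd_total, threshold=3.0):
    """Fraction of sampled inputs retained at a given total error budget (1 SD)."""
    return float(np.mean(np.abs(z - em_mean) / sd_total < threshold))

fractions = [nroy_fraction(sd) for sd in (0.01, 0.05, 0.1, 0.2)]
# NROY space grows with tolerance to error; once the budget is large
# enough, the observation no longer constrains input space at all.
```

The NROY fraction is non-decreasing in the error budget, and saturates at 1 once three standard deviations of total error exceed the largest emulator-observation mismatch anywhere in input space.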

Combining data and using the maximum implausibility of any dataset improves the constraint, particularly when the tolerance to error is low. However, we urge caution. The fact that (a) the performance of the Amazon dataset appears different from the other observations and (b) that all parameter space is ruled out at lower values, even though there is emulator uncertainty, again raises concerns of a poorly specified Amazon simulator discrepancy.

A more robust constraint can be obtained by excluding the Amazon observations and using the maximum implausibility from the remaining observations. This excludes more input parameter space than any single observation on its own, up to a tolerance to error of around 0.085 (1 SD), where it performs in a similar manner to using South East Asian forest fraction.

To what extent do the input spaces that are NROY when history matching with
two forests overlap? We suppose that data that suggest highly overlapping
input spaces give us confidence that those input spaces are valid. Another
perspective is that overlapping input spaces give us little extra
information, and we should seek out those data that minimise overlap. We sample
uniformly from the input space and test each point using a comparison with
each forest observation to see if it is ruled out. If a point has the same
status using both forests in the history match, we class that as an
overlapping point. Table

The most similar input space is found if we use the South East Asian and Central African rainforests. Comparing these forests with the North American forests gives a fairly high overlap – 61 and 66 % for South East Asia and Central Africa respectively. The Amazon has markedly lower overlap with the other forests: 40 % at the most with North America and only 26 % with South East Asia.

Amount of overlap in NROY input space for forest combinations.
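The overlap statistic can be sketched as follows, again with toy stand-ins for the per-forest emulators and illustrative observations and uncertainties:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(size=(10_000, 2))     # uniform samples of input space

def nroy(z, em_mean, sd_total=0.05, threshold=3.0):
    """History-match status of each sample against one observation."""
    return np.abs(z - em_mean) / sd_total < threshold

em_a = 0.2 + 0.6 * X[:, 0]            # toy emulated fraction, forest A
em_b = 0.1 + 0.7 * X[:, 0] * X[:, 1]  # toy emulated fraction, forest B

status_a = nroy(0.60, em_a)
status_b = nroy(0.55, em_b)
overlap = float(np.mean(status_a == status_b))  # same status under both forests
```

A point counts towards the overlap when both history matches agree on its status, whether retained by both or ruled out by both.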

To more fully explore the causes of simulator discrepancy and its consequences, we make the illustrative assumption that simulator discrepancy uncertainty is zero and that observational uncertainty is very low (0.01 in forest fraction). We sample a large number of points uniformly across input space under these assumptions.

We classify as NROY only those emulated samples where the implausibility (or maximum implausibility in the case of combined data) is below 3. Setting such a demanding threshold allows us to find and describe the relatively small regions in input space where the simulator performs best, in two cases: first, using the South East Asian, Central Africa, and North American forest fraction in the history-matching exercise, and second using the Amazon forest fraction.

When plotted in two-dimensional projections (Fig.

Marginal density of input parameter sets consistent with a very low “tolerance to error”, as well as perfect observations, for the North American, South East Asian, and Central African forests combined (top panel) and the Amazon (bottom panel). Dark blue indicates those regions which have the highest concentration of NROY candidates and which are therefore most compatible with the observations.

Top panel: forest fraction in the FAMOUS Amazon at the set of parameters where FAMOUS best matches each of the other forest observations. Bottom panel: other forests in FAMOUS at the set where the FAMOUS Amazon best matches observations. Observed forest fractions are shown as marks underneath the histograms.

Maps of mean broadleaf forest fraction, over the “best” set of parameters found for the Amazon (top panel) and the Central African forest (centre panel). The difference between the two is mapped at the bottom panel.

FAMOUS struggles to simulate both the Amazon and the other forests simultaneously, at any parameter combination when using a low threshold of implausibility. It is very difficult to reconcile the simulation of the Amazon simultaneously with the other forests if there is little uncertainty about the observations. A simulator discrepancy term and corresponding uncertainty is therefore necessary to attain an adequately performing simulator.

To examine the implications of using each observation separately to tune the
simulator, we use the emulator to project each forest at the set of
“best” inputs: those where the simulator reproduces each forest with a very
small tolerance of error. We then use the emulator to project the Amazon
forest fraction using the “best” parameters for each forest, as well as the forest
fraction for each of those forests using the “best” parameters for the
Amazon in Fig.

We find that using the best set of parameters as defined for each non-Amazon forest would likely lead to an underestimate of the Amazon forest fraction by around 50 %, compared to the observed fraction (around 0.3, compared to an observation of around 0.6). Conversely, using the best parameters as defined for the Amazon leads to an overestimate of the other forests – around 0.3 for the tropical forests and 0.15 for the North American forest – even though the observed aggregate forest fraction is very similar for the tropical forests.

To further explore this difference, we project the “best” set of input
parameters, found using the Amazon and African forest to match the simulator
against, over a map of the entire FAMOUS land surface. In each case, an
independent emulator is trained on the ensemble for each grid box. The maps
of the mean forest fraction for each parameter set, and the difference
between them, are shown in Fig.

Even using the “best” Amazon parameters, the simulator underestimates the Amazon coverage in the north-east of South America. This makes it very difficult to simulate a sensible forest fraction, even when overestimating the forest fraction in places where the simulator does have forest cover.

The previous sections show that the inputs where FAMOUS best simulates Central African, South East Asian, and North American forests cover a similar input space, whereas the best inputs for the Amazon are in a different region. A parsimonious approach would be to use a non-zero-mean discrepancy for the Amazon: allowing the Amazon to be less vigorous in our simulations, while maintaining that the simulator output should broadly match the other forests.

We perform a history match using all of the forest observations, along with a
simulator discrepancy term for the Amazon forest. We use the best estimate of
the difference between Amazon observations and that simulated by FAMOUS at
the default set of parameters as the best estimate of the discrepancy mean.
The difference in forest fraction at the default parameters is approximately 0.3.
Figure
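One way to implement the non-zero-mean discrepancy is to shift the emulated Amazon output by the discrepancy mean (about 0.3, per the default-parameter difference quoted above) before computing implausibility. The sketch below does this with otherwise illustrative values and assumed variances:

```python
import numpy as np

def implausibility(z, em_mean, disc_mean=0.0,
                   var_obs=0.05**2, var_disc=0.05**2, var_em=0.01**2):
    """Implausibility with an additive discrepancy correction: the emulated
    output is shifted by disc_mean before comparison with the observation."""
    return np.abs(z - (em_mean + disc_mean)) / np.sqrt(var_obs + var_disc + var_em)

# Toy values: the simulated Amazon sits well below the observation.
z_amazon, em_amazon = 0.6, 0.32
I_no_disc = implausibility(z_amazon, em_amazon)              # ruled out
I_disc = implausibility(z_amazon, em_amazon, disc_mean=0.3)  # NROY
```

With a zero-mean discrepancy the candidate is implausible; once the simulator is allowed to be less vigorous in the Amazon by the stated mean, the same candidate survives the match.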

Histograms of emulated simulator output using credible estimates for observational uncertainty, a simulator discrepancy term for the Amazon, and credible discrepancy uncertainty.

A density plot of the two-dimensional projections of NROY samples from the design input space, using all forest observations and a discrepancy function for the Amazon.

Our analysis illustrates the challenges in distinguishing between simulator
discrepancy, parameter uncertainty, and observational uncertainty during
simulator development. For example, forest fraction in the simulator can be
tuned largely by using the two most active parameters: V_CRIT_ALPHA and
NL0. As these parameters alter forest fraction in counteracting directions, a
number of solutions can be found that give plausible forest fractions.
Information from outside sources about the “true” value of one of these
parameters might therefore offer a strong constraint on the value of the
other. NL0 is the leaf nitrogen parameter – the ratio of nitrogen to carbon
found in leaves. In theory, this is something that is well observed and
recorded, but it is uncertain what the value should be to reflect the
observational range across the spatial scale of FAMOUS. Nitrogen content
determines the maximum photosynthesis, and therefore how much CO

Using observations of the Amazon rainforest along with the other major
forests in the history-matching exercise results in ruling out a large
swathe of parameter space, including the default set of parameters, and
leaving a corner of parameter space not ruled out yet. While it appears that
here simulator output is tolerably close to the observations given a
zero-mean discrepancy, there are good reasons to be suspicious of this
region. For illustration, we imagine a situation where we are forced to
choose between keeping the default parameters and including a simulator
discrepancy function, or rejecting them and accepting a candidate from the
new NROY region. Our choices will be dictated by the objective of our
analysis: do we wish to provide only the best possible prediction, or do we
wish to find parameter values which are, to some extent, “true”? For a
simple prediction problem, we will be less concerned that the parameters more
accurately reflect something we might measure in the real system, and might
be less inclined to include a discrepancy term. However, sustainable
development of the simulator requires that we get things right

First, the NROY region excludes the default set of parameters, chosen as the result of multiple lines of evidence, scientific judgement, and experience using this and other simulators. Second, the NROY region is close to the edge of the ensemble in the active parameter subspace, so that emulator uncertainty, combined with the generous observational and discrepancy uncertainty, may dominate the implausibility calculation. Emulators tend to increase in uncertainty near the edge of an ensemble, as they are forced to extrapolate more than at the centre of the ensemble. Third, the information obtained from using each of the four forests shows that the Central African, South East Asian, and North American forests all indicate very similar, highly overlapping NROY regions. In contrast, the NROY region suggested by comparing FAMOUS to observations from the Amazon is very different. Finally, tuning to the “best” parameters for each of the forests suggests that the NROY region produces an inevitable compromise: the Amazon will very likely be underestimated, and the other forests overestimated, if observational uncertainty is reduced. It is possible that there are correlated errors in the other forests, rather than in the Amazon. However, we argue that this is less likely, given that the other forests include tropical forests (like the Amazon) and the boreal forest of North America.

We therefore urge caution with a naive or automatic application of history-matching conclusions, particularly when using multiple observations for
comparison with the simulator. Even in our relatively simple history-matching exercise, there is a clear need to include simulator
discrepancy, to increase simulator discrepancy uncertainty, or to apply a conservative version of
the measure of implausibility. One strategy, adopted, for example, by

We are able to offer a counter-example to the hypothesis of

We find that forest fraction does not offer a marginal constraint on the parameters: that is, there is little or no constraint on each parameter individually, but there is a significant constraint on the joint input space of the parameters. Approximately 43 % of a priori parameter space is ruled out, which is relatively little compared to other studies. This is explained by several factors: (1) the ensemble covers a relatively small input space, compared to other studies, due to the fact that the simulator is based on a well-studied climate model, HadCM3; (2) our observational uncertainty is assumed conservatively large; and (3) we have only a single wave of history matching. A further experiment could run the climate simulator within the NROY space in order to reduce emulator uncertainty and provide a basis to further rule out input space. The value of further waves of history matching might be diminished by the fact that the simulator likely has a large discrepancy in the Amazon, and the simulator discrepancy uncertainty is likely a large component of the overall uncertainty budget.

We suggest three possible causes of fundamental structural error which are

Second, is there a missing process in the vegetation model that impacts
the Amazon or other forests in FAMOUS, or has the Amazon perhaps
developed in other ways not seen in the other forests? For example, it is
possible that the real Amazon can access water to a deeper level than other
forests, through deep rooting. This would cause a

Finally, does the simulator simulate the climatic boundary conditions of the
forest well enough?

We analyse an ensemble of the fast climate simulator FAMOUS with the aim of constraining carbon cycle parameters through a comparison of simulator output with forest observations. We find that we are unable to constrain the parameters individually, but that areas of joint parameter space are effectively ruled out. With a defensible simulator discrepancy term for the Amazon and assumed observational uncertainty we are able to rule out 43 % of the input parameter space defined by the ensemble design.

We identify moisture control on photosynthesis (V_CRIT_ALPHA) as the most important parameter control on forest fraction, with the next most important parameter, leaf nitrogen (NL0), being approximately half as important, but still twice as important as any other parameter. These parameters have counteracting effects on the forest fraction, so we are unable to rule out a broad swathe of the joint space of these two parameters.

We suggest that we should exercise care if using observations of the Amazon
rainforest to constrain the input parameters of FAMOUS, as an apparent
structural bias in the climate simulator could lead to misleading results.
Using the Amazon forest as an observational constraint suggests very
different parts of input parameter space as

Using a history-matching technique, we investigate the limits of observational and simulator discrepancy uncertainty, beyond which observations no longer offer a constraint on input parameter space. We find that if this total error budget is larger than approximately 0.1 (1 SD of forest fraction), and excluding the Amazon rainforest as a comparison, the observations will not offer any form of constraint on the current ensemble, even in joint parameter space.

Underlying data are available as an R data file:

Doug McNeall and all authors designed the analysis. Doug McNeall conducted the analysis and wrote the paper. Jonny Williams provided the FAMOUS ensemble and Ben Booth provided the observed forest fraction data.

Richard Betts is a member of the editorial board of

This work was supported by the Joint UK BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101). Doug McNeall was supported on secondment to Exeter University by the Met Office Academic Partnership (MOAP) for part of the work. Jonny Williams was supported by funding from Statoil ASA, Norway. Edited by: J. Annan Reviewed by: R. D. Wilkinson and one anonymous referee