Climate model emulation in an integrated assessment framework: a case study for mitigation policies in the electricity sector

Abstract. We present a carbon-cycle–climate modelling framework using model emulation, designed for integrated assessment modelling, which introduces a new emulator of the carbon cycle (GENIEem). We demonstrate that GENIEem successfully reproduces the CO2 concentrations of the Representative Concentration Pathways when forced with the corresponding CO2 emissions and non-CO2 forcing. To demonstrate its application as part of the integrated assessment framework, we use GENIEem along with an emulator of the climate (PLASIM-ENTSem) to evaluate global CO2 concentration levels and spatial temperature and precipitation response patterns resulting from CO2 emission scenarios. These scenarios are modelled using a macroeconometric model (E3MG) coupled to a model of technology substitution dynamics (FTT), and represent different emissions reduction policies applied solely in the electricity sector, without mitigation in the rest of the economy. The effect of cascading uncertainty is apparent, but despite uncertainties, it is clear that in all scenarios, global mean temperatures in excess of 2 °C above pre-industrial levels are projected by the end of the century. Our approach also highlights the regional temperature and precipitation patterns associated with the global mean temperature change occurring in these scenarios, enabling more robust impacts modelling and emphasizing the necessity of focusing on spatial patterns in addition to global mean temperature change.


Introduction
Integrated assessment modelling can be used to explore the climatic consequences of particular climate mitigation policy scenarios. However, most integrated assessment models (IAMs) do not directly utilize sophisticated coupled Atmosphere-Ocean General Circulation Models, such as those employed in the Coupled Model Intercomparison Project Phase 5 (CMIP5: Friedlingstein et al., 2014), to represent the climate and carbon cycle. Due to the large computational resources they require, the direct use of such models within IAMs is not feasible.
Instead, many IAMs have used simple mechanistic models to represent the carbon cycle. One such simplified carbon-cycle/climate model is MAGICC6 (Meinshausen et al., 2011a), which is calibrated against higher-complexity models from the Coupled Carbon Cycle Climate Model Intercomparison Project (C4MIP) to emulate the atmospheric CO2 concentrations of those models. Schaeffer et al. (2015) used MAGICC6 to derive probability distributions for radiative forcing, which drive a simple climate model that projects global mean temperature response by linearly scaling the CO2 step experiment response of 17 CMIP5 General Circulation Model (GCM) 4 × CO2 simulations. Such approaches can be used to generate large ensembles quite quickly; for instance, MAGICC6 has been used to generate a 600-member perturbed parameter ensemble (Schaeffer et al., 2015) of CO2-equivalent concentration and global mean surface air temperature change projections.

Published by Copernicus Publications on behalf of the European Geosciences Union.
It has been suggested that a conceptual advantage of this approach is that the mechanistic model fit adds some confidence when extrapolating beyond the training data (Meinshausen et al., 2011a). A limitation of simplified mechanistic models is that they may contain a high level of parametrization. For example, the Meinshausen et al. (2011a) carbon cycle calibration procedure uses global mean temperature as a proxy for changes in patterns of temperature and precipitation. These drivers of change in the carbon cycle would be explicitly represented in a more sophisticated model.
To represent regionally varying patterns of climatic change, as opposed to global mean temperature change, many IAM studies have used pattern scaling (e.g. IMAGE: Bouwman et al., 2006). This computationally inexpensive technique linearly relates regional climatic change, derived from stored GCM ensembles such as those generated in CMIP5, to global mean temperature change, simulated using a simplified model, so that the regional response to many emissions scenarios can be computed quickly (e.g. Cabré et al., 2010). Simple pattern scaling assumes that the spatial pattern of the climate response is invariant with respect to time and forcing, and therefore cannot capture aspects which may be sensitive to the greenhouse gas (GHG) concentration pathway (O'Neill and Oppenheimer, 2004; Tebaldi and Arblaster, 2014). Tebaldi and Arblaster (2014) cite a number of instances where it is liable to break down, in particular for scenarios with strong mitigation or less mean temperature change. Recent advances in pattern scaling have considered the effects of different forcing components; for example, with the most recent iteration of MAGICC-SCENGEN, the effects of aerosols can be estimated for some climate parameters by generating patterns specific to these emissions.
The Atmosphere-Ocean General Circulation Model (AOGCM) ensembles used in pattern scaling are usually multi-model ensembles (MMEs). Such ensembles consist of simulations from different models, and are neither a systematic nor random sampling of potential future climates (Tebaldi and Knutti, 2007). Similarities between models may lead to a lack of independence amongst ensemble members (Foley et al., 2013), complicating the interpretation of the ensemble as a whole (Knutti et al., 2013).
Perturbed physics ensembles (PPEs) offer a more systematic sampling of potential future climates, but embedding a PPE approach into an IAM framework requires a computationally fast climate model. In this context, statistical emulation of complex models is a useful alternative. For example, Castruccio et al. (2014) constructed a statistical climate model emulator using simulations performed with the Community Climate System Model, version 3 (CCSM3), in which statistical models are fitted to temperature and precipitation for 47 subcontinental-scale regions. Such an approach is suitable for applications requiring annual temperatures of specific regions, but is less appropriate when climate impacts within regions are to be considered. Carslaw et al. (2013) apply a similar approach to the grid cell level. However, such an approach requires many emulators, and correspondingly, computational resources. Furthermore, the global emulation may not be self-consistent, as the individual emulators do not utilize the correlations between grid cells.
In this paper, we demonstrate how model emulation using singular vector decomposition (SVD) can be used within an IAM framework to generate PPEs, systematically capturing uncertainty in the future climate state while also providing insight into regional climate change. We introduce the GENIEem-PLASIM-ENTSem (GPem) climate-carbon-cycle emulator, which consists of a statistical climate model emulator, PLASIM-ENTSem, to represent climate dynamics, and a new carbon cycle emulator, GENIEem. Compared to a simple mechanistic model, the purely statistical GENIEem does not impose a predefined functional structure, allowing the emulator to capture more of the behaviour of the underlying simulator, and notably providing a representation of the parametric uncertainty of the simulator. Although parametric uncertainty of MAGICC itself can be investigated (Meinshausen et al., 2009), this is distinct from representing the parametric uncertainties and associated non-linear feedbacks in the underlying simulator. Similarly, compared to pattern scaling, the more complex statistical approach used in PLASIM-ENTSem enables a representation of spatial uncertainties due to parametric uncertainties in the underlying model. The use of SVD to decompose spatial patterns of climate parameters makes PLASIM-ENTSem computationally efficient, compared to techniques in which statistical relationships are developed for each grid cell.
We demonstrate how these emulators can be applied in an IAM framework to resolve the regional environmental impacts associated with policy scenarios by coupling GPem to FTT:Power-E3MG, a non-equilibrium economic model with a technology diffusion component. Our work builds on that of Labriet et al. (2015) and Joshi et al. (2015) who also derived IAMs from economic and energy technology system models coupled to PLASIM-ENTSem. The carbon cycle model emulator GENIEem is an emulator of the GENIE-1 Earth System Model (ESM) (Holden et al., 2013a) (i.e. a statistical model that approximately reproduces selected outputs from the full GENIE-1 ESM). The emulator takes a time series of anthropogenic carbon emissions and non-CO 2 radiative forcing (stemming from CH 4 , N 2 O, halocarbons, and other forcing agents including O 3 and aerosols) as inputs and provides a time series of atmospheric CO 2 concentration as output.
In the integrated assessment framework developed here, the time series of anthropogenic carbon emissions is provided by E3MG-FTT, while non-CO2 forcing data are derived from global time series of forcing data obtained through the RCP Database. As such, GPem emulates high-dimensional climate outputs as a function of scalar model inputs. We note that certain forcings, such as aerosol forcing, are characterized by complex spatial patterns and so would benefit from an approach in which the inputs are also high dimensional. However, incorporating such forcing into the emulator framework would involve coupling an aerosol model to PLASIM-ENTS in order to build an ensemble of simulations and a subsequent emulator, which is beyond the current scope of this work.

GENIE-1 description
The full GENIE-1 ESM comprises the 3-D frictional geostrophic ocean model GOLDSTEIN (Edwards and Marsh, 2005) coupled to a 2-D Energy Moisture Balance Atmosphere based on that of Fanning and Weaver (1996) and Weaver et al. (2001), and a thermodynamic-dynamic sea-ice model based on Semtner (1976) and Hibler (1979). Ocean biogeochemistry is modelled with BIOGEM, coupled to the sediment model SEDGEM. GENIE-1 is run at 36 × 36 spatial resolution (≈ 10° × 5° on average) with an ≈ 1-day atmospheric time step, and 16 depth levels in the ocean. Vegetation is simulated with ENTSML (Holden et al., 2013a), a dynamic model of terrestrial carbon and land use change (LUC) based on the single plant functional type model ENTS (Williamson et al., 2006). ENTSML takes time-varying fields of LUC as inputs. Each simulation used to build the emulator is a transient simulation from AD 850 through to 2105. Historical forcing (AD 850 to 2005), including changing land use, is prescribed as described in Eby et al. (2013). Future forcing (2005 to 2105) is defined by a CO2 concentration time series and a non-CO2 radiative forcing time series, both represented by polynomials (see Sect. 2.1.2). The LUC mask is held fixed from 2005, as capturing LUC-climate-carbon feedbacks in the emulator would require high-dimensional inputs, a significantly more complex ensemble design and emulation challenge. The future forcing due to LUC is instead subsumed into the CO2 concentration (LUC emissions) and non-CO2 radiative forcing (LUC albedo).
The configuration is the same as that applied in the Earth system model of intermediate complexity (EMIC) intercomparison project. Due to its reduced complexity, GENIE-1 is a good choice for performing the many simulations required to build an emulator.

GENIE-1 parameter set selection
Construction of GENIEem is summarized in Fig. 1. To build the carbon cycle emulator, a subset of the 471-member emulator-filtered, plausibility-constrained parameter sets described in Holden et al. (2013b) is used. Each of these 471 parameter sets was previously applied to a CO2 emissions-forced transient historical simulation (AD 850 to 2005). They comprise experiments 1 and 2 of Holden et al. (2013a). In addition to emissions forcing, these simulations were forced by non-CO2 trace gases, LUC, anthropogenic aerosols, volcanic aerosols, orbital change and solar variability, as described in Eby et al. (2013).
The 471 parameter sets are constrained to be plausible in the preindustrial state by design (Holden et al., 2013b). However, they are not constrained to be plausible in the present day as neither the anthropogenic carbon sinks nor the LUC emissions are calibrated. Additionally, these 471 parameter sets are known to contain members that display numerical instabilities (Holden et al., 2013a).
In order to identify useful parameter sets, we apply a filter to this transient historical ensemble. A parameter set is accepted as plausible if the difference between simulated and observed atmospheric CO2 concentration lies within an acceptable range at each of five time points (AD 1620, 1770, 1850, 1970 and 2005). Here CO2(t) and CO2*(t) are the simulated and observed atmospheric CO2 concentrations, evaluated at each time slice t, and the acceptable error is built from two terms, ε0 and εt, which relate to the preindustrial spin-up state and to the transient change, respectively. The time points span the preindustrial period and are not associated with volcanic eruptions as these can lead to an unrealistic carbon cycle response in GENIE due to the single-layer soil module (Holden et al., 2013a).
The ε0 term dominates the acceptable error during the preindustrial era and is designed to reject any simulations that exhibit numerical instability. It is set equal to 2 standard deviations (9 ppm) of the 471-member spin-up ensemble. The εt term is given by 0.22 × (CO2*(t) − 280) ppm. This term dominates the acceptable error in the post-industrial era and is designed to reject simulations that exhibit an unreasonable strength for the CO2 sink. It approximately limits the range of acceptable uncertainty to the inter-model variance of the multi-model C4MIP ensemble (Friedlingstein et al., 2006), assuming that the range of simulated CO2 change across the C4MIP ensemble scales linearly with simulated CO2 change relative to preindustrial (280 ppm). Eighty-six parameter sets satisfied this constraint at all five time points.

Figure 1. Construction of GENIEem. Flowchart boxes: accept a parameter set if the difference between simulated and observed atmospheric CO2 concentration lies within an acceptable range at 1620, 1770, 1850, 1970 and 2005; generate an ensemble of future simulations, forced with time-varying CO2 emissions and non-CO2 radiative forcing (each parameter set is reproduced three times and combined with different future emissions profiles); calibrate the CO2 fertilisation parameter k14 by randomly sampling its assumed prior distribution and replacing the k14 values in the 86 GENIE parameter sets; apply singular vector decomposition to the 100 × 257 matrix of simulated output and emulate the first four principal components (PCs); calibrated GENIEem climate/carbon cycle emulator (generates an 86-member ensemble using GENIE-1 parameter sets, for six given Chebyshev coefficients).
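The acceptance test described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes that the two error terms are combined in quadrature (the combination rule is an assumption here), and the observed concentrations in `OBSERVED_PPM` are rounded, illustrative values rather than the observational record used in the study. The function names are hypothetical.

```python
import math

EPS_0 = 9.0  # ppm, 2 SD of the spin-up ensemble (value quoted in the text)

# Illustrative, rounded observed CO2 (ppm) at the five filter years.
# These are placeholder values, NOT the dataset used in the study.
OBSERVED_PPM = {1620: 275.0, 1770: 279.0, 1850: 285.0, 1970: 325.0, 2005: 379.0}


def transient_tolerance(observed_ppm):
    """eps_t = 0.22 * (CO2*(t) - 280) ppm, as given in the text."""
    return 0.22 * (observed_ppm - 280.0)


def is_plausible(simulated_ppm, observed_ppm=OBSERVED_PPM):
    """Accept a parameter set only if it passes the test at every time point."""
    for year, obs in observed_ppm.items():
        eps_t = transient_tolerance(obs)
        # ASSUMPTION: the two error terms are combined in quadrature.
        tolerance = math.hypot(EPS_0, eps_t)
        if abs(simulated_ppm[year] - obs) >= tolerance:
            return False
    return True
```

With these placeholder values, the tolerance is close to 9 ppm in the preindustrial years and grows to roughly 24 ppm in 2005, which mirrors the statement that each term dominates in a different era.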

GENIE-1 ensemble design
These 86 parameter sets from the full GENIE-1 ESM were used to generate an ensemble of future simulations (2005 to 2105) forced with time-varying CO 2 emissions and non-CO 2 radiative forcing. Each simulation was continued from its respective transient historical simulation. Radiative forcing was applied as a globally uniform additional term in outgoing long-wave radiation to capture the combined effects of non-CO 2 trace gases, aerosols and LUC on global temperature. The LUC mask was fixed at the 2005 distribution, but effects of future land use changes are accounted for, albeit approximately, in the applied radiative forcing and emissions anomalies.
To capture the range of possible future forcing we followed the approach of Holden and Edwards (2010). The CO2 emissions profile is represented using modified Chebyshev polynomials $M_i$ ($i = 0, \ldots, 3$), arrived at by linear combination of Chebyshev polynomials $T_i$. For example, if the first few Chebyshev polynomials are $T_0(t) = 1$, $T_1(t) = t$, $T_2(t) = 2t^2 - 1$ and $T_3(t) = 4t^3 - 3t$, and we have $M_3(t) = 4t^3 - 4t$, then this can be expressed as a combination of $T_3$ and $T_1$:

$$M_3(t) = T_3(t) - T_1(t).$$

Following this approach, the CO2 emissions profile is represented as

$$E(t) = E_0 + \tfrac{1}{2}\left[E_1 (t + 1) + E_2 (2t^2 - 2) + E_3 (4t^3 - 4t)\right], \quad (2)$$

where $t$ is time, normalized onto the range −1 to 1 (2005 to 2105). The coefficient ranges were chosen to span emissions consistent with the RCP pathways (Moss et al., 2010): $E_1$ = −30 to 30 GtC yr⁻¹, $E_2$ = −15 to 15 GtC yr⁻¹, $E_3$ = −15 to 15 GtC yr⁻¹. The 2005 emissions are $E_0$ = 9.166 GtC yr⁻¹. Note that Eq. (2) is strictly a linear combination of Chebyshev polynomials such that the first two terms give the linear increase in emissions; we refer to the coefficients henceforth as "Chebyshev coefficients".
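As a rough illustration of how such a profile can be evaluated, the sketch below computes emissions for a given year from three coefficients. The specific modified polynomials (t + 1, 2t² − 2 and 4t³ − 4t) and the factor of one half are assumptions, chosen so that the profile equals E0 in 2005 and E0 + E1 at the end of the period, as the text describes; the function names are hypothetical.

```python
E0 = 9.166  # GtC/yr, 2005 emissions (from the text)


def normalized_time(year, start=2005, end=2105):
    """Map a calendar year onto t in [-1, 1]."""
    return 2.0 * (year - start) / (end - start) - 1.0


def emissions(year, e1, e2, e3, e0=E0):
    """Emissions (GtC/yr) from modified Chebyshev polynomials (assumed form)."""
    t = normalized_time(year)
    m1 = t + 1.0               # T1 + T0, zero at t = -1
    m2 = 2.0 * t**2 - 2.0      # T2 - T0, zero at t = -1 and t = +1
    m3 = 4.0 * t**3 - 4.0 * t  # T3 - T1, zero at t = -1 and t = +1
    return e0 + 0.5 * (e1 * m1 + e2 * m2 + e3 * m3)
```

Because the second and third polynomials vanish at both ends of the interval, E2 and E3 only shape the curvature of the pathway, while E1 alone sets the change between the first and last year.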
The non-CO2 radiative forcing profile is also represented by a linear combination of Chebyshev polynomials:

$$R(t) = R_0 + \tfrac{1}{2}\left[R_1 (t + 1) + R_2 (2t^2 - 2) + R_3 (4t^3 - 4t)\right]. \quad (3)$$

These Chebyshev coefficients are varied in the ranges [...]. The $E_1$ and $R_1$ coefficients define the 2100 CO2 emissions and non-CO2 radiative forcing, respectively. The remaining coefficients determine the curvature of the profile. The ranges for all six coefficients have been chosen to encompass (and exceed) the ranges of 21st century forcing; for emulator training we apply wider ranges than we expect to need in order to ensure the emulator is never used under extrapolation. Selecting a broad training range helps to ensure that the emulator will remain suitable for use in many different applications, and not only within the context of the scenarios studied in this work.
For example, the maximum E1 = 30 gives 2100 CO2 emissions of E0 + E1 = 39.166 GtC yr⁻¹, which compares to RCP8.5 emissions of 28.817 GtC yr⁻¹. Maximum radiative forcing of R0 + R1 = 10.619 W m⁻² was allowed to greatly exceed RCP estimates (maximum 1.796 W m⁻²) in order to allow the potential application of the emulator to extreme non-CO2 forcing scenarios, for instance to represent non-CO2 (e.g. methane) runaway feedbacks (Schmidt and Shindell, 2003) or geo-engineering in a high-CO2 future (Irvine et al., 2009).
The 86 parameter sets were replicated three times, and each of the three replicates was combined with different future emissions profiles to produce a 258-member ensemble. To achieve this, the six coefficients were varied over the above ranges to create a 258-member maximin Latin hypercube design, using the maximinLHS function of the lhs package in R (R Development Core Team, 2013). Of these, 257 simulations completed; in the remaining simulation, input parameters led to an unphysical state and, ultimately, numerical instability.
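The idea behind a Latin hypercube design can be sketched as follows. This is a plain random Latin hypercube written with the Python standard library, not the maximin variant provided by the R `lhs` package that the study used, and it covers only the three emissions coefficients whose ranges are stated in the text. The function name and the seed are illustrative choices.

```python
import random

# Training ranges for the emissions coefficients (GtC/yr), from the text
RANGES = {"E1": (-30.0, 30.0), "E2": (-15.0, 15.0), "E3": (-15.0, 15.0)}


def latin_hypercube(n_samples, ranges, rng):
    """Return n_samples points; each variable has one draw per equal-width stratum."""
    columns = {}
    for name, (low, high) in ranges.items():
        # One uniform draw inside each of the n strata of the unit interval
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired at random across variables
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


design = latin_hypercube(258, RANGES, random.Random(0))
```

A maximin design additionally tries to maximize the minimum distance between design points, which improves space filling; that refinement is omitted here for brevity.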

Construction of GENIEem
The emulation approach closely follows the dimension reduction methodology detailed in Holden et al. (2014). We have an ensemble of 257 transient simulations of the coupled climate-carbon system, incorporating both parametric uncertainty (28 parameters) and forcing uncertainty (six modified Chebyshev coefficients). For coupling applications we require an emulator that will generate the annually resolved evolution of CO2 concentration through time (2006 to 2105). The simulation outputs were combined into a (100 × 257) matrix $\mathbf{Y}$, and SVD was performed on the matrix:

$$\mathbf{Y} = \mathbf{L}\mathbf{D}\mathbf{R}^{T},$$

where $\mathbf{L}$ is the (100 × 257) matrix of left singular vectors ("components"), $\mathbf{D}$ is the 257 × 257 diagonal matrix of the square roots of the eigenvalues and $\mathbf{R}$ is the 257 × 257 matrix of right singular vectors ("component scores"). We retain the first four components, which together explain more than 99.9 % of the ensemble variance. Each individual simulated CO2 concentration time series can thus be well approximated as a linear combination of the first four components, scaled by their respective scores. Each set of scores consists of a vector of coefficients, representing the projection of each simulation onto the respective component. As each simulated time series is a function of the input parameters, so are the coefficients that comprise the scores. So each component score can be viewed, and hence emulated, as a scalar function of the input parameters to the simulator.
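In symbols, retaining the first four components corresponds to the rank-4 approximation sketched below, written in the notation of the paragraph above. The symbols $l_k(t)$, $d_k$ and $r_{nk}$ (the k-th component, its singular value, and the score of simulation n on that component) are introduced here for illustration only. The variance-fraction expression assumes that explained variance is measured by the squared singular values, which is an assumption about how the 99.9 % figure was obtained.

```latex
\mathbf{Y} = \mathbf{L}\mathbf{D}\mathbf{R}^{T},
\qquad
y_{n}(t) \approx \sum_{k=1}^{4} d_{k}\, r_{nk}\, l_{k}(t),
\qquad
\frac{\sum_{k=1}^{4} d_{k}^{2}}{\sum_{k} d_{k}^{2}} > 0.999
```

Under this reading, the emulator only has to predict four scalars $r_{n1}, \ldots, r_{n4}$ per input vector, from which the full 100-year time series is rebuilt.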
Emulators of the first four component scores were derived as functions of the 28 model parameters and the six concentration profile coefficients. These emulators were built in R (R Development Core Team, 2013), using the stepAIC function (Venables and Ripley, 2002). For each emulator, we first built a linear model from all 34 inputs allowing only terms that satisfy the Bayes Information Criterion (BIC). BIC-constrained stepwise addition of quadratic and cross-terms was then performed, allowing only inputs present in the linear model.
While the variance in emulator output is dominated by the Chebyshev forcing coefficients, uncertainty for a given forcing scenario is generated through emulator dependencies on GENIE-1 parameters. The most important of these is the CO 2 fertilization parameter, k 14 , describing the uncertain response of photosynthesis to changing CO 2 concentrations. To use the emulator, we constrain k 14 using the calibration of Holden et al. (2013a), to better quantify the uncertainty associated with the terrestrial sink. We evaluate the resulting emulated uncertainty through a comparison with C4MIP in Sect. 2.6.
We approximate the prior as a normal distribution with mean 500 ppm and standard deviation 150 ppm, following the base posterior of Holden et al. (2013a). We sampled values at random from this distribution and replaced the k 14 values in the 86-member training parameter set. Then, to generate a perturbed parameter ensemble of emulated futures, the emulation is performed for each of the resulting 86 parameter sets.
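The resampling step can be sketched as below. The parameter sets here are hypothetical stand-ins that carry only a `k14` entry, and the function name and seed are illustrative; the mean and standard deviation are the values quoted in the text. The sketch does not truncate the distribution, so whether extreme or non-physical draws were excluded in the study is not addressed here.

```python
import random

K14_MEAN_PPM = 500.0  # prior mean (from the text)
K14_SD_PPM = 150.0    # prior standard deviation (from the text)


def resample_k14(parameter_sets, rng):
    """Return copies of the parameter sets with k14 redrawn from the normal prior."""
    resampled = []
    for params in parameter_sets:
        updated = dict(params)  # leave the original parameter set untouched
        updated["k14"] = rng.gauss(K14_MEAN_PPM, K14_SD_PPM)
        resampled.append(updated)
    return resampled


# Hypothetical stand-ins for the 86 GENIE-1 parameter sets (only k14 shown)
training_sets = [{"k14": 0.0} for _ in range(86)]
ensemble_inputs = resample_k14(training_sets, random.Random(1))
```

Each of the 86 resulting input sets would then be passed to the emulator together with the six forcing coefficients to produce one member of the emulated ensemble.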

Validation of GENIEem
To validate the emulator, we apply leave-one-out cross-validation, which involves rebuilding the emulator 257 times, each time with a different simulation omitted, and comparing the omitted simulation with its emulation. The proportion of variance $V_T$ explained by the emulator under cross-validation is given by

$$V_T = 1 - \frac{\sum_{n=1}^{257}\sum_{t=1}^{100}\left(S_{n,t} - E_{n,t}\right)^2}{\sum_{n=1}^{257}\sum_{t=1}^{100}\left(S_{n,t} - \bar{S}_t\right)^2},$$

where $S_{n,t}$ is the simulated CO2 concentration at time t in left-out ensemble member n, $E_{n,t}$ the corresponding emulated output and $\bar{S}_t$ is the ensemble mean output at time t. $V_T$ measures the degree to which individual emulations can be regarded as accurate. The cross-validated root mean square error of the emulator is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{257 \times 100}\sum_{n=1}^{257}\sum_{t=1}^{100}\left(S_{n,t} - E_{n,t}\right)^2}.$$

The proportion of variance explained by the emulator under cross-validation is found to be 96.8 %, and the cross-validated root mean square error of the emulator is 34 ppm. The ensemble distribution of cross-validated emulator error does not exhibit any significant trends as a function of the forcing, being approximately distributed about zero, independently of the final CO2 concentration. This suggests that the emulator errors are likely dominated by describing parametric uncertainty with low-order polynomials, and so would be randomly distributed across a perturbed parameter emulated ensemble. To test this we performed a simulation ensemble forced by RCP8.5. The simulated ensemble mean of 2100 CO2 is 990 ± 92 ppm. This compares to the emulated ensemble mean of 975 ± 73 ppm with the same forcing. The R² value for emulated versus simulated output is 74.5 %; that is, the emulator explains about 74 % of the variance in 2100 CO2 across the RCP8.5 simulation ensemble, demonstrating that the parametric uncertainty is reasonably well approximated.
Given that the RCP estimate is 936 ppm, these data appear to show that the emulator and simulator overstate the RCP8.5 concentration in the median. However, the reason for this is that this validation did not use the CO 2 fertilization prior, which is applied to the emulator to constrain the predictions.
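The two cross-validation metrics discussed in this section can be computed as in the sketch below. It assumes the standard definitions (one minus the ratio of squared emulator error to squared deviation from the per-time ensemble mean, and the root of the mean squared error over all members and times); `simulated` and `emulated` are hypothetical lists of equal-length time series, one per left-out ensemble member.

```python
import math


def _squared_error(simulated, emulated):
    """Sum of squared differences over all members and time points."""
    return sum((s - e) ** 2
               for s_row, e_row in zip(simulated, emulated)
               for s, e in zip(s_row, e_row))


def variance_explained(simulated, emulated):
    """V_T = 1 - sum (S - E)^2 / sum (S - mean_t)^2 (assumed standard form)."""
    n_members = len(simulated)
    n_times = len(simulated[0])
    # Ensemble mean of the simulated output at each time point
    mean_t = [sum(row[t] for row in simulated) / n_members for t in range(n_times)]
    sq_dev = sum((row[t] - mean_t[t]) ** 2
                 for row in simulated for t in range(n_times))
    return 1.0 - _squared_error(simulated, emulated) / sq_dev


def rmse(simulated, emulated):
    """Root mean square error over all members and time points."""
    n_values = len(simulated) * len(simulated[0])
    return math.sqrt(_squared_error(simulated, emulated) / n_values)
```

In a leave-one-out setting, row n of `emulated` would come from an emulator that was rebuilt without simulation n.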

Evaluation of GENIEem using RCPs
To further evaluate the emulator's performance, we consider GENIEem's response to forcing by Representative Concentration Pathways (RCPs; Van Vuuren et al., 2011). For each RCP, CO2 emissions, non-CO2 radiative forcing and CO2 concentrations are provided by Meinshausen et al. (2011b). GENIEem is run using Chebyshev coefficients derived by fitting Eqs. (2) and (3) to RCP CO2 emissions and non-CO2 radiative forcing data. Emulated CO2 concentrations are compared to the CO2 concentrations corresponding to that RCP in the RCP Database. For RCP8.5, we also compare the emulator range with the CMIP5 ensemble range of CO2 concentrations for that RCP.
GENIEem median CO2 concentrations are generally well centred on the RCPs (Fig. 2). The RCP profiles were derived assuming carbon cycle rates that were calibrated to the median of the C4MIP models. This good agreement is therefore not imposed, but is desirable as it suggests that the ensemble of GENIE-1 parameter sets is not significantly biased with respect to C4MIP. The full range of 2105 emulated CO2 concentrations under RCP8.5 forcing is 806 to 1076 ppm. When forced with the same RCP, 11 CMIP5 models simulate a range of 795 to 1145 ppm by 2100 (Friedlingstein et al., 2014), demonstrating that the emulator can reproduce existing estimates of the carbon cycle uncertainty. In a related analysis, the ensemble mean and variance were shown to be easier to emulate than individual simulations. The emulator's capacity to capture the CMIP5 simulation ensemble suggests that this is also the case here.
For RCP2.6, the difference between the RCP value and the emulator median reaches about 15 ppm. One possible explanation for this is the formulation of land use change. When land use is changed in GENIE, soil carbon evolves dynamically to a new equilibrium. Therefore, although the LUC mask is held fixed after the transient AD 850-2005 spinup, there are ongoing land-atmosphere fluxes in the future (2005-2105) due to historical LUC. Since the RCP emissions data used to force GENIEem already include the contribution from soil carbon fluxes, the inconsistency of approaches is liable to lead to a net additional forcing while the historical contribution decays. These residual emissions would be most significant in RCP2.6 because other emissions are lowest in this scenario, potentially contributing to the excess concentrations in the emulation of RCP2.6. This difference could be reduced by using a more sophisticated treatment of the forcing inputs that separated fossil fuel and land use carbon emissions, with land use emissions calculated from spatially explicit scenarios based on above-ground carbon change, as in Houghton (2008).

Application of GPem in an IAM framework
To demonstrate the utility of emulation within an integrated assessment framework, we describe how GENIEem, along with PLASIM-ENTSem, has been used to explore the climate change implications of four policy scenarios for the electricity sector, as presented in Mercure et al. (2014). GPem is coupled to FTT:Power-E3MG, which combines a technology diffusion model with a non-equilibrium economic model. Mercure et al. (2014) emphasize the policy instruments that can be applied to decarbonization of the global energy sector, and their analysis of climate impacts is limited to mean surface temperature anomalies. Here, we extend that work to illustrate the regional patterns of climate variability associated with different policy scenarios, and discuss these results in the context of "dangerous climate change" (Jarvis et al., 2012).

The climate model emulator: PLASIM-ENTSem
PLASIM-ENTSem is an emulator of the GCM PLASIM-ENTS; both simulator and emulator are described by Holden et al. (2014). The GCM consists of a climate model, PLASIM (Fraedrich, 2012), coupled to a simple surface and vegetation model, ENTS (Williamson et al., 2006), which represents vegetation and soil carbon through a single plant functional type. PLASIM has a heat-flux-corrected slab ocean and a mixed-layer of a given depth, and a 3-D dynamic atmosphere, run at T21 (∼ 5°) resolution. It utilizes primitive equations for vorticity, divergence, temperature and the logarithm of surface pressure, solved via the spectral transform method, and contains parametrizations for long- and short-wave radiation, interactive clouds, moist and dry convection, large-scale precipitation, boundary layer fluxes of latent and sensible heat and vertical and horizontal diffusion. It accounts for water vapour, carbon dioxide and ozone.
As an emulator of PLASIM-ENTS, PLASIM-ENTSem emulates mean fields of change for surface air temperature and precipitation well, while emulations of precipitation underestimate simulated ensemble variability, explaining ∼ 60-80 % of the variance in precipitation (compared to ∼ 95 % for surface air temperature).
The response of PLASIM-ENTSem to RCP forcing was analysed in Holden et al. (2014, Fig. 6); in all four scenarios, the emulated ensemble distribution was found to compare favourably with the multi-model CMIP5 ensemble.

Policy scenarios and emissions profiles
FTT:Power is a simulation model of the global power sector (Mercure, 2012), which has been coupled to a dynamic simulation model of the global economy, E3MG (Mercure et al., 2014). Policies within the electricity sector drive the uptake or phasing out of types of generators, leading to different CO2 emission profiles (Fig. 3).
Here we consider four scenarios, a subset of the 10 scenarios explored in Mercure et al. (2014). Scenario i is the no-climate-policy baseline. The baseline scenario extends current policies in the energy sector to 2050. It assumes no additional technology subsidies worldwide, feed-in tariffs in some EU countries, and carbon pricing in the EU. Figure 3 illustrates that the emissions associated with this scenario are of a similar magnitude to the emissions associated with RCP8.5, but follow a more linear trajectory.
Scenario ii introduces carbon pricing, which rises to 200-400 2008 USD/tCO 2 . Scenario iii explores the use of carbon pricing, along with technology subsidies and feed-in tariffs in the developed world only. Finally, scenario iv uses carbon pricing, along with technology subsidies and feed-in tariffs to incentivize decarbonization, and also includes regulations to ban the construction of new coal power plants in China if not equipped with Carbon Capture and Storage; this policy set decarbonizes the global electricity sector by 90 % (relative to 1990 emissions) by 2050.

Coupling procedure
As FTT:Power-E3MG runs until 2050, emissions for 2050-2105 are estimated using a linear best-fit trend, except in the case of successful mitigation scenarios, where such an approach could lead to implausible emissions reductions by 2105. In these scenarios, the emissions in Pg C yr −1 reached in 2050 were assumed to remain constant beyond 2050 (i.e. in these scenarios, it is assumed that by 2050, the energy sector has decarbonized as much as can be incentivized under the specified policies).
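The two extension rules described above can be sketched as follows. The text does not state which years the linear trend is fitted to, so this sketch assumes an ordinary least-squares fit over the whole supplied series; the function names and the `hold_constant` switch are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extend_emissions(years, emissions, end_year=2105, hold_constant=False):
    """Extend an annual series to end_year by trend extrapolation or constant hold."""
    slope, intercept = linear_fit(years, emissions)
    extended = dict(zip(years, emissions))
    for year in range(years[-1] + 1, end_year + 1):
        if hold_constant:
            # Mitigation case: keep the final modelled value
            extended[year] = emissions[-1]
        else:
            # Baseline-like case: follow the fitted linear trend
            extended[year] = slope * year + intercept
    return extended
```

Holding emissions constant avoids extrapolating a steep downward trend into implausibly low (or negative) values, which is the concern the text raises for the successful mitigation scenarios.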
A. M. Foley et al.: Integrated assessment using model emulation

Chebyshev coefficients are calculated to provide least-squares fits to each emissions profile produced by FTT:Power-E3MG. If we conservatively assume that any error in emissions due to differences between the FTT:Power-E3MG emissions profile and the corresponding Chebyshev curve has an infinite lifetime in the atmosphere, the accumulated error does not exceed 4.5 ppm in any scenario over the period 2005-2105, well within the 5th-95th percentiles of GENIEem.
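The conservative error bound described above can be illustrated with a short calculation. The sketch accumulates the year-by-year difference between a fitted and a modelled emissions series and converts it to a concentration, assuming nothing is removed from the atmosphere. The conversion factor of about 2.12 GtC per ppm is a commonly used approximation and is an assumption here, not a value taken from the text; the function name is hypothetical.

```python
GTC_PER_PPM = 2.12  # ASSUMED conversion: ~2.12 GtC raises atmospheric CO2 by 1 ppm


def max_accumulated_error_ppm(model_emissions, fitted_emissions):
    """Largest absolute cumulative emissions difference over the series, in ppm."""
    cumulative = 0.0
    worst = 0.0
    for model, fitted in zip(model_emissions, fitted_emissions):
        # Annual series in GtC/yr, so each term is a GtC increment
        cumulative += fitted - model
        worst = max(worst, abs(cumulative))
    return worst / GTC_PER_PPM
```

Because positive and negative fitting errors partly cancel in the running sum, the bound tracks the largest excursion of the cumulative difference rather than the sum of absolute errors.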
As FTT:Power-E3MG does not simulate non-CO 2 radiative forcing, we select the RCP that best matches the CO 2 concentrations associated with the baseline scenario (RCP8.5) and force GENIEem with the non-CO 2 radiative forcing associated with that RCP. The RCP8.5 non-CO 2 radiative forcing was applied to all scenarios as the RCPs lack a suitable analog to the CO 2 concentrations associated with the power sector mitigation scenarios examined in this work. Values for Chebyshev coefficients are calculated and these three coefficients, together with the three CO 2 emissions coefficients, are the inputs to GENIEem.
This approach maintains comparability across the different scenarios, although we expect some small reductions in CH4 and N2O in the mitigation scenarios, due to a reduction in leaks of these GHGs from drilling. Representations of these GHGs in E3MG-FTT are not sufficiently detailed to provide forcing data for GPem, but reductions in fuel-use-related CH4 and N2O emissions of around 10-15 % by 2050 in the mitigation scenarios can be inferred. After 2050, we expect a stabilization at this new level, as the sectors involved have decarbonized by 90 %, producing a reduction in forcing of roughly 0.1 W m⁻² (relative to total forcing of 7.3 to 8.3 W m⁻² in the baseline and 5.3 to 6.2 W m⁻² in the mitigation scenario, accounting for carbon cycle uncertainty). This small reduction in forcing is well within the uncertainty bounds of GENIEem.
Climate-carbon feedbacks are emulated entirely within GENIEem. No climate information is passed from PLASIM-ENTSem to GENIEem. PLASIM-ENTSem takes inputs of both actual CO2 (for CO2 fertilization) and equivalent CO2 (for radiative forcing). Chebyshev coefficients are calculated to provide least-squares fits to the median and 5th-95th percentiles of the GENIEem ensemble CO2 concentrations; these coefficients, therefore, correspond to actual CO2 concentrations. Chebyshev coefficients for equivalent CO2 are also calculated, corresponding to combined CO2 and non-CO2 forcings. To determine these coefficients for equivalent CO2, the median and 5th-95th percentiles of the GENIEem ensemble CO2 concentrations are converted to radiative forcing following F = 5.35 ln(CO2/280) W m−2, where CO2 is the concentration in ppm.
RCP8.5 non-CO2 forcing is added to this time series to give total radiative forcing, which is converted to equivalent CO2 using the previous relationship. Chebyshev coefficients for equivalent CO2 are fitted to the resulting time series.
Thus, PLASIM-ENTSem is forced with three sets of six coefficients (three actual CO2 and three equivalent CO2 each for the median and 5th-95th percentiles of the GENIEem ensemble).
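The conversion from actual to equivalent CO2 can be illustrated with a short sketch. It applies the logarithmic forcing relationship given above in both directions; the function names are hypothetical, and the non-CO2 forcing passed in is assumed to be in W m−2 on the same time axis as the concentrations.

```python
import numpy as np

ALPHA = 5.35   # W m-2, coefficient of the logarithmic forcing relationship
C0 = 280.0     # ppm, reference CO2 concentration

def co2_to_forcing(co2_ppm):
    """Radiative forcing (W m-2) from a CO2 concentration (ppm)."""
    return ALPHA * np.log(np.asarray(co2_ppm, dtype=float) / C0)

def forcing_to_co2e(forcing):
    """Invert the same relationship: forcing (W m-2) to equivalent CO2 (ppm)."""
    return C0 * np.exp(np.asarray(forcing, dtype=float) / ALPHA)

def equivalent_co2(co2_ppm, non_co2_forcing):
    """Equivalent CO2 for combined CO2 and non-CO2 forcing."""
    total = co2_to_forcing(co2_ppm) + np.asarray(non_co2_forcing, dtype=float)
    return forcing_to_co2e(total)
```

With zero non-CO2 forcing, `equivalent_co2` returns the actual concentration unchanged, which provides a simple consistency check.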
We calculate the median warming of the PLASIM-ENTSem ensemble based on the 5th and 95th percentiles of the GENIEem ensemble. These bounds, therefore, illustrate parametric uncertainty of the carbon cycle model alone.
We also calculate the median and 5th-95th percentiles of warming of the PLASIM-ENTSem ensemble from the median GENIEem ensemble output. These bounds reflect parametric uncertainty in the climate model alone.
Finally, we calculate the 5th percentile of warming from the PLASIM-ENTSem ensemble based on the 5th percentile of CO2 concentration from the GENIEem ensemble, and the 95th percentile of warming from the PLASIM-ENTSem ensemble based on the 95th percentile of CO2 concentration from the GENIEem ensemble. This third set of bounds reflects warming uncertainty due to parametric uncertainty in the climate model and the carbon cycle model, computed under the assumption that GENIEem and PLASIM-ENTSem projections are perfectly correlated, i.e. that states exhibiting the greatest CO2 concentration in GENIEem correspond to states exhibiting the greatest warming in PLASIM-ENTSem. Many carbon cycle processes are affected directly by changes in temperature, or by variables which covary with temperature (Willeit et al., 2014), so while such a correlation is not absolute, there is a motivation for this approach.
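The three sets of bounds described above can be summarized in a brief sketch. It assumes that the PLASIM-ENTSem ensemble warming is available as three arrays of shape (members, years), one for each GENIEem forcing case (5th percentile, median and 95th percentile CO2); the function and key names are hypothetical.

```python
import numpy as np

def warming_bounds(warm_low, warm_med, warm_high):
    """Uncertainty bounds on warming.

    warm_low, warm_med, warm_high: ensemble warming (members x years) when
    forced with the GENIEem 5th percentile, median and 95th percentile CO2.
    """
    def pct(ensemble, q):
        return np.percentile(np.asarray(ensemble, dtype=float), q, axis=0)

    return {
        "median": pct(warm_med, 50),
        # carbon cycle uncertainty: ensemble median under low/high CO2
        "carbon_cycle": (pct(warm_low, 50), pct(warm_high, 50)),
        # climate uncertainty: ensemble spread under median CO2
        "climate": (pct(warm_med, 5), pct(warm_med, 95)),
        # combined: assumes perfect correlation between the two emulators
        "combined": (pct(warm_low, 5), pct(warm_high, 95)),
    }
```

By construction the combined bounds are the widest of the three whenever warming increases with CO2 concentration, reflecting the perfect-correlation assumption.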

GPem mean warming under policy scenarios
We applied GPem to determine the atmospheric CO2 concentrations and mean global temperature anomalies associated with different mitigation policies applicable to the energy sector. Although the mitigation policies explored reduce CO2 emissions from the energy sector, CO2 concentrations continue to increase in the mitigation scenarios (Fig. 4), owing to the remaining CO2 emissions combined with the effect of non-CO2 radiative forcing on climate. Figure 4 also illustrates the temperature anomalies associated with each of the scenarios. Modelled anomalies are relative to the model baseline, 1995-2005. Therefore historical warming, estimated at ≈ 0.6 °C in 2000 (IPCC, 2013), is added to give anomalies relative to the preindustrial period. There is no scenario in which temperature stabilizes by 2100: in scenario iv, the rate of warming remains roughly constant, while in scenario i, the rate of warming appears to increase towards the latter half of this century. The effect of cascading uncertainty is apparent (Jones, 2000; Foley, 2010), leading to large uncertainty bounds for temperature projections.
Figure 4. Top panels: median CO2 concentrations for scenarios i (baseline), ii, iii and iv, simulated by GENIEem, with uncertainty bounds (GENIEem 5th/95th percentile). Bottom panels: median temperature anomalies relative to preindustrial conditions for scenarios a (baseline), d, i and j, simulated by PLASIM-ENTSem using median GENIEem CO2 concentrations. Uncertainty bounds are based on carbon cycle uncertainty (PLASIMem median with GENIEem 5th/95th percentile), climate uncertainty (PLASIMem 5th/95th percentile with GENIEem median), and combined uncertainty (PLASIMem 5th/95th percentile with GENIEem 5th/95th percentile). The 2 °C target, described as "the maximum allowable warming to avoid dangerous anthropogenic interference in the climate" (e.g. Randalls, 2010), is also illustrated by the grey dashed line.

GPem regional climate under policy scenarios
Figure 5 illustrates the 2095-2105 December-February and June-August warming anomalies associated with scenarios i and iv, presenting the median and 5th/95th percentiles of the PLASIM-ENTSem ensemble outputs calculated independently at each grid point. These emulated ensembles are forced with GENIEem median CO2 concentrations for the respective scenario, giving an indication of the range of PLASIM-ENTSem parametric uncertainty associated with the projection. It is evident that the warming associated with the baseline scenario would be partially offset under the mitigation scenario. However, certain hotspots of warming are apparent even under the 5th percentile projection. In both scenarios, there is cooling in Southeast Asia in summer, which likely arises due to a strengthening of the monsoon in PLASIM-ENTSem. However, Holden et al. (2014) note that this signal may not be robust as the model lacks aerosol forcing.
Figure 6 illustrates the mean 2095-2105 December-February and June-August precipitation patterns associated with scenarios i and iv, along with the proportion of the 86 ensemble members simulating increased precipitation in each case. Generally, areas that experience a significant increase/decrease in precipitation under scenario iv (i.e. larger than ±1 mm day−1) experience even greater extremes under scenario i, which can be attributed to differences in water vapour amount in the atmosphere due to warming (Held and Soden, 2006); precipitation fields are amplified as more water is available in the convergence zones to condense. Plotting the proportion of ensemble members that project increasing precipitation shows that in most regions of the world, there is high agreement between ensemble members on the direction of change for precipitation.
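The grid-point statistics described above (percentiles calculated independently at each grid point, and the proportion of ensemble members projecting an increase) can be sketched as follows. The array layout (members, latitude, longitude) and the function name are assumptions for illustration.

```python
import numpy as np

def gridpoint_summary(ensemble):
    """Per-grid-point ensemble statistics.

    ensemble: anomalies with shape (members, lat, lon).
    Returns the 5th, 50th and 95th percentiles across members, and the
    fraction of members with a positive anomaly, at each grid point.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    p5, median, p95 = np.percentile(ensemble, [5, 50, 95], axis=0)
    frac_increase = np.mean(ensemble > 0, axis=0)
    return {"p5": p5, "median": median, "p95": p95,
            "frac_increase": frac_increase}
```

Because the percentiles are taken independently at each grid point, a percentile map in this sketch does not correspond to any single ensemble member.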
Precipitation patterns are similar for the two scenarios presented (r = 0.99), suggesting that a simple pattern scaling approach would have sufficed in the particular example considered here, at least for estimation of the ensemble mean field. However, Tebaldi and Arblaster (2014) considered correlations between the averaged precipitation anomaly fields of the CMIP5 multi-model ensemble when forced with different RCPs; the lowest correlation (0.85) was between ensembles forced with RCP2.6 and RCP8.5, while a correlation of 0.97 was found between RCP4.5 and RCP8.5. Applying our emulation framework yielded correlations of 0.89-0.93 (RCP2.6, RCP8.5) and 0.97-0.98 (RCP4.5, RCP8.5), depending on season. This comparison suggests that the emulation framework captures non-linear feedback strengths that are comparable to those found in a high-complexity high-resolution multi-model ensemble and, furthermore, that the assumptions of pattern scaling may not be optimal when applied to strong mitigation scenarios.
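A pattern correlation of the kind quoted above can be computed as a Pearson correlation between two anomaly fields. The sketch below is one possible formulation, not necessarily the one used here or by Tebaldi and Arblaster (2014); in particular, the optional area weighting (for example by the cosine of latitude) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def pattern_correlation(field_a, field_b, weights=None):
    """Weighted Pearson correlation between two fields of equal shape.

    weights: optional array broadcastable to the field shape, e.g.
    cos(latitude)[:, None] for a (lat, lon) grid. Unweighted if None.
    """
    a = np.asarray(field_a, dtype=float)
    b = np.asarray(field_b, dtype=float)
    if weights is None:
        w = np.ones_like(a)
    else:
        w = np.broadcast_to(np.asarray(weights, dtype=float), a.shape)
    w = w / w.sum()
    a_anom = a - np.sum(w * a)   # remove weighted spatial mean
    b_anom = b - np.sum(w * b)
    cov = np.sum(w * a_anom * b_anom)
    return cov / np.sqrt(np.sum(w * a_anom**2) * np.sum(w * b_anom**2))
```

A correlation close to 1 indicates that one field is well approximated by a linear rescaling of the other, which is the underlying assumption of pattern scaling.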

Conclusions
We have described and validated a new carbon cycle model emulator, GENIEem, and applied it along with PLASIM-ENTSem to demonstrate the utility of statistical model emulation in an IAM setting. The climate-carbon-cycle emulator GPem was used to examine atmospheric CO2 concentration, mean global temperature anomalies, and spatial temperature and precipitation response patterns resulting from CO2 emission scenarios associated with various mitigation scenarios for the electricity sector. Even the most successful mitigation strategy considered here results in warming of above 3.5 °C by 2100, a level of warming which Parry et al. (2009) note could result in substantial harmful impacts, including risks of water shortage and coastal flooding. As such, in a context where the global electricity sector is decarbonized by 90 %, further emissions reductions must be achieved in other sectors (e.g. transport and industry) to enable CO2 concentrations to remain below 450 ppm, and correspondingly, global warming below 2 °C (Meinshausen et al., 2009). The IPCC AR5 notes that in 2010, the energy supply sector accounted for 35 % of total GHG emissions (IPCC, 2014); there is therefore scope for reductions to be achieved in other sectors. For instance, policy options explored by Luderer et al. (2012) which keep CO2 concentrations below 450 ppm, using the IMACLIM-R and ReMIND-R models, include mitigation in the transportation sector to reduce energy demand. However, the IPCC AR5 notes that based on scenario analysis, sectors currently using liquid fuel may be more costly, and therefore slower, to decarbonize than electricity.
Additionally, it is worth noting that the most successful mitigation scenarios explored in the IPCC AR5, which lead to CO2 equivalent concentrations in the range of 430-480 ppm by 2100 (approximately equivalent to RCP2.6), feature large-scale, long-term application of carbon dioxide removal (CDR) technologies, in addition to large emissions reductions (IPCC, 2014). This analysis, focusing on the effectiveness of mitigation policies in the electricity sector, therefore highlights the danger of focusing mitigation efforts on this single sector, where the cost of decarbonization is lower; not only are such efforts insufficient to maintain global warming below 2 °C, but additionally, the heterogeneous distribution of climate impacts globally will need to be addressed.
Furthermore, mitigation in the electricity sector alone proves inadequate to solve the emissions problem even though the inclusion of non-linear feedbacks on technology uptake is expected to promote decarbonization in our model, compared to the equilibrium models in the IPCC AR5 database, which may not capture the complexities of real-world human behaviour in mitigation decision making (Mercure et al., 2015).
The 2 °C warming threshold is often a focal point of climate mitigation policy and scholarship, and is indeed useful as a guiding principle (e.g. Den Elzen and Meinshausen, 2006; Oberthür and Roche Kelly, 2008; Shindell et al., 2012). However, it is also vital to consider the complex temperature and precipitation patterns that could occur, lest a focus on the global mean temperature results in regional climate impacts being overlooked. Furthermore, consideration must be given to how to adapt to diverse regional climate change, should this target not be met (Parry et al., 2009). Applying the GPem framework yields a more systematic representation of uncertainty in future regional climate states, when compared with the pattern-scaling approaches that are based on "ensembles of opportunity" (Stone et al., 2007). While uncertainties associated with carbon cycle and climate modelling in this framework are accounted for through the use of ensembles, it is still possible that the actual future climate state may fall outside the simulated range. Uncertainties associated with emissions profiles are more difficult to quantify as these depend, ultimately, on human decision making. Therefore many policy contexts should be modelled in order to find out which ones effectively lead to desired outcomes.
The Supplement related to this article is available online at doi:10.5194/esd-7-119-2016-supplement.