Current challenges of implementing anthropogenic land-use and land-cover change in models contributing to climate change assessments

Land-use and land-cover change (LULCC) represents one of the key drivers of global environmental change. However, the processes and drivers of anthropogenic land-use activity are still implemented in an overly simplistic way in terrestrial biosphere models (TBMs). The published results of these models are used in major assessments of processes and impacts of global environmental change, such as the reports of the Intergovernmental Panel on Climate Change (IPCC). Fully coupled models of climate, land use and biogeochemical cycles to explore land use–climate interactions across spatial scales are currently not available. Instead, information on land use is provided as exogenous data from the land-use change modules of integrated assessment models (IAMs) to TBMs. In this article, we discuss, based on literature review and illustrative analysis of empirical and modeled LULCC data, three major challenges of this current LULCC representation and their implications for land use–climate interaction studies: (I) provision of consistent, harmonized, land-use time series spanning from historical reconstructions to future projections while accounting for uncertainties associated with different land-use modeling approaches, (II) accounting for sub-grid processes and bidirectional changes (gross changes) across spatial scales, and (III) the allocation strategy of independent land-use data at the grid cell level in TBMs. We discuss the factors that hamper the development of improved land-use representation, which sufficiently accounts for uncertainties in the land-use modeling process. We propose that LULCC data-provider and user communities should engage in the joint development and evaluation of enhanced LULCC time series, which account for the diversity of LULCC modeling and increasingly include empirically based information about sub-grid processes and land-use transition trajectories, to improve the representation of land use in TBMs.
Moreover, we suggest concentrating on the development of integrated modeling frameworks that may provide further understanding of possible land–climate–society feedbacks. Published by Copernicus Publications on behalf of the European Geosciences Union.


1 Introduction
Anthropogenic land-use and land-cover change (LULCC; for a list of abbreviations used in the paper see Supplement Sect. S0) is a key cause of alterations in the land surface (Ellis, 2011; Ellis et al., 2013; Turner et al., 2007), with manifold impacts on biogeochemical and biophysical processes that influence climate (Arneth et al., 2010; Brovkin et al., 2004; Mahmood et al., 2014; McGuire et al., 2001; Sitch et al., 2005) and affect food security (Hanjra and Qureshi, 2010; Verburg et al., 2013), freshwater availability and quality (Scanlon et al., 2007), and biodiversity (Newbold et al., 2015). Hence, LULCC is now being increasingly included in terrestrial biosphere models (TBMs), including dynamic global vegetation models (DGVMs) and land surface models (LSMs) (Fisher et al., 2014), to quantify historical and future climate impacts both in terms of biophysical (surface energy and water balance) and biogeochemical variables (carbon and nutrient cycles) (Le Quéré et al., 2015; Luyssaert et al., 2014; Mahmood et al., 2014). For example, LULCC has been estimated to act as a strong carbon source since preindustrial times (Houghton et al., 2012; Le Quéré et al., 2015; McGuire et al., 2001). Livestock husbandry, rice cultivation, and the large-scale application of agricultural fertilizers further contributed to the increase in atmospheric CH4 and N2O concentration (Davidson, 2009; Zaehle et al., 2011), turning the land into a potential net source of greenhouse gases to the atmosphere (Tian et al., 2016). Local and regional observational studies suggest impacts of LULCC on biophysical surface properties, e.g., surface albedo and water exchange, eventually affecting temperature and precipitation patterns (Alkama and Cescatti, 2016; Pielke et al., 2011).
TBMs were originally designed to study the interactions between natural ecosystems, biogeochemical cycles, and the atmosphere. The short history of implementing land-use change in TBMs (∼ 10 years; Canadell et al., 2007), along with the need to include external data (e.g., maps of global cropland or pasture distribution) to represent land-use change, has led to several issues that complicate the quantification of land-use change impacts on climate and biogeochemical cycles using TBMs. For example, carbon fluxes related to land-use change that increase the atmospheric concentration of greenhouse gases are the largest source of uncertainty in the global carbon budget (Ballantyne et al., 2015; Le Quéré et al., 2015). Similarly, biophysical impacts of land-use change on climate are not yet sufficiently understood and quantified (Pielke et al., 2011). The lack of process understanding and reliable quantification of impacts can be attributed to a separate history of land-use research and land-cover research and the current offline coupling of different models, where external land-use information from integrated assessment models (IAMs) or dedicated land-use change models (LUCMs) is imposed on the natural vegetation scheme of TBMs. This current land-use representation is sensitive to, in addition to other factors, the definition of individual land-use categories (e.g., what exactly defines a pasture), inconsistencies in the definition of the land-use carbon flux (Stocker and Joos, 2015), the implementation and parameterization of land use in TBMs (de Noblet-Ducoudré et al., 2012; Di Vittorio et al., 2014; Hibbard et al., 2010; Jones et al., 2013; Pitman et al., 2009; Pugh et al., 2015), the structural differences across IAMs and LUCMs (Alexander et al., 2017; Prestele et al., 2016; Schmitz et al., 2014), and the uncertainty about land-use history (Ellis et al., 2013; Klein Goldewijk and Verburg, 2013; Meiyappan and Jain, 2012).
Currently reported uncertainties of the outputs of land use-climate interaction studies may be underestimated by insufficiently accounting for the aforementioned sources of uncertainty. The current land-use representation therefore requires improvement to narrow down the uncertainty range in reported results of land use-climate studies and eventually increase the confidence level of climate change assessments. Assessments of the global water cycle, freshwater quality, biodiversity, and non-CO2 greenhouse gases would also benefit from an improved land-use representation.
The overall objective of this article is to review three important challenges faced in connecting models to assess land use-climate interactions and feedbacks, discuss the underlying mechanisms and constraints that have hampered improved representations until now, and propose pathways to improve the land-use representation. We review recent literature from the land use, land cover, carbon cycle, and climate modeling communities and support our arguments using illustrative analysis of satellite land-cover products and outputs of the land-use change model CLUMondo (Van Asselen and Verburg, 2013). Each of the following sections presents one of the three challenges we identify to be crucial in future land use-climate interaction studies and reviews the issue and its implications for the results of modeling studies, based on previously published literature and in the context of the widely applied Land-Use Harmonization (LUH) dataset published by Hurtt et al. (2011). In Sect. 5 we propose pathways to improve the current LULCC representation for each of the challenges and conclude with an outlook on future research priorities.
2 Challenge I: spatially explicit, continuous, and consistent time series of land-use change

2.1 Background and emergence
Current TBMs require consistent, continuous, and spatially explicit time series of land-use change, covering at least the period since the industrial revolution (∼ 1750) to disentangle the contributions of land use and fossil fuel combustion to carbon cycling and radiative forcing (Le Quéré et al., 2015; Shevliakova et al., 2009). Without time series of at least this length, important legacy fluxes will be missed in the calculations. The application of discontinuous land-use change time series in TBMs to quantify the interactions and feedbacks between land use and climate would lead to large artificially induced changes ("jumps") in land use. Corresponding jumps in carbon and nutrient pools in the transition period would distort legacy fluxes working on decadal to centennial timescales, rendering the simulations useless for the quantification of climate impacts. However, observational data on LULCC are not available on the global scale with the required temporal and spatial resolution, consistency, and historical coverage. Instead, models are utilized to represent global land use and produce the required land-use change time series. Land-use modeling is typically split up into historical backcasting approaches and future scenario modeling. Both forward- and backward-looking models apply a range of different modeling approaches as well as different assumptions about drivers and the spatial allocation of land-use changes (National Research Council, 2014; Yang et al., 2014), and they are often initialized with different representations of present-day land use (Prestele et al., 2016). Thus, even the models within one community (future or historical) do not provide consistent information on land use and land-use change over time, and a variety of independent datasets on a spatially explicit or world regional level are provided to the user community (e.g., climate modeling) (see Supplement Sect. S1 and Table S1 for examples of the historical data).
These historical and future datasets are not connected and consistent in the transition period and entail a variety of uncertainties (Klein Goldewijk and Verburg, 2013) (Fig. 1). In consequence, these datasets disagree about the amount and the spatial pattern of land affected by human activity. Moreover, varying detail in classification systems, inconsistent definition of individual categories (e.g., forest or pasture), and individual model aggregation techniques amplify the discrepancies among models (Alexander et al., 2017; Prestele et al., 2016). [...] lack of comprehensive documentation of the updated version at the time this paper was written and as, to our best knowledge, the points we demonstrate using LUH will still be valid with the new product, we primarily refer to the CMIP5 version in the remainder of this paper. Hurtt et al. (2011) extended their Global Land-use Model (GLM; Hurtt et al., 2006) to produce a consistent time series of land-use states (fraction of each land-use category in a grid cell) and transitions (changes between land-use categories in a grid cell) for the time period 1500-2100. The cropland, pasture, and wood harvest projections of four IAMs were smoothly connected to the History Database of the Global Environment (HYDE) historical reconstruction of agricultural land use (Klein Goldewijk et al., 2011) and historical wood harvest estimates by applying the decadal spatial patterns from the projections onto the HYDE map of 2005 (Fig. 1). This harmonization process tries to conserve the original patterns, rate, and location of change as much as possible and to reduce the differences between the models due to definition of cropland, pasture, and wood harvest.
To achieve the final harmonized time series and explicit transitions, the preprocessed land-use time series are used as input into the GLM model and constrained by further data and assumptions about the occurrence of shifting cultivation, the spatial pattern of wood harvest, priority of the source of agricultural land, and biomass density. The harmonization ensured, for the first time, consistent land-use input for climate model intercomparisons and thus facilitated the implementation of anthropogenic impact on the land in climate models. Beyond this inarguable success, several uncertainties are to date not, or only partially, addressed in the LUH data. In the following section we discuss the main uncertainties and how they may propagate into TBMs, impacting the amplitude and possibly even the sign of land-use interactions and feedbacks.
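As a rough illustration of what connecting a projection to a historical end map can mean, the sketch below shifts each projected grid-cell fraction by the offset between the base map and the projection in the base year, and clips the result to the valid range. The offset rule, the function name `harmonize`, and all numbers are assumptions made for illustration only; the actual LUH/GLM procedure uses further data and constraints and is not reproduced here.

```python
def harmonize(base_map, projection_by_year, base_year):
    """Connect one projection to a base map with a per-cell offset.

    base_map: list of land-use fractions per grid cell (historical end map).
    projection_by_year: dict mapping year -> list of projected fractions.
    base_year: year in which the projection is anchored to the base map.
    Assumption: change relative to the base year is preserved, levels are not.
    """
    reference = projection_by_year[base_year]
    harmonized = {}
    for year, projected in sorted(projection_by_year.items()):
        harmonized[year] = [
            # Keep the projected change, start from the base map, stay in [0, 1].
            min(1.0, max(0.0, b + (p - r)))
            for b, p, r in zip(base_map, projected, reference)
        ]
    return harmonized


# Hypothetical two-cell example: the projection disagrees with the base map in 2005.
hyde_2005 = [0.30, 0.90]
iam_projection = {2005: [0.40, 0.80], 2015: [0.50, 1.00]}
print(harmonize(hyde_2005, iam_projection, 2005))
```

In this toy case the harmonized series starts exactly at the base map in 2005, so no artificial jump occurs at the transition, while the second cell shows how clipping can alter the projected amount of change.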

2.2 Open issues in the LUH data and their implications for climate change assessments
The first major uncertainty of the LUH data arises from the exclusive consideration of the HYDE baseline dataset for the historical period. The HYDE reconstruction is erroneously regarded as observational data rather than as model output accompanied by various sources of uncertainty (Klein Goldewijk and [...]). Importantly, the LUH2 data will additionally include the HYDE low and high estimates of land use for the historical period (Lawrence et al., 2016). However, alternative spatially explicit reconstructions have been proposed (Pongratz et al., 2008; Ramankutty and Foley, 1999) (see Supplement Sect. S1 and Table S1 for additional information on these reconstructions), and have been shown to differ substantially in terms of both the total cultivated area and spatial pattern over time (Meiyappan and Jain, 2012). These differences originate in the scarcity of historical input data (i.e., mainly population estimates) for historical times, the assumption about the functional relationship between population density and land use (e.g., linear or nonlinear), and the allocation scheme used to distribute regional or national estimates of agricultural land to specific grid cell locations (Klein Goldewijk and Verburg, 2013). The uncertainty about land-use history has several implications for land use-climate interactions (Brovkin et al., 2004). For instance, Meiyappan et al. (2015) found the difference in cumulative land-use emissions among three historical reconstructions for the 21st century modeled by one TBM to be about 18 PgC or ∼ 11 % of the mean land-use emission. Another study, using three commonly used net land-use datasets in one TBM, revealed differences of about 20 PgC or ∼ 9 % of the mean land-use emission since 1750. Jain et al. (2013) further found contrasting trends in land-use emissions on a regional scale during the past 3 decades, which originate in different amounts and rates of land-use change in different realizations of historical land use. Furthermore, as biophysical climate impacts of land use are known to be substantial, especially on a regional scale (Alkama and Cescatti, 2016; Pielke et al., 2011; Pitman et al., 2009), an inappropriate representation of the uncertainty about land-use history is likely to affect model outcomes regarding changes in local to regional climate. Using the HYDE reconstruction exclusively implies high confidence about land-use history in many large-scale assessments and comparison studies (Kumar et al., 2013; Le Quéré et al., 2015; Pitman et al., 2009); this confidence is in fact lacking. As a result, important uncertainties are excluded from climate change mitigation and adaptation policies developed based on these studies (Mahmood et al., 2016).
Second, large inconsistencies exist between estimates of present-day land use. The LUH approach does not consider the differences between different data regarding the current state of land use as it connects the future projections exclusively to the HYDE end map (Fig. 1). The present-day starting maps of historical reconstructions and future projections are based on maps derived from the integration of remotely sensed land-cover maps and (sub-)national statistics of land use (e.g., Erb et al., 2007; Fritz et al., 2015; Klein Goldewijk et al., 2011; Ramankutty et al., 2008). The land-cover maps in turn disagree about extent and spatial pattern of agricultural land (Congalton et al., 2014; Fritz et al., 2011) due to both inconsistent definitions of individual land-use and land-cover categories (e.g., Sexton et al., 2015) and difficulties in identifying them from the spectral response (Friedl et al., 2010). These differences propagate into the starting maps of the various land-use change models, including the IAMs providing data for the LUH (Prestele et al., 2016). Removing these differences can result in substantial deviations of the seasonal and spatial pattern of surface albedo, net radiation, and partitioning of latent and sensible heat flux (Feddema et al., 2005) and can affect carbon flux estimates proposed by TBMs across spatial scales (Quaife et al., 2008).
Finally, the future projections used in the LUH are provided by different IAMs, whereby each of them represents an individual scenario of the four representative concentration pathways (RCPs) in CMIP5 or the five shared socioeconomic pathways (SSPs) in CMIP6. These are referred to as "marker scenarios" in the case of the SSPs. A marker scenario entails the implementation of an SSP by one IAM that was elected to represent the characteristics of the qualitative SSP storyline best, while additional implementations of the same SSP in other IAMs are "non-marker scenarios" (Riahi et al., 2017). Alternative RCP or SSP implementations were not considered in LUH. Land-use change model intercomparisons and sensitivity studies, however, indicate that the uncertainty range emerging from different assumptions in the models, input data, and spatial configuration substantially impacts the model results (Alexander et al., 2017; Di Vittorio et al., 2016; Schmitz et al., 2014). Due to the large range across model outcomes per scenario, the problems of using marker scenarios from different models are evident. However, no better alternative to this approach seems to be currently available, and representing uncertainty across models is valuable. Model comparisons further revealed that while land-use change models represent the future development of cropland area more consistently, the representation of pastures and forests (if modeled) is poor. For example, the projections of 11 IAMs and LUCMs show large variations in pasture areas in 2030 for many world regions (Fig. 2, background map; Supplement Sect. S2.1). These projections were based on a wide range of scenarios, and thus variation in outcomes was to be expected (Prestele et al., 2016). The variation attributed to the difference in model structure exceeds the variation due to different scenarios in most regions (Fig. 2, bar plots), while the main part of the variation relates to the different starting points of the models, i.e., deviation from FAO pasture areas in the year 2010. This implies that in many cases the different land-use projections actually do not represent different outcomes resulting from different scenario assumptions, but rather differences between land-use data input used to calibrate the models and the implementation of drivers and processes in the models. Consequently, differences in future climate impacts of land use are likely also affected by the structural differences across land-use change models.
Figure 2. [...] (2015) for the year 2010), model-related variation (model type and spatial configuration), and scenario-related variation to the total variation in a region. The right bar plots show the relative contribution (as a percentage) of variance components to the part of total variation that cannot be attributed to initial variation. The figure is based on 11 regional and spatially explicit land-use change models as described in Prestele et al. (2016). Methodological details can be found in Supplement Sect. S2.1 (Table S2) and in Alexander et al. (2017).

3.1 Background and emergence
Typically, net land-use changes are applied in TBMs. Net land-use changes refer to the summed grid cell difference in land-use categories between two subsequent time steps at a certain spatial and temporal resolution. Gross change representations provide additional information about land-use changes on a sub-grid scale. The total area in a grid cell that has been affected by change can be calculated by the sum of all individual changes (i.e., area gains and area losses). Gross changes have been shown to be substantially larger than net changes due to bidirectional change processes happening at the same time step (Fuchs et al., 2015a; Hurtt et al., 2011) that are obscured in net change representations. For example, 20 km² of cropland at time t1 and 40 km² at time t2 within a grid cell does not necessarily mean that this change resulted from clearing exactly 20 km² of forest. Equally plausible would be clearance of forest of larger spatial extent, while at the same time also abandoning a certain amount of cropland, resulting in the same net areal change.
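The cropland example above can be made concrete with a small sketch. It takes a set of hypothetical transitions within one grid cell and derives the net change per category and the gross area affected. The function name, the category labels, and the transition areas are illustrative assumptions; the convention of counting each converted km² once (total gains equal total losses) is one possible choice for expressing the gross area.

```python
from collections import defaultdict


def net_and_gross_change(transitions):
    """Summarize transitions within one grid cell and one time step.

    transitions: dict mapping (source, target) category pairs to area in km^2.
    Returns (net change per category, gross area changed, net area changed).
    """
    gains = defaultdict(float)
    losses = defaultdict(float)
    for (source, target), area in transitions.items():
        if source == target:
            continue  # land that keeps its category is not a change
        losses[source] += area
        gains[target] += area
    categories = set(gains) | set(losses)
    net = {c: gains[c] - losses[c] for c in categories}
    # Total gains equal total losses, so each converted km^2 is counted once.
    gross_area = sum(losses.values())
    # Net area changed: sum of net gains (equals the sum of net losses).
    net_area = sum(v for v in net.values() if v > 0)
    return net, gross_area, net_area


# Hypothetical cell in which cropland grows from 20 to 40 km^2 (net +20 km^2).
transitions = {
    ("forest", "cropland"): 30.0,     # clearing
    ("cropland", "grassland"): 10.0,  # abandonment
}
net, gross_area, net_area = net_and_gross_change(transitions)
print(net["cropland"], gross_area, net_area)
```

Here the net cropland change is the 20 km² from the text, yet 30 km² of forest were cleared and 40 km² of land changed category, which a net representation of the cell would not reveal.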
Gross changes are not consistently defined across communities. Commonly, shifting cultivation (mostly occurring today in parts of the tropics) and cropland-grassland dynamics (i.e., the bidirectional process of cropland expansion and abandonment) are referred to as gross changes (Fuchs et al., 2015a; Hurtt et al., 2011). Moreover, in the carbon cycle and climate modeling communities, wood harvest (in addition to forest cleared for agricultural land) is sometimes included in gross changes (Stocker et al., 2014; Wilkenskjeld et al., 2014). A more general definition would include all area changes (i.e., gains and losses across all categories represented in a product) that are not depicted in land-use change products (Fuchs et al., 2015a). The larger the averaging unit (be it in terms of grid cell or time), the greater the discrepancy between gross and net changes becomes. Re-gridding of high-resolution (e.g., 5 arcmin) land-use information to the TBM grid (∼ 0.5°) thus entails additional loss of information on land-use transitions unless gross changes are considered.
These sub-grid dynamics have been shown to be of importance when modeling change of carbon and nutrient stocks in response to land-use change in recent TBM studies (Fuchs et al., 2016; Stocker et al., 2014; Wilkenskjeld et al., 2014). For example, Bayer et al. (2017) found the global cumulative land-use carbon emission to be ∼ 33 % higher over the time period 1700-2014. Stocker et al. (2014) likewise report increased carbon emissions in recent decades and for all RCPs when accounting for shifting cultivation and wood harvest. Similarly, Wilkenskjeld et al. (2014) found a 60 % increase in the annual land-use emission for the historical period and a range of 16-34 % increase for future scenarios, when accounting for gross changes. Recently, Arneth et al. (2017) demonstrated uniformly larger historical land-use change carbon emissions across a range of TBMs when shifting cultivation and wood harvest were included, which has implications for understanding of the terrestrial carbon budget as well as for estimates of future carbon mitigation potential in regrowing forest.
Except for such sensitivity studies, gross changes have hardly been considered so far in land use-climate interaction studies (a notable exception being [...]), mainly due to two reasons. First, gross change estimates have not been available until recently. Deriving estimates of historical and future gross change is a difficult task since gross changes vary with spatial and temporal scale (Fuchs et al., 2015a), i.e., they are dependent on the scale of the underlying net change product used for modeling and to what extent gross change processes are included in the individual land-use change models. Second, the implementation of bidirectional changes below the native model grid often entails substantial technical modification to TBM structure, meaning that many TBMs are currently not ready to include information on gross changes or only started recently to include it.
3.2 Example: gross changes due to re-gridding in the CLUMondo model
To illustrate the amount of land-use and land-cover change that might be missed in net representations, we conducted an analysis based on the output of a dedicated high-resolution LUCM (CLUMondo; 5 arcmin spatial resolution; Eitelberg et al., 2016; Van Asselen and Verburg, 2013). We tracked all changes between five land-use and land-cover categories (cropland, pasture, forest, urban, and bare) at the original resolution over the time period from 2000 to 2040. Aggregating to ca. 0.5° resolution allowed the differentiation of the gross area from the net area affected by change (see Supplement Sect. S2.2 for methodological details). The results, shown in Fig. 3, indicate that gross changes are substantially higher than net changes all over the globe, including the temperate zone and high latitudes. It has to be noted that Fig. 3 is only based on one realization of a single LUCM, i.e., not necessarily representing the full extent and spatial pattern of global-scale gross changes. The analysis only depicts the loss of information while re-gridding from 5 arcmin to 0.5° resolution. Thus, bidirectional changes below the spatial resolution of the original data are still not captured.
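The principle behind this kind of analysis can be sketched on a toy grid. The code below compares two categorical maps at fine resolution and reports, for each coarse cell, how many fine cells changed category (gross) and how large the change in category totals is (net). It is a minimal sketch under stated assumptions: the function name, the use of cell counts instead of areas, and the toy maps are hypothetical, and the method actually used for Fig. 3 is the one described in Supplement Sect. S2.2.

```python
from collections import Counter


def coarse_cell_changes(before, after, factor):
    """Gross and net change per coarse cell, in units of fine cells.

    before, after: 2-D lists of category labels on the same fine grid.
    factor: number of fine cells per coarse cell along each axis
            (6 would correspond to going from 5 arcmin to 0.5 degrees).
    Returns a dict {(coarse_row, coarse_col): (gross, net)}.
    """
    n_rows, n_cols = len(before), len(before[0])
    result = {}
    for r0 in range(0, n_rows, factor):
        for c0 in range(0, n_cols, factor):
            cells = [
                (r, c)
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
            ]
            # Gross: every fine cell whose category differs between the maps.
            gross = sum(before[r][c] != after[r][c] for r, c in cells)
            # Net: only the change in category totals inside the coarse cell.
            count_before = Counter(before[r][c] for r, c in cells)
            count_after = Counter(after[r][c] for r, c in cells)
            net = sum(
                max(count_after[k] - count_before[k], 0)
                for k in set(count_before) | set(count_after)
            )
            result[(r0 // factor, c0 // factor)] = (gross, net)
    return result


# Toy 2 x 2 fine grid aggregated into a single coarse cell.
before = [["forest", "cropland"], ["cropland", "forest"]]
after = [["cropland", "grassland"], ["cropland", "forest"]]
print(coarse_cell_changes(before, after, factor=2))
```

In the toy case two fine cells change category, but cropland gained in one place is lost in another, so only one cell's worth of change remains visible in the coarse totals.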

3.3 Current approaches to providing gross change information: LUH and analysis of empirical data
To provide estimates of gross change, the land-use change modeling community currently follows two different approaches. First, Hurtt et al. (2011), within the framework of LUH, propose a matrix that provides explicit transitions between cropland, pasture, urban, and natural vegetation. Subgrid-scale information is added to net transitions (that are derived from historical or projected land-use data and referred to as "minimum transitions") through assumptions about the extent of shifting cultivation practices and the spatial pattern of wood harvest. In each grid cell, where shifting cultivation appears according to a map of Butler (1980), an average land-abandonment rate is added to each transition from and to agricultural land. In LUH2 an updated shifting-cultivation estimate based on the analysis of Landsat imagery will be included and replace the aforementioned simple assumption (Lawrence et al., 2016). Wood harvest is regarded as gross change, if the wood harvest demand from statistics (historical) or IAMs (future) is not met by deforestation for agricultural land in the net transitions or the GLM model is run in a configuration where deforestation for agricultural land is not counted towards wood harvest demand. The second approach derives gross / net ratios and a transition matrix directly from empirical data such as historical maps or high-resolution remote sensing products. These ratios can subsequently be applied to existing historical or future net representations to provide estimates of additional area affected by change (Fuchs et al., 2015a).
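For the second, data-based approach, applying a gross / net ratio to a net change estimate amounts to simple arithmetic, sketched below. The function name and the ratio of 1.5 are hypothetical and chosen only for illustration; empirically derived ratios depend on region, period, and scale.

```python
def gross_from_net(net_change_km2, gross_net_ratio):
    """Estimate the gross area changed from a net change and a gross / net ratio.

    net_change_km2: net area change of a category (sign is ignored).
    gross_net_ratio: ratio of gross to net change, assumed to be >= 1.
    Returns (gross area, additional area not visible in the net change).
    """
    if gross_net_ratio < 1.0:
        raise ValueError("gross change cannot be smaller than net change")
    net_area = abs(net_change_km2)
    gross_area = net_area * gross_net_ratio
    return gross_area, gross_area - net_area


# Hypothetical: 20 km^2 net cropland expansion and a gross / net ratio of 1.5.
print(gross_from_net(20.0, 1.5))
```

Because the ratio is held fixed, such an estimate inherits the limitation that ratios derived for one period or resolution may not hold for another.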

3.4 Open issues in the current approaches
The LUH gross transitions account for some aspects of gross changes. However, the values are dependent on what one includes in the definition of gross changes and are based on overly simplistic assumptions. Most of the gross transitions appear in parts of the tropics, where shifting cultivation is assumed to be an important agricultural practice ([...] their Fig. S1). Gross changes outside of these areas are mainly related to wood harvest, i.e., the (additional) area deforested to meet external wood harvest demands. Although these are regarded as gross changes in some literature (e.g., Hurtt et al., 2011; Stocker et al., 2014), we argue that wood harvest not leading to an actual areal change of land cover (e.g., forest to cropland) should be referred to as land management rather than gross change. Excluding wood harvest from the LUH data restricts the occurrence of gross changes to the areas of shifting cultivation. However, our analysis of CLUMondo output (Fig. 3), along with the European analysis of Fuchs et al. (2015a), suggests substantial amounts of gross change (below the 0.5° LUH grid) also in the temperate zone and the high latitudes. Consequently, the LUH approach heavily depends on the resolution of the original land-use data (provided by IAMs or historical reconstructions) and their ability to represent land-use change dynamics on a subgrid scale. The data-based approach avoids the process uncertainty that hinders high-resolution model projections of land use, but is limited to the time period where empirical data through remote sensing is available. Additional sources such as historical land-use and land-cover maps and statistics (Fuchs et al., 2015b) may contribute to covering longer time periods, although with limited spatiotemporal resolution and spatial coverage, and an associated increase in uncertainty.
It is thus difficult to develop multi-century reconstructions or future scenarios including gross changes using data-based approaches since the derived gross / net ratios are only valid for periods of data coverage and are expected to change over time (Fuchs et al., 2015a).

4.1 Background and emergence
The LSMs in most Earth system models (ESMs) in CMIP5 treated the land surface as a static representation of current land-use and land-cover distribution typically derived from remote sensing products (de Noblet-Ducoudré et al., 2012). DGVMs, some of which are incorporated in the land surface component of ESMs, were originally designed to model potential natural vegetation as a dynamic function of monthly climatology, bioclimatic limits, soil type, and the competitiveness of different wood- or grass-shaped plant functional types (PFTs) (Prentice et al., 2007). Thus, the early TBMs were not able to sufficiently account for anthropogenic activity on the land surface and consequently the impact of land use on climate and biogeochemical cycles (Flato et al., 2013). However, over the last decade, representation of human land-cover change and also some land-management aspects have increasingly been added to these models, albeit with levels of complexity that vary from crops as grassland to more detailed agricultural representations (Le Quéré et al., 2015; Lindeskog et al., 2013). Crop functional types (CFTs) and management options have been introduced in some models, explicitly parameterizing the phenology and biophysical and biogeochemical characteristics of major crop types and distinguishing important management options such as irrigation, fertilizer application, occurrence of multiple cropping, or processing of crop residues (Lindeskog et al., 2013). However, since TBMs do not include representations of human activity as a driver of changes in the land surface, information about the extent and exact location of managed land is required from external data sources such as IAMs or LUCMs.
IAMs and LUCMs usually provide land-cover information (e.g., forest, grassland, and shrubland) along with land-use information (e.g., cropland and pasture). However, as modeling changes in natural vegetation type is one of the primary functions of many TBMs, only land-use information has been used in the LUH. Hence, TBM modelers have to decide in which way the natural vegetation in a grid cell has to be reduced (in case of expansion of managed land) or increased (in case of abandonment of managed land). This has resulted in a range of different strategies, which we show as an illustration in Table 1 for a non-exhaustive list of models. The decision is important as it impacts the distribution of the natural vegetation in a grid cell, as well as the mean length of time that land has been under a particular use, with consequences for both the biogeochemical and biophysical properties. For example, new cropland expanding into forest would lead to a large and relatively rapid loss of ecosystem carbon due to deforestation, while cropland expanding into former grassland would have a less immediate impact on ecosystem carbon stocks due to the long time lag (years to centuries) for the resulting changes in soil carbon to be realized (Pugh et al., 2015). Likewise, the albedo and partitioning of energy differs strongly between forest and grassland land covers (Mahmood et al., 2014; Pielke et al., 2011). In the following sections we illustrate, based on literature review and analysis of empirical and modeled data, that the previously described simple allocation algorithms, applied globally within TBMs, do not account well for the spatiotemporal variation in land-use and land-cover change. Table 2 summarizes dominant sources of cropland expansion for several world regions and demonstrates the heterogeneity in the spatial pattern of expanding agriculture.
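How much the allocation decision matters can be seen in a small sketch that applies three simple rules to the same grid cell. The rule names ("proportional", "grassland_first", "forest_first"), the function, and the cell composition are hypothetical simplifications for illustration and do not reproduce the scheme of any particular model in Table 1.

```python
def allocate_expansion(natural, demand, strategy):
    """Reduce natural vegetation to make room for new managed land.

    natural: dict mapping natural vegetation type -> fraction of the grid cell.
    demand: fraction of the grid cell newly required for managed land.
    strategy: "proportional", "grassland_first", or "forest_first".
    Returns the remaining natural fractions.
    """
    total = sum(natural.values())
    if demand > total + 1e-12:
        raise ValueError("demand exceeds available natural vegetation")
    remaining = dict(natural)
    if demand == 0:
        return remaining
    if strategy == "proportional":
        # Every natural type shrinks in proportion to its share of the cell.
        for k in remaining:
            remaining[k] -= demand * natural[k] / total
        return remaining
    if strategy not in ("grassland_first", "forest_first"):
        raise ValueError("unknown strategy: " + strategy)
    first = "grassland" if strategy == "grassland_first" else "forest"
    # Use up the preferred type first, then the others in their given order.
    order = [k for k in natural if k == first] + [k for k in natural if k != first]
    left = demand
    for k in order:
        taken = min(remaining[k], left)
        remaining[k] -= taken
        left -= taken
    return remaining


# Hypothetical cell: 60 % forest, 20 % grassland, 30 % new cropland demanded.
cell = {"forest": 0.6, "grassland": 0.2}
for rule in ("proportional", "grassland_first", "forest_first"):
    print(rule, allocate_expansion(cell, 0.3, rule))
```

With identical land-use input, the forest fraction lost ranges from one-sixth of the original forest under the grassland-first rule to half of it under the forest-first rule, which is the kind of divergence that feeds into different carbon and biophysical responses.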
For Europe, the CORINE land-cover product (Bossard et al., 2000) indicates over two consecutive time periods (1990-2000 and 2000-2006) shrubland systems to be the main source of expanding agricultural land, followed by low-productivity grasslands and forests (Fig. 4a). In contrast, over a similar time period, the NLCD (Homer et al., 2015) for the USA shows low-productivity grasslands as the dominant source of new croplands, while pastures are predominantly converted from forest or shrubland systems and grasslands only account for around 20 % of new pastures (Fig. 4b). A large-scale study by Graesser et al. (2015) covering Latin America and based on the interpretation of MODIS images for the time period 2001-2013 identified the dominant trajectory of forests being first converted to pastures and subsequently to cropland. They show, however, varying patterns on national and ecoregional scales. This regional variation is also emphasized by Ferreira et al. (2015), who describe a satellite-based transition matrix as input for a modeling study for different states in Brazil. They do not distinguish non-forest natural vegetation such as the Cerrado systems, which might be another important source for agricultural land (Grecchi et al., 2014). A study conducted by Gibbs et al. (2010) investigating agricultural expansion in the tropics in the 1980s and 1990s based on data from the Food and Agriculture Organization of the United Nations (2000) (i.e., areas with less than 10 % forest cover are not considered) concludes that more than 80 % of new agricultural land originates from intact or degraded forests. Gibbs et al. (2010) further found large variability in agricultural sources across seven major tropical regions, e.g., substantially higher conversions from shrublands and woodlands to agricultural land in South America and eastern Africa.
Grasslands have been detected as the main source of agricultural land in northern China, e.g., by Li (2008), Liu et al. (2009), and Zuo et al. (2014), while in the Yangtze River basin woodlands contribute most (Wu et al., 2008) (Table 2). Admittedly, the studies mentioned combine different approaches to derive changes, cover different time periods, and are not representative of current agricultural change hotspots (Lepers et al., 2005). However, this kind of aggregated analysis already indicates that the spatial pattern of agricultural change dynamics varies across world regions and that a single global algorithm to replace natural vegetation by managed land in TBMs is likely to be overly simplistic.
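The simple allocation rules discussed above (grassland first, forest first, and proportional reduction) can be made concrete with a minimal sketch. The function name `allocate_cropland`, the dictionary layout, and the fallback order for other vegetation types are illustrative assumptions and do not reproduce the implementation of any particular TBM.

```python
def allocate_cropland(natural, demand, rule="proportional"):
    """Reduce natural vegetation fractions to make room for new cropland.

    natural: grid-cell fractions of natural vegetation (hypothetical layout),
             e.g. {"forest": 0.5, "grassland": 0.3}
    demand:  fraction of the grid cell to be converted to cropland
    rule:    "grassland_first", "forest_first", or "proportional"
    """
    total = sum(natural.values())
    if demand > total + 1e-12:
        raise ValueError("demand exceeds available natural vegetation")
    remaining = dict(natural)

    if rule == "proportional":
        # Every natural type loses area in proportion to its current share.
        if total > 0:
            for name in remaining:
                remaining[name] -= demand * natural[name] / total
        return remaining

    # Preferential rules: exhaust one type before touching the next.
    preferred = {
        "grassland_first": ["grassland", "forest"],
        "forest_first": ["forest", "grassland"],
    }[rule]
    # Assumption: any other natural type is only converted last.
    order = preferred + [name for name in natural if name not in preferred]
    left = demand
    for name in order:
        if name not in remaining:
            continue
        taken = min(remaining[name], left)
        remaining[name] -= taken
        left -= taken
    return remaining
```

For a grid cell with 50 % forest and 30 % grassland and a cropland demand of 40 %, the three rules leave very different forest fractions (about 40 %, 10 %, and 25 %, respectively), which is the source of the carbon and albedo differences described in the text.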

Example: spatial heterogeneity of cropland transitions in the CLUMondo model
As it is not possible to compare the land-use allocation strategies of TBMs with historical change data on a global scale due to the lack of accurate global land-use and land-cover products (though products with higher resolution (up to ∼ 30 m), more frequent temporal coverage, and increasing thematic detail are just emerging; Ban et al., 2015), we additionally tested to what extent cropland expansion simulated by the land-use change model CLUMondo (Van Asselen and Verburg, 2013) represents one or more of the simplified algorithms currently considered in TBMs (Table 1). CLUMondo models the spatial distribution of land systems over time, instead of land use and land cover directly. Land systems are characterized by, in addition to other factors, a mosaic of land use and land cover within each grid cell. The land systems are allocated to the grid in each time step "based on local suitability, spatial restrictions, and the competition between land systems driven by demands for different goods and services" (Van Asselen and Verburg, 2013).

Figure 4. Sources of agricultural land (cropland and pasture combined) for two time periods in Europe based on the CORINE land-cover data (a) and sources of cropland and pasture for two time periods in the USA based on the NLCD land-cover data (b) (Supplement Sect. S2.3, Table S3). Changes between different agricultural classes are not considered as expansion of agricultural land. Aggregation of CORINE and NLCD legends to forest, grassland, and shrubland is according to Tables S4-5. The category "other" includes urban land, wetlands, water, and bare land.
Thus, the determination of the source land use or land cover upon cropland expansion can be interpreted as a complex algorithm taking into account external demands, the land-use distribution of the previous time step, local suitability in a grid cell, and neighborhood effects (i.e., cropland expansion in a grid cell also depends on the availability of suitable land in the surrounding grid cells). This strategy differs from the one in TBMs in that no single simple rule is applied equally to every grid cell; instead, the allocation accounts for the spatial heterogeneity of the drivers of land-use change.
In order to compare the sources of cropland expansion in CLUMondo to the globally applied rules in TBMs, we reclassified the outputs of the same CLUMondo simulation utilized in Sect. 3.2 (FAO3D; Eitelberg et al.) according to their dominant land-use or land-cover type to derive transitions (Table S6) and classified the changes within each ca. 0.5° × 0.5° grid cell as either grassland first, forest first, proportional, or a complex reduction pattern (Table 3; Fig. S2-3 and additional explanation in Supplement Sect. S2.4). Additionally, a grid cell was labeled undefined if grassland or forest was not available in the source map. Figure 5 shows the results of this analysis for decadal time steps between 2000 and 2040. Based on the CLUMondo data, it is clear that a single simple algorithm does not account for the temporal and spatial heterogeneity of cropland expansion in a detailed land-use change model. The majority of grid cells with substantial cropland expansion (> 10 % of grid cell area) where we could detect an algorithm (i.e., the grid cell was not classified undefined) show a complex reduction pattern of the remaining land-use and land-cover categories, i.e., any algorithm applied to these grid cells in a TBM could be seen as equally good or bad. The remaining grid cells account for only 24-27 % globally. Moreover, the spatial distribution of grid cells that are classified to the same algorithm is very heterogeneous and changes over time. It has to be noted that this analysis builds on only one realization of one LUCM and results may differ if using another data source in terms of overall cropland expansion and the exact grid cell location of changes. However, the analysis does not aim at identifying the exact location of a particular algorithm but rather at emphasizing the heterogeneous pattern of cropland expansion.

Table 3. Definition of classified algorithms in the CLUMondo exercise (Sect. 4.3). CLUMondo data were preprocessed as described in the text and Supplement Sect. S2.4. Each ca. 0.5° × 0.5° grid cell was assigned a label according to the distribution of changes seen in the higher resolution (5 arcmin) CLUMondo data. Land types according to the reclassification of CLUMondo land systems are shown in Table S6; mosaics refer to a mixture of vegetation within a grid cell (e.g., forest and grassland).

Undefined: ... forest or grassland were not available for conversion to cropland.*
Unvegetated first: ... urban or bare were converted to cropland, although vegetation was available.
Forest first: ... forest was predominantly converted to cropland, although grassland and mosaics were available.
Grassland first: ... grassland was predominantly converted to cropland, although forest and mosaics were available.
Complex: ... forest, grassland, and mosaics were simultaneously converted without a preference to one of the classes or proportional reduction.
* If one of the two classes is not available for conversion, either of the preferential algorithms (unvegetated, forest, or grassland first) could be correct, but not executed because of the lack of the source that should be converted first.
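The labeling of grid cells described above could be sketched roughly as follows. This is an illustration only: the actual decision rules are given in Supplement Sect. S2.4, and the thresholds `dominance` and `tol`, the function name, and the input layout used here are assumptions introduced for the example.

```python
def classify_expansion(available, converted, dominance=0.9, tol=0.05):
    """Label the cropland-expansion pattern of one coarse grid cell.

    available: fractions of each source type before the change
    converted: fractions of each source type converted to cropland
    dominance: assumed share above which one source counts as "first"
    tol:       assumed tolerance for calling a reduction proportional
    """
    # Without both forest and grassland no preference can be detected.
    if available.get("forest", 0.0) <= 0 or available.get("grassland", 0.0) <= 0:
        return "undefined"

    total_converted = sum(converted.values())
    if total_converted <= 0:
        return "no expansion"
    shares = {k: v / total_converted for k, v in converted.items()}

    # Preferential patterns: one source supplies almost all new cropland.
    for source, label in (
        ("unvegetated", "unvegetated first"),
        ("forest", "forest first"),
        ("grassland", "grassland first"),
    ):
        if shares.get(source, 0.0) >= dominance:
            return label

    # Proportional: converted shares mirror the available shares.
    total_available = sum(available.values())
    if all(
        abs(shares.get(k, 0.0) - available[k] / total_available) <= tol
        for k in available
    ):
        return "proportional"
    return "complex"
```

With these assumed thresholds, a cell in which forest supplies 95 % of the new cropland is labeled forest first, whereas a cell losing forest, grassland, and mosaics in shares that match neither a single dominant source nor their availability is labeled complex.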

Current approach to providing allocation information: the transition matrix
In CMIP5, most ESMs implemented a proportional reduction of natural vegetation rather arbitrarily due to reasons of simplicity or internal model constraints; others converted grassland preferentially and/or treated croplands differently from pastures upon transformation (de Noblet-Ducoudré et al., 2012). However, none of them depict the complex interplay of biophysical and socioeconomic parameters leading to a heterogeneous spatial pattern of land-use change within the coarse grid resolution used in ESMs. As we have shown in the previous sections, empirical evidence and land-use change models suggest that this complexity is poorly represented by simplistic, globally applied algorithms. The efforts of LUH thus included the provision of a transition matrix, i.e., the explicit identification of source and target categories between agricultural land and natural vegetation at the grid cell level. For each annual time step, the exact fraction of a grid cell that has changed from one land-use category to another is determined, thus providing the option to replace the simple allocation options with detailed information about land-use transitions within each grid cell.
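To illustrate what a transition matrix at the grid cell level contains, the following minimal sketch cross-tabulates two hypothetical sub-grid land-cover samples of the same grid cell at consecutive time steps. It is not the LUH procedure, which derives transitions from modeled land-use states; the function name and the flat-list input format are assumptions.

```python
from collections import Counter


def transition_matrix(before, after, classes):
    """Fraction of a grid cell moving from each class to each other class.

    before, after: equally long sequences of class labels, one entry per
                   sub-grid pixel of the same grid cell (hypothetical input)
    classes:       all class labels to include in the matrix
    """
    if len(before) != len(after):
        raise ValueError("both time steps must cover the same pixels")
    counts = Counter(zip(before, after))
    n = len(before)
    # Rows are source classes, columns are target classes.
    return {
        src: {dst: counts[(src, dst)] / n for dst in classes}
        for src in classes
    }


# Usage: four pixels, two of which become cropland.
before = ["forest", "forest", "grassland", "cropland"]
after = ["forest", "cropland", "cropland", "cropland"]
matrix = transition_matrix(before, after, ["forest", "grassland", "cropland"])
```

Here `matrix["forest"]["cropland"]` and `matrix["grassland"]["cropland"]` both equal 0.25, so a TBM reading the matrix knows which source categories to reduce instead of applying a fixed rule.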

Open issues of transition matrices
The provision of transition matrices, however, generally brings up a sequence of additional challenges, which we illustrate using the example of LUH in the following. First, the decision of which land-cover type should be replaced upon cropland or pasture expansion (or introduced in case of abandonment) is in fact only shifted from the TBM community to the IAM/LUCM community, and the accuracy of the transitions is heavily dependent on the sophistication (i.e., knowledge about and depiction of land-use change drivers and processes on the grid scale) of the land-use allocation algorithm in the original model providing the land-use data. Many current models simulate land-use changes on a world regional level and downscale these aggregated results to the required grid cell level (Schmitz et al., 2014). In the LUH approach these downscaled data are used to derive the minimum transitions between agricultural land use and natural vegetation. Additional assumptions are made to allocate changes in land-use states to explicit transitions, not accounting for the spatial and temporal heterogeneity of the multiple drivers of land-use change. For example, urban expansion is applied proportionally to cropland, pasture, and (secondary) natural vegetation. Upon transitions between natural vegetation and agricultural land, choices in the model configuration have to be made, whether primary or secondary land is converted preferentially. These choices are similar to the grassland- or forest-first reduction algorithms applied in TBMs.

[Figure caption; beginning missing in the source] ... (Table 3) are not shown due to a very small contribution (< 0.1 %). Grid cells in this figure have been aggregated to ca. 1.0° × 1.0° following a majority resampling for reasons of readability. A high-resolution version of the maps, including the full detail of the classification results, can be found in the Supplement (Fig. S4).
Moreover, due to the lack of empirical long-term, highly accurate land-use and land-cover change information and the inconsistencies between agricultural land-use data and land-cover information from satellites, global IAMs and LUCMs are rarely evaluated against independent data (Verburg et al., 2015). It is thus not clear yet to what extent the spatial land-use patterns simulated by these models and provided to LUH represent a good estimate of real past and future land-use changes. In consequence, transitions derived from these modeled time series are uncertain.
Hence, it is evident that more and improved empirical information on land-use transitions is required to improve land-use change modeling and to estimate the natural systems at risk under agricultural expansion. However, the specific problem of allocating new agricultural land in DGVMs and LSMs also has strong model and data-structure components. In many DGVMs, the grass and forest PFTs on nonagricultural land in a grid cell are mostly not treated as separate systems but as parts of one complex vegetation structure, so that horizontal spatial heterogeneity within the grid cell is not represented. Therefore, when agriculture expands into such natural systems, all natural PFTs need to be reduced proportionally. If handled otherwise (i.e., when removing a specific PFT preferentially), the vegetation dynamics would slowly converge again towards the initial PFT mix (if all boundary conditions like climate and soil properties remain unchanged).
For LSMs coupled to ESMs, the situation is slightly more complex. Most ESMs (if not incorporating dynamic vegetation through a DGVM) use a remote sensing product such as the ESA CCI-LC (ESA, 2014) and a translation to PFTs, e.g., Poulter et al. (2011), as a background vegetation map on which agricultural land is imposed. Due to inaccuracies in global remote sensing land-cover products and differences in historical reconstructions (as discussed in Sect. 2), fractions of agricultural land on a grid scale are subject to differences between the background map and the external land-use dataset. Consequently, the PFT composition outside the prescribed agricultural land can represent either the real heterogeneity in natural vegetation or represent a mix of natural and anthropogenic land cover due to differences in the datasets. However, these cases are difficult to distinguish and empirically justified transition matrices, together with more accurate present-day land-cover products, would provide a useful tool for reducing uncertainties due to allocation decisions in ESMs.

Tackling uncertainties in the harmonization
The LUH has allowed the inclusion of anthropogenic impacts on the land surface for the first time in the CMIP5 climate change assessments. As we have shown in Sect. 2, three major sources of uncertainty, which include the uncertainty about land-use history, inconsistencies in present-day land-use estimates, and structural differences across IAMs and LUCMs, are poorly addressed through the almost exclusive implementation of the LUH dataset within the climate modeling community. A wider range of harmonized time series is therefore likely to substantially influence the outcomes of studies on land use-climate interactions. The actual impact of alternative harmonized time series on carbon cycle (and other ecosystem processes) and climate has never been tested, mainly due to the lack of alternative provision of such products. One would need a multi-model ensemble design to properly account for and disentangle the individual contributions of different historical reconstructions, the multitude of present-day land-use products, and varying future land-use change modeling approaches. Different future scenario models would need to be connected to different instances of historical reconstructions, both constrained by different plausible realizations (i.e., based on previously published, peer-reviewed approaches) of current land use and land cover. Such an approach would ensure a comprehensive coverage of the uncertainties accumulating across temporal and spatial scales prior to feeding land-use data into climate models and allow for testing of climate model sensitivity to different realizations of land-cover and land-use information.
The high computational demands of complex ESMs probably do not allow for multiple runs including all the uncertainties in land-use forcing. However, to derive robust results from climate model intercomparisons, a sufficient quantification of uncertainty in the land-use forcing dataset is urgently required. If this proves impractical through ESM simulations, we recommend utilizing less computationally expensive models such as DGVMs and offline LSMs to assess the full range of uncertainty and to determine a limited set of simulations, which appears to significantly affect biogeochemical cycles and climate. These can be subsequently used to test the uncertainty range in ESMs.
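The ensemble design outlined above can be thought of as a full factorial combination of historical reconstructions, present-day land-use products, and future land-use models. The sketch below only enumerates such a design; the labels are placeholders rather than real datasets, and the function name is hypothetical.

```python
from itertools import product


def build_ensemble(histories, baselines, futures):
    """Enumerate every combination of reconstruction, baseline, and model.

    All arguments are lists of labels; the labels used in the usage
    example below are placeholders, not actual data products.
    """
    return [
        {"history": h, "baseline": b, "future": f}
        for h, b, f in product(histories, baselines, futures)
    ]


# Usage: 2 reconstructions x 2 present-day products x 3 future models.
members = build_ensemble(
    ["hist_A", "hist_B"],
    ["present_1", "present_2"],
    ["model_X", "model_Y", "model_Z"],
)
```

Even this small example yields 12 harmonized time series, which illustrates why screening the members with less expensive DGVMs or offline LSMs before running a reduced set in ESMs is attractive.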
Simultaneously, we suggest that the land-use and remote-sensing communities should engage to reduce uncertainties in land-use and land-cover products by

1. developing diagnostics for the evaluation of land-use reconstructions based on satellite data and additional proxy data such as pollen reconstructions (Gaillard et al., 2010) or archeological evidence of early land use (Kaplan et al., 2016);

2. developing systematic approaches to evaluating results of land-use change models against independent data sources, utilizing the full range of high-resolution satellite data (e.g., the Landsat archive and the European Sentinel satellites), reference data obtained from (sub-)national reporting schemes under international policy frameworks (e.g., Kohl et al., 2015), and innovative methods such as volunteered geographic information and crowdsourcing (Fritz et al., 2012).

Although satellite data are also not directly measured empirical data, but go through a mathematical conversion process prior to a final land-cover product, they can improve representations of present-day land cover.
If not yet possible on the global scale due to the limitations discussed in Sect. 2, we recommend the implementation of regional-scale evaluation schemes using smaller-scale, highly accurate remote sensing products as a starting point for later integration into global applications.

Gross change representations
The full extent of gross changes is still not well understood (see Sect. 3). Thus, the land-use community should explore the ability of high-resolution remote-sensing imagery to provide gross change estimates and to improve understanding of sub-grid dynamics, which are not yet captured by their models. Regions where driving factors of small-scale land-use change processes are more complex and not easy to determine due to frequent land-use changes should receive special attention. Based on such analyses, multi-century reconstructions and projections for climate and ecosystem assessments could be enhanced for at least the satellite era. As models extend further into the past, the detailed information could be gradually replaced by model assumptions, supported by additional reference data such as historical maps and statistics.
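The difference between net and gross change can be shown with a short sketch. It assumes that transitions within a grid cell are available as a mapping from (source, target) pairs to grid-cell fractions; this layout and the function name are illustrative assumptions.

```python
def net_and_gross_change(transitions, category):
    """Return (net, gross) change of one category within a grid cell.

    transitions: {(source, target): fraction of grid cell}, hypothetical layout
    net   = gains - losses (what a comparison of land-use states reveals)
    gross = gains + losses (all area that actually changed use)
    """
    gains = sum(
        frac for (src, dst), frac in transitions.items()
        if dst == category and src != category
    )
    losses = sum(
        frac for (src, dst), frac in transitions.items()
        if src == category and dst != category
    )
    return gains - losses, gains + losses


# Usage: bidirectional change between forest and cropland.
transitions = {("forest", "cropland"): 0.10, ("cropland", "forest"): 0.08}
net, gross = net_and_gross_change(transitions, "cropland")
```

In this example the net cropland change is only about 2 % of the grid cell, while about 18 % of the cell changed use, so a model driven by net changes alone would miss most of the affected area.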

Transition matrix from empirical data
Explicit information on land-use transitions instead of annual land-use states is essential for questions regarding carbon and nutrient cycling. We argue that simple, globally applied assumptions about these transitions or the shift of the responsibility from TBMs to land-use models may not solve the problem (Sect. 4). Thus, the development of dedicated transition matrices increasingly based on empirical data (as soon as new products emerge) and sophisticated land-use change allocation models, which account for the spatiotemporal heterogeneity of land-use change drivers, is essential. Simultaneously, TBMs must make use of the full detail of the information provided by implementing explicit transition information in their land modules. Due to internal model structure, proportional reduction of PFTs needs to be applied in models with internally simulated dynamic vegetation. However, we recommend the utilization of explicit transition information to further evaluate discrepancies between the potential natural vegetation scheme and LULCC data provided by LUCMs and IAMs.

Outlook: towards model integration across disciplines
The ways forward listed in the previous section will only be the first stage of a process towards improved LULCC representation in climate change assessments. Rather than improving de-coupled data products and models on an individual basis and connecting them offline through the exchange of files, we argue that land use, land cover, and the climate system need to be studied in an integrated modeling framework. As we have shown in this paper, most of the challenges and related uncertainties originate in the disparate disciplinary treatment of the individual aspects. Although sophisticated models have been developed during the past decades within each community, the current offline coupling seems overly limited, accumulating an increasing level of uncertainty along the modeling chain. Integration of these different types of models, where anthropogenic activity on the land system is considered as an integral part of ESMs, instead of an external boundary condition, might help to reduce these uncertainties, although it will certainly further complicate the interpretation of model responses. For example, Di Vittorio et al. (2014) report preliminary results of the iESM (Collins et al., 2015), an advanced coupling of an IAM and an ESM implementing two-way feedbacks between the human and environmental systems, and show how this improved coupling can increase the accuracy of information exchange between the individual model components.
In the long term, additionally including behavioral land system models (e.g., agent-based approaches) in the coupling may provide further understanding of possible land-climate-society feedbacks (Verburg et al., 2015) since the current modeling chain rarely accounts for the complexity of human-environmental relationships and feedbacks.
Data availability. The illustrative analysis in Sects. 3 and 4 is based on CLUMondo simulations. CLUMondo source code and simulation results are available from http://www.environmentalgeography.nl/site/data-models/.
The Supplement related to this article is available online at doi:10.5194/esd-8-369-2017-supplement.