Quantifying precipitation variability beyond the instrumental period is essential for putting current and future fluctuations into long-term perspective and for providing a test bed for evaluating climate simulations. For south-eastern Asia such quantifications are scarce, and millennium-long attempts are still missing. In this study we take a pseudo-proxy approach to evaluate the potential for generating summer precipitation reconstructions over south-eastern Asia during the past millennium. The ability of a series of novel Bayesian approaches to generate reconstructions at either annual or decadal resolution and under diverse pseudo-proxy noise scenarios is analysed and compared to the classic analogue method.
We find that for all the algorithms and resolutions a high density of pseudo-proxy information is a necessary but not sufficient condition for a successful reconstruction. Among the selected algorithms, the Bayesian techniques perform generally better than the analogue method, the difference in abilities being highest over the semi-arid areas and in the decadal-resolution framework. The superiority of the Bayesian schemes indicates that directly modelling the space and time precipitation field variability is more appropriate than relying on a pool of observation-based analogues in which certain precipitation regimes might be absent. Using a pseudo-proxy network with locations and noise levels similar to those found in the real world, we conclude that performing a millennium-long precipitation reconstruction over south-eastern Asia is feasible, as the Bayesian schemes provide skilful results over most of the target area.
Earth's climate varies on all spatial and temporal scales, forced by both natural and anthropogenic factors. To understand the dynamics of this variability, the analysis of the available instrumental information is an essential tool. However, the time coverage of the instrumental records is rather short; therefore, information from climate archives (natural and documentary) reaching back centuries is important to put current and future changes into a long-term perspective and to serve as a validation test bed for model simulations, with the ultimate goal of understanding the underlying physical mechanisms.
South-eastern Asian societies and economies are heavily dependent on the summer (monsoon-dominated) rainfall as a freshwater resource; thus, it is important to investigate how these precipitation patterns have varied in the past to provide a useful guide for the climate response to future changes. Previous hydroclimate field reconstructions (CFRs) over Asia revealed a substantial mismatch between modelled and reconstructed precipitation patterns (Shi et al., 2017) and documented the spatial variability of large-scale droughts during the Little Ice Age (Cook et al., 2010; S. Feng et al., 2013). While these studies covered the last 500–700 years, a gridded hydroclimate product going beyond Medieval times at high spatio-temporal resolution is still missing. Whether such a long and highly resolved reconstruction is possible given the data and methodologies available nowadays is the subject of this paper.
Reconstructing the temporal evolution of climatic variables in the space domain (a CFR) based on the information from a sparse network of proxies and partially overlapping instrumental data is a complex mathematical problem. First, the proxy data used for generating reconstructions display a set of characteristics that make their use challenging: their distribution in space and time is heterogeneous, with fewer records further back in time; different proxy archives have different temporal resolutions and possibly include dating uncertainties; and proxy data might reflect different climate variables (temperature, precipitation, sea-level changes, pH, seawater temperature, water mass circulation, etc.), record climate conditions at different times of the year, and contain non-climatic information (usually referred to as non-climatic noise). Second, the overlap with instrumental observations is commonly short, limiting opportunities for statistical learning and further validation. Third, and in contrast to spatially averaged climate reconstructions, CFRs require spatially extending the available information, and therefore inferring missing values of the target climate field even at locations where no data are available. Finally, as the amount of paleoclimatic information shrinks back in time, it is virtually impossible to reserve an independent proxy data set to properly validate the output reconstruction. A common approach to overcome this shortcoming and retain a proper validation stage is to use a pseudo-reality. The process of using a global climate model (GCM) simulation to assess the ability of a reconstruction technique is known as a pseudo-proxy experiment (PPE; Smerdon, 2012; Mann and Rutherford, 2002). In a PPE, simulated data are modified to mimic real-world proxies and instrumental observations (called pseudo-proxy and pseudo-instrumental data sets), and the reconstruction algorithms are applied.
The reconstruction results are then compared with the available simulated target field, giving an estimation of the skill of the method in real-world applications.
There are several ways to perform a CFR (see Luterbacher and Zorita, 2018, for a review). The classical approach is through a multivariate regression perspective: a statistical relationship between proxy and instrumental data is inferred from the overlapping (calibration) period and then, assuming stationarity of this relationship, the missing instrumental values are predicted, or reconstructed, back through time. Some of the most common techniques for climate reconstructions in this category are regularised expectation–maximisation (RegEM; Schneider, 2001), canonical correlation analysis (CCA; Smerdon et al., 2010), Markov random fields (Guillot et al., 2015) and the analogue method (Franke et al., 2011). The performance of these methods strongly depends on the length of the instrumental data. If the overlapping period between proxy and instrumental data is short in comparison with the number of spatial locations considered, the estimation of the covariance matrix is uncertain and the matrix inversion process is numerically unstable, leading to poor performance when presented with new data outside the learning sample.
Another strategy to perform a CFR, only recently applied in paleoclimatology, is the Bayesian approach (e.g. Tingley and Huybers, 2010, 2013; Werner et al., 2013, 2018; Luterbacher et al., 2016; Zhang et al., 2018). The Bayesian strategy is probabilistic, incorporates information about the climate–proxy connection as constraints on the reconstruction problem and has the benefit of providing more comprehensive uncertainty estimates for the derived reconstructions. Robust comparisons between established methods and the emerging efforts (Werner et al., 2013; Nilsen et al., 2018) underpin the benefits and justify further application of the computationally more expensive method. So far, most paleoclimatic applications of this methodology involve temperature reconstructions. Efforts to apply this probabilistic framework to the more complex and highly variable hydroclimate are only in the initial stages, but the advantages of the methodology over more classical approaches are promising.
Gómez-Navarro et al. (2015) used a PPE approach to assess the skill of several statistical techniques (classical regression methods and Bayesian) in reconstructing the precipitation of the past 2 millennia over continental Europe. The authors find that none of the schemes shows better performance than the others and that precipitation reconstructions over Europe are only possible given a spatially dense and uniformly distributed network of proxies, as the accuracy strongly deteriorates with distance to the proxy sites.
In this study we propose to evaluate, via PPE, the potential to generate a last-millennium summer precipitation reconstruction for south-eastern Asia. We use three CFR techniques: Bayesian hierarchical modelling (BHM), BHM coupled with clustering processes (with two different numbers of clusters), and the analogue method. For each of the schemes we perform two reconstructions: one at annual and one at decadal resolution. In addition, the influence of the noise level in pseudo-proxies on the final reconstruction is evaluated.
This is the first time that a BHM approach is applied to the hydroclimate of Asia, and its coupling with clustering techniques is a methodological innovation in the field. The systematic evaluation of the skill of these probabilistic methods, and their comparison with the more classical and well-established analogue technique, is a necessary step towards understanding precipitation variability and the opportunities for, and obstacles to, reconstructing it over long timescales. The PPE exercise is a fundamental validation step, essential for selecting the most appropriate method to improve real-world reconstructions and, finally, for deriving a new, not previously attempted gridded product of south-eastern Asian summer precipitation during the last 1000 years. In this work only summer precipitation is targeted, as the selected pseudo-proxy network is based on real-world indicators of summer hydroclimatic variations (see Sect. 2.2).
The paper is organised as follows. In Sect. 2 we present the data and methodology and describe in detail the three reconstruction techniques, as well as the skill scores used for quality evaluation. Section 3 is devoted to the results and discussions: we evaluate the skill of each of the reconstruction methods, at both annual and decadal resolution, and investigate the role of the pseudo-proxy noise. Finally, in Sect. 4 we present conclusions and a short outlook.
As a virtual reality setup for our study we use one full-forcing simulation
(run 001) of the Community Earth System Model (CESM) from the Last
Millennium Ensemble (LME) project (Otto-Bliesner et al., 2016). The
simulation is performed with a horizontal resolution of
Figure 1 depicts the JJA mean precipitation in the run used in this paper, considering only the last 100 years of the simulation (period 1906–2005). Historical simulations with the CESM show a reasonable performance in reproducing summer precipitation over continental Asia: the simulated JJA precipitation is generally in agreement with observations, although a spurious rainfall centre over the eastern Qinghai–Tibetan Plateau is generated in these simulations (Wang et al., 2015).
Simulated mean JJA precipitation (millimetres per month) during the instrumental period (years 1906–2005) over continental Asia. Black dots: pseudo-proxy network.
For this study we select the locations of 47 real-world precipitation-/drought-sensitive proxies in the target domain that span the last millennium. The locations of tree ring, speleothem, lake sediment and ice core sites, as well as of some documentary data, are mainly derived from the networks used in Chen et al. (2015) and Ljungqvist et al. (2016) (Table 1). The criteria for record selection were millennium-long coverage (start date before 1000 CE), at least two values per century, a terrestrial setting, publication in the peer-reviewed literature, and description as an indicator of local hydroclimate variations.
List of the real-world proxy records used to select the locations of the pseudo-proxy network.
For the design of the PPEs we build two data networks: a pseudo-proxy one and a
pseudo-instrumental one. The pseudo-proxy network is based on the locations of
the real-world hydroclimate proxies listed in Table 1. As some of these 47
records are in close proximity, this translates into having 38 different
model grid points (about 10 % of the total grid points in the study
region). The selected locations are not evenly distributed across
south-eastern Asia: the highest concentrations are found over eastern China and
over the dry lands in the north-west of the study region (Fig. 1). There are
neither pseudo-proxy sites southward of 20
For the pseudo-instrumental network we consider all the locations for which a reconstruction is targeted: 366 model grid points in south-eastern Asia. For each of these locations, we take the modelled precipitation time series for the last 100 years of the simulation (at either annual or decadal resolution) and add small Gaussian noise to represent the instrumental errors present in real precipitation measurements. The added noise is such that, at each location, the correlation between the original and contaminated time series is 0.95.
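For white Gaussian noise, the noise variance needed to reach a prescribed correlation r with the target follows in closed form from corr = σ_x/√(σ_x² + σ_ε²), i.e. σ_ε² = σ_x²(1/r² − 1). The minimal sketch below (purely illustrative synthetic data, not the CESM output) generates a pseudo-instrumental series (r = 0.95) and a medium-noise pseudo-proxy series (r = 0.5) this way:

```python
import numpy as np

def contaminate(series, target_corr, rng):
    """Add white Gaussian noise so that the expected correlation
    between the original and the noisy series equals target_corr."""
    sigma2 = series.var(ddof=1)
    noise_var = sigma2 * (1.0 / target_corr**2 - 1.0)
    return series + rng.normal(0.0, np.sqrt(noise_var), size=series.shape)

rng = np.random.default_rng(0)
truth = rng.normal(80.0, 20.0, size=100)        # 100 "years" of JJA precipitation (mm/month)
pseudo_instr = contaminate(truth, 0.95, rng)    # instrumental-like noise
pseudo_proxy = contaminate(truth, 0.50, rng)    # medium-noise proxy scenario

print(np.corrcoef(truth, pseudo_instr)[0, 1])   # close to 0.95
print(np.corrcoef(truth, pseudo_proxy)[0, 1])   # close to 0.50
```

A routine of this kind, with different target correlations, can also generate the range of pseudo-proxy noise scenarios explored in Sect. 3.2.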
Example of pseudo-proxy, pseudo-instrumental and true
precipitation time series at location [20
As an example, Fig. 2 shows the simulated precipitation time series at
location [20
In the following subsections we describe in detail each of the three reconstruction techniques used in this paper.
In the BHM technique a hierarchy of parametric stochastic models is used to describe the relationship between climate, instrumental and proxy data. The model parameters are estimated from the available data through Bayes' rule. The hierarchy consists of three basic components. First, at the process level, a stochastic model describing the time evolution of the climate variable is selected. Second, at the data level, stochastic relationships between the instrumental and proxy data and the climate variable are developed. Finally, prior distributions are specified for the parameters involved in the other two levels of the hierarchy. Here we use the BHM algorithm named the Bayesian Algorithm for Reconstructing Climate Anomalies in Space and Time (BARCAST), developed by Tingley and Huybers (2010). In the following, we specify the assumptions and equations for each of the levels in the model hierarchy.
The process level describes the evolution of the true climatic field as a multivariate autoregressive process of order 1, AR(1), with spatially correlated innovations.
The evolution of the true precipitation, sampled at a finite number of
spatial locations, is assumed to follow a first-order autoregressive
process:
The temporal model within BARCAST allows the estimate of the field at a certain time step to be influenced by the information in the previous time step. The covariance matrix structure is assumed constant in time and follows an exponentially decaying pattern with distance. Note that, under this assumption, well-correlated precipitation time series at two distant locations (a teleconnection) cannot be well represented by the BARCAST model. The method parameterises the spatial covariance matrix with two unknown parameters: the covariance at null distance (
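To make the process level concrete, the following minimal sketch simulates a toy AR(1) field whose innovations have an exponentially decaying spatial covariance; all parameter values (AR coefficient, decay range, site spacing) are illustrative assumptions, not BARCAST estimates:

```python
import numpy as np

def simulate_field(coords_km, n_steps, alpha=0.3, sigma2=1.0, phi=1.0 / 300.0, rng=None):
    """Simulate T_t = alpha * T_{t-1} + eps_t, where the innovations eps_t
    have covariance sigma2 * exp(-phi * d_ij), d_ij being the inter-site
    distance in km (1-D site layout for brevity)."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = np.abs(coords_km[:, None] - coords_km[None, :])
    cov = sigma2 * np.exp(-phi * d)
    L = np.linalg.cholesky(cov)          # draw spatially correlated innovations
    n = len(coords_km)
    field = np.zeros((n_steps, n))
    for t in range(1, n_steps):
        field[t] = alpha * field[t - 1] + L @ rng.normal(size=n)
    return field

x = simulate_field(np.array([0.0, 100.0, 700.0]), n_steps=5000)
# nearby sites (0 and 100 km apart) correlate much more strongly
# than distant ones (700 km apart), mirroring the assumed structure
c_near = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
c_far = np.corrcoef(x[:, 0], x[:, 2])[0, 1]
print(c_near, c_far)
```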
The model assumes that the climatic variable, precipitation, follows a Gaussian distribution. Although this might not be the case, especially for arid regions, the simulated JJA precipitation in the area of study can be taken to reasonably follow this assumption: for the selected pseudo-proxy locations, 63 % of the time series (over the instrumental period) pass the Kolmogorov–Smirnov test for normality at a 95 % confidence level (Fig. A1). Although the Gaussian condition is not met at all grid points, the model remains valid there, even if it is not the optimal fit at those locations.
Figure 3 shows the correlation decay with distance for the simulated JJA
precipitation for different latitudinal bands. For annual data (Fig. 3a),
the correlation between precipitation time series in consecutive grid points
is usually high, around 0.8. With few exceptions, the simulated
precipitation follows an exponentially decaying pattern with distance, with
points located further away than 600 km showing no significant correlation.
Therefore, we take the exponentially decaying spatial structure of the
covariance matrix in BARCAST to be a reasonable assumption for the model.
For decadal data (Fig. 3b), the correlation behaviours are not uniform
with respect to the latitudinal bands. For some of the latitudes the plot
follows an exponentially decaying shape, and for others it additionally
shows a teleconnection pattern (notably the northern-most 44–48
Correlation of simulated JJA precipitation time series across
different latitudinal bands versus distance. Only the instrumental period
(years 1906–2005) and the grid points in continental Asia are considered for
the calculation.
The data level specifies the relationship between the measurements (both proxy and instrumental) and the true field values.
The instrumental observations at each time are assumed to be noisy
variations of the true precipitation field:
The proxy observations are assumed to follow an unknown statistically linear
relationship with the true precipitation at each location:
To close the scheme, prior distributions must be specified for the eight
scalar parameters
Using Bayes' rule the posterior distribution of each of the unknown
variables can be calculated. Samples are drawn from these posterior
distributions using a Gibbs sampler, with a Metropolis step (Gelman et al., 2003) to update
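As a toy illustration of the Gibbs idea (not the BARCAST sampler itself), the sketch below alternates conjugate conditional draws for the coefficient and innovation variance of a univariate AR(1) process; in BARCAST the same alternation runs over the full set of field values and parameters, with a Metropolis step handling the non-conjugate spatial parameter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from an AR(1) process with known parameters.
a_true, s2_true, n = 0.6, 1.0, 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = a_true * y[t - 1] + rng.normal(0.0, np.sqrt(s2_true))

# Gibbs sampler: alternate draws from the full conditionals of the AR
# coefficient a (conjugate N(0, 1) prior) and the innovation variance
# s2 (conjugate inverse-gamma IG(2, 1) prior).
a, s2 = 0.0, 1.0
draws_a = []
yp, yc = y[:-1], y[1:]
for it in range(2000):
    prec = yp @ yp / s2 + 1.0                 # posterior precision of a
    mean = (yp @ yc / s2) / prec
    a = rng.normal(mean, 1.0 / np.sqrt(prec))
    resid = yc - a * yp
    shape = 2.0 + 0.5 * len(resid)
    rate = 1.0 + 0.5 * resid @ resid
    s2 = 1.0 / rng.gamma(shape, 1.0 / rate)   # IG draw via reciprocal gamma
    if it >= 500:                             # discard burn-in
        draws_a.append(a)

print(np.mean(draws_a))   # posterior mean, close to a_true
```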
Here we propose to couple the BHM with a clustering algorithm. The aim of the clustering step is to segregate south-eastern Asia into several clusters, according to similarities in the precipitation regimes during the pseudo-instrumental period. After the clustering, the BHM code is run within each cluster independently. Finally, all the results are merged together to produce the entire spatial reconstruction over the post-850 period. The idea behind the clustering step is to reduce the complexity of the problem presented to the BHM algorithm: after clustering, the code does not have to deal with extreme differences in precipitation regimes (such as dipole patterns across mountain ranges) or with a large number of grid cells.
We use a hierarchical agglomerative clustering technique. Each observation starts in its own cluster, and pairs of clusters are agglomerated as one moves up the hierarchy (Izenman, 2008). We select a complete-linkage strategy: the distance between sets of observations is defined as the maximum of the pairwise distances between the observations in each of the sets. First, the method groups together the two closest observations, according to the selected distance, creating a cluster of two observations. Then, the sets whose distance is minimum are agglomerated together, and the process is repeated iteratively.
Here, the elements to cluster are the different grid points in south-eastern Asia. The input variables for the method are the pseudo-instrumental precipitation time series at each of these locations. The distance between two points is defined as 1 minus the correlation between the pseudo-instrumental precipitation time series at these locations (highly correlated points thus lie at a small distance). In this way, the method groups together points whose pseudo-instrumental precipitation time series are highly correlated. We should note that the clustering algorithm does not require any expert knowledge, as it is a fully unsupervised machine learning technique. This characteristic makes it easy to apply as a pre-BHM stage in any other context or area of study.
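A minimal sketch of this pre-clustering step with SciPy, using synthetic series in place of the pseudo-instrumental data (the two-group structure and noise level are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)

# Toy "pseudo-instrumental" series at 6 grid points: two regime groups,
# 100 time steps each, members of a group sharing a common signal.
base_a, base_b = rng.normal(size=100), rng.normal(size=100)
series = np.stack([base_a + 0.3 * rng.normal(size=100) for _ in range(3)]
                  + [base_b + 0.3 * rng.normal(size=100) for _ in range(3)])

# Distance between grid points: 1 - correlation of their time series;
# pdist's 'correlation' metric computes exactly this quantity.
dist = pdist(series, metric='correlation')
tree = linkage(dist, method='complete')          # complete-linkage agglomeration
labels = fcluster(tree, t=2, criterion='maxclust')
print(labels)  # the first three points share one label, the last three the other
```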
For both the annual and decadal reconstructions we select two cases:
clustering into 5 and into 10 groups (note that the clusters might be
different when using the annual/decadal information; see Fig. A2). We term
the reconstructions in this category BHM
The analogue method is a learning technique first introduced by Lorenz (1969) for weather forecasting. The technique uses predictors to determine the value of the target variable, based on the statistical relationship between them in a learning set: the so-called pool of possible analogues. The method can also be applied to produce a CFR. In our study and for each time step (year or decade), the predictor variables are the proxy records (38 predictors) and the target variable is the complete precipitation field at the given time step. For the annually resolved reconstruction the learning set consists of the precipitation fields at each of the years in the instrumental period, i.e. all the time steps in which we simultaneously have the information about proxy and target. For the decadally resolved reconstruction, the learning set consists of the mean precipitation field in each possible 10-year period during the instrumental era.
The reconstruction of the precipitation field at time step
Note that in this paper we use the analogue method in its classical version (obtaining the pool of analogues from the observational data set) and not in combination with the use of a GCM to draw the analogue cases from.
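A minimal sketch of the analogue step: for each reconstructed time step, the learning-period field whose proxy values are closest is returned. Euclidean distance over the proxy values is used here as one plausible choice (the paper's exact matching criterion is not restated); the dimensions mirror the setup above (38 predictors, 366 grid points), while the data are synthetic:

```python
import numpy as np

def analogue_reconstruction(proxies_past, proxies_learn, fields_learn):
    """For each past time step, pick the learning-period time step whose
    proxy values are closest (Euclidean distance) and return its full
    precipitation field as the reconstruction."""
    recon = np.empty((len(proxies_past), fields_learn.shape[1]))
    for t, p in enumerate(proxies_past):
        d = np.linalg.norm(proxies_learn - p, axis=1)
        recon[t] = fields_learn[np.argmin(d)]      # best analogue
    return recon

rng = np.random.default_rng(3)
n_learn, n_proxy, n_grid = 100, 38, 366
fields_learn = rng.normal(80.0, 20.0, size=(n_learn, n_grid))
proxy_cols = rng.choice(n_grid, size=n_proxy, replace=False)
proxies_learn = fields_learn[:, proxy_cols]

# A past time step identical to learning year 42 must retrieve that field.
recon = analogue_reconstruction(proxies_learn[[42]], proxies_learn, fields_learn)
print(np.allclose(recon[0], fields_learn[42]))  # True
```

Note how the output is always drawn from the learning pool: a precipitation regime absent from that pool can never be reconstructed, which is the limitation discussed in Sect. 3.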
To evaluate the performance of the CFR methodologies, we compare the reconstruction with the true precipitation field. We select three different skill metrics. The first, the correlation coefficient, evaluates the ability of the reconstruction to reproduce the temporal evolution of the target. At each grid point, we calculate the Pearson correlation between the reconstruction and the true precipitation time series, considering the whole reconstruction period. The Bayesian algorithms yield an ensemble of reconstructions: we first calculate the correlation of each ensemble member with the true precipitation and then show the mean of these correlations.
The second skill metric quantifies the absolute biases of the reconstruction
at each location. Instead of directly using the root mean squared error
(RMSE), we compare the RMSE of the different reconstructions with the RMSE
obtained with the simplest possible reconstruction: using the climatological
mean during the instrumental period. In reconstruction studies, this is
usually referred to as the reduction of error (RE, Cook et al., 1994) and is
defined, at each location
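Assuming the standard Cook et al. (1994) definition, RE = 1 − SSE_recon/SSE_clim, the index can be computed as in the sketch below (synthetic data; a low-error reconstruction yields RE near 1, anything worse than the climatological mean yields RE < 0):

```python
import numpy as np

def reduction_of_error(truth, recon, clim_mean):
    """RE = 1 - SSE(recon) / SSE(climatology); positive values mean the
    reconstruction beats a constant climatological-mean 'forecast'."""
    sse = np.sum((truth - recon) ** 2)
    sse_clim = np.sum((truth - clim_mean) ** 2)
    return 1.0 - sse / sse_clim

rng = np.random.default_rng(4)
truth = rng.normal(80.0, 20.0, size=200)
clim = truth[:100].mean()                        # instrumental-period climatology
good = truth + rng.normal(0.0, 5.0, size=200)    # low-error reconstruction
bad = truth.mean() + rng.normal(0.0, 40.0, size=200)

print(reduction_of_error(truth, good, clim))     # close to 1 (skilful)
print(reduction_of_error(truth, bad, clim))      # negative (worse than climatology)
```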
The last skill metric is specifically designed to evaluate probabilistic ensemble forecasts of continuous predictands and is, therefore, particularly suitable for evaluating the Bayesian schemes. We use the continuous ranked probability score (CRPS; Hersbach, 2000; Wilks, 2011; Werner et al., 2018). The CRPS measures the difference between the forecast cumulative distribution function and the step function that jumps from 0 to 1 at the observed value:
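For an ensemble of reconstruction draws, the CRPS can be estimated directly from samples via CRPS = E|X − y| − ½E|X − X′|, with X, X′ independent ensemble members and y the observed value; a minimal sketch with synthetic ensembles:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    Lower is better; for a single-member 'ensemble' it reduces to |X - y|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

rng = np.random.default_rng(5)
sharp = rng.normal(80.0, 5.0, size=500)    # ensemble centred on the truth, tight
broad = rng.normal(80.0, 30.0, size=500)   # same centre, much wider spread
print(crps_ensemble(sharp, 80.0))   # small
print(crps_ensemble(broad, 80.0))   # larger: penalised for excess spread
```

The second term rewards sharpness, so an over-dispersive ensemble scores worse than a tight one centred on the same value, which is why the CRPS complements the purely deterministic correlation and RE metrics.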
In the following sub-sections we evaluate the ability of the different reconstruction techniques. In Sect. 3.1 we select a pseudo-proxy scenario with medium noise level (equivalent to a correlation with the target precipitation of 0.5) and evaluate the reconstruction schemes. In Sect. 3.2, we assess the impact of the noise in the pseudo-proxy time series on the quality of the reconstruction.
As measures of performance we present the three selected skill metrics (see Sect. 2.5 for details), and in each case, we show the results at annual and decadal resolutions.
Figure 4 displays the correlation coefficient for the different
reconstruction techniques. According to this skill measure, regardless of
the method and resolution, proxy-rich East China (EChina, 20–40
Correlation between target precipitation and different
reconstructions, at each grid point.
For the annual-resolution reconstructions, the best performance is obtained
by the BHM technique, showing a spatial mean correlation with the target of
0.4 (Fig. 4a). Coupling the BHM with clustering partially deteriorates the
results, with the correlation coefficient severely dropping over the
proxy-rich EChina region (Fig. 4b and c). Meanwhile, the performance of the
analogue method is inferior: the correlation coefficient spatial mean is
0.25 and there is no skill in reconstructing precipitation north of 42
For the decadally resolved reconstructions the difference between the Bayesian methods and the analogue is even larger. In terms of the correlation coefficient measure, the BHM (analogue method) is the best (worst) performing, with a spatial average of 0.37 (0.1). Among the Bayesian schemes, the cluster coupling maintains the skill levels in all regions except India, where lower correlation values are obtained. The analogue method shows a much more constrained geographical skill, with correlation values above 0.2 only over EChina and central India.
RE index for different reconstructions, at each grid point.
In general, for each of the methods, the correlation coefficient is higher
for the annually resolved than for the decadally resolved reconstruction.
One exception to that is the BHM
Figure 5 shows the results for the RE index. In most of the grid points the RE index is positive, indicating a reduction of the error in comparison to forecasting the instrumental-period climatology as a reconstruction. For all the Bayesian methods and both time resolutions the highest skill is found in regions with a high density of pseudo-proxy information. Again, the analogue method shows a clearly inferior performance over NWAChina, in spite of the considerable number of pseudo-proxy locations present there.
For the annual reconstruction, improvements from climatology are found for the Bayesian approaches in EChina, NWAChina, Mongolia and, to a lesser extent, in central India (Fig. 5a, b and c). For the analogue method, the improvement with respect to climatology is confined only to EChina and central India, and the improvement is weaker than with the Bayesian techniques (Fig. 5d).
For the decadal data, similar results are obtained. However, the RE index is
notably negative in some grid points for the BHM
Figure 6 displays the results for the CRPS metric, for the probabilistic methods (Bayesian schemes). For this metric, the annually resolved (decadally resolved) reconstructions have a CRPS of 190 mm per month (22 mm per month), compared to the target precipitation spatially averaged standard deviation of 34 mm per month (11 mm per month) for annual (decadal) data. This indicates that the methods have more problems in reproducing the expected probability distribution functions in the annual case.
CRPS for different reconstructions, at each grid point.
For the annual-resolution reconstructions there is almost no noticeable
difference in the performance of the three Bayesian schemes. For this
metric, the region of best performance is NWAChina. In this case, the
performance over the proxy-rich EChina is intermediate (unlike with the
correlation coefficient and RE index metrics). For the decadal-resolution
reconstructions, the performance among the methods is quite different. While
the spatial mean is in all three cases similar (around 22 mm per month), the
spread among grid points is much higher for the BHM
Three main conclusions can be drawn from the experiments above: first, proxy-depleted areas cannot be successfully reconstructed. Second, the Bayesian schemes are superior to the analogue method in all metrics (this difference is particularly acute over NWAChina, where the analogue method fails despite the relatively good coverage by proxy data). Third, among the Bayesian algorithms the results are similar, although a partial deterioration of the skill is detected in some regions when clustering is coupled.
The underperformance of the analogue method in comparison with the BHM variants might seem to contradict the results of Gómez-Navarro et al. (2015), who do not find any significant skill differences between these schemes. However, we should note an important difference between the two studies: in Gómez-Navarro et al. (2015) the authors use as their pool of analogues an independent, highly resolved simulation performed with a regional model, while in this paper we use the classical analogue approach based on the instrumental-period pool. This difference makes a fair comparison between the two studies impossible and indicates that the pool of analogues is essential for determining the potential success of the analogue method as a reconstruction technique.
We hypothesise two reasons for the failure of the analogue method over NWAChina: first, the semi-arid precipitation regime dominant in the area and, second, an insufficient number of analogues in the pool. As the method is unsuccessful at both annual and decadal resolutions, we think that the number of elements in the pool of analogues is not the decisive variable and that the main cause of the failure is that time series with non-Gaussian behaviour could potentially be more difficult to mimic with analogues than Gaussian-behaving ones. However, proving such a hypothesis is beyond the scope of this paper and would require the design of new theoretical experiments with input data arising from different probability distributions.
Disentangling the reasons for the partial deterioration of skill when coupling the BHM to clustering algorithms will require additional experiments. However, we hypothesise that the main reason for this behaviour is the loss of information from geographical neighbours. Clustering can separate geographical neighbours, yet the information from such sites is taken into account in the covariance matrix structure of the BHM; losing information from nearby locations might therefore affect the final performance.
Next, we evaluate the impact of noise in the pseudo-proxy time series on the
skill of the reconstruction techniques. We focus on two schemes: one
Bayesian (BHM
Spatial mean correlation skill of reconstruction techniques for different noise levels (expressed here in terms of the correlation between the pseudo-proxy and truth).
Figure 7 shows the dependence of the spatially averaged correlation coefficient on the noise level in the pseudo-proxy records. At annual
resolution, the skill of the methods increases in an almost linear way with
the quality of the pseudo-proxy records, except for a drop in the Bayesian
skill in the no-noise scenario. The BHM
BHM
For the Bayesian algorithm (Fig. 8), the perfect-proxy case shows high
performance over NWAChina, EChina and north-east of the study area, at
annual and decadal resolutions. For the annual reconstruction, the skill of
the scheme is low southward of 25
Analogue method performance in terms of correlation with target
for different levels of noise at annual
Figure 9 presents the analogue method performance. For annual resolution, in
the case of perfect pseudo-proxies, the method is successful in the central
part of the study area (between 15 and 45
To summarise, as expected, the noise in the pseudo-proxy time series is important, as the quality of the reconstruction rapidly decreases with the noise level.
This study evaluates the ability of several statistical techniques to reconstruct the precipitation field over south-eastern Asia in a PPE setting. The reconstructions are performed using 1156 years of model simulation (corresponding to the period 850–2005), at annual and decadal resolution. The techniques used are BHM, BHM coupled with clustering (dividing south-eastern Asia into 5 or 10 clusters) and the analogue method. While the analogue method is a classical approach and has been widely used, the Bayesian variants are novel in the field of hydroclimate reconstruction, this being the first time the technique is applied to Asian precipitation reconstruction. Moreover, the coupling of the Bayesian modelling with clustering algorithms is also an innovation that could potentially lead to a more widespread application of these computationally intensive processes.
We find that for all the algorithms and resolutions a high density of
pseudo-proxy information is a necessary but not sufficient condition for a
successful reconstruction. On the one hand, the lack of proxy data over regions
such as the north-east of the study area, south of Tibet and south of 20
Among the three Bayesian schemes the differences in skill are not particularly
pronounced, although a partial deterioration of the skill is detected in some
regions when clustering is applied. Given that the Bayesian technique
without any form of pre-clustering of the area of interest (BHM) is
extremely computationally expensive, coupling it with a clustering scheme
(BHM with 5 or 10 clusters) offers a practical compromise, greatly reducing
the computational cost at only a modest loss of skill.
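The pre-clustering step can be illustrated with a minimal k-means sketch in plain NumPy, which groups grid points by the similarity of their precipitation time series; a separate BHM can then be fitted within each cluster, shrinking the dimension of each Bayesian inversion. Both the function name and the choice of k-means are our assumptions (the exact clustering algorithm used in the study may differ):

```python
import numpy as np

def kmeans_clusters(series, k, n_iter=100):
    """Group grid points into k clusters by similarity of their time series.

    series : (n_gridpoints, n_years) one precipitation series per grid point
    Returns an integer cluster label for each grid point.
    """
    centres = series[:k].astype(float).copy()  # simple init: first k series
    labels = np.zeros(len(series), dtype=int)
    for _ in range(n_iter):
        # assign each grid point to the nearest cluster centre
        dist = np.linalg.norm(series[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each centre to the mean series of its cluster
        new_centres = np.array([series[labels == j].mean(axis=0)
                                if np.any(labels == j) else centres[j]
                                for j in range(k)])
        if np.allclose(new_centres, centres):  # converged
            break
        centres = new_centres
    return labels
```

With k set to 5 or 10, each resulting cluster defines a sub-domain over which an independent, and much cheaper, Bayesian model can be fitted.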
We also find that the quality of the final reconstructions is highly sensitive to the noise level in the input pseudo-proxy data: the higher the noise, the lower the reconstruction skill. Only under a perfect-proxy (no-noise) scenario and at annual resolution is the analogue method capable of outperforming the Bayesian schemes over most areas. Even in this ideal no-noise case NWAChina remains elusive for the analogue methodology.
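The way noise enters the pseudo-proxies can be sketched as follows, under the common white-noise assumption used in many PPEs (the function name and the SNR convention are ours, not necessarily those of this study):

```python
import numpy as np

def add_white_noise(signal, snr, seed=0):
    """Build a pseudo-proxy by degrading a 'perfect proxy' (the simulated
    local precipitation series) with additive white noise.

    snr : signal-to-noise ratio in terms of standard deviations; the
          expected proxy-signal correlation is snr / sqrt(snr**2 + 1).
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, signal.std() / snr, size=signal.shape)
    return signal + noise
```

For snr = 1 the expected correlation between pseudo-proxy and underlying signal is about 0.71, i.e. roughly half of the proxy variance is noise, which makes clear why reconstruction skill degrades quickly as the noise level rises.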
In summary, we find that for millennium-length precipitation reconstructions over south-eastern Asia a dense network of proxy information is mandatory for success, highlighting the complex nature of the precipitation field in the area of study. Among the selected algorithms, the Bayesian techniques perform generally better than the analogue method, the difference in abilities being highest over the semi-arid north-west and in the decadal-resolution framework. The superiority of the Bayesian approach indicates that directly modelling the space and time precipitation field variability is more appropriate than just relying on similarities within a restricted pool of observational analogues, in which certain regimes might not be present.
A natural next step is to implement real-world reconstructions of precipitation over continental south-eastern Asia. The present PPEs are encouraging in this respect, as moderate skill can be expected over most of the region. Nevertheless, it is important to acknowledge that these experiments are highly idealised and that real-world data may introduce additional constraints and challenges. Further PPEs could also be designed by relaxing some of the simplifications assumed here. For example, while we considered only proxy time series that cover the whole period of interest, with the same temporal resolution, the same signal-to-noise ratio and the same relationship with the underlying hydroclimatic variable of interest, some of these constraints could be modified to better resemble reality.
Data sets, codes and analysis
scripts used in this study can be obtained from:
Kolmogorov–Smirnov normality test on the simulated JJA
precipitation during the instrumental period (years 1906–2005, at annual
resolution).
Divisions into clusters (in each plot different colours indicate
different clusters), using the simulated JJA precipitation in the
instrumental period (years 1906–2005) as input.
ST made all the calculations, produced all the figures and wrote the main text. LS, JW and JL contributed with discussions and comments on the manuscript.
The authors declare that they have no conflict of interest.
This article is part of the special issue “Hydro-climate dynamics, analytics and predictability”. It is not associated with a conference.
Stefanie Talento, Lea Schneider and Jürg Luterbacher are supported by the Belmont Forum and JPI-Climate Collaborative Research Action “INTEGRATE: An integrated data-model study of interactions between tropical monsoons and extratropical climate variability and extremes”. Jürg Luterbacher acknowledges support by the UK–China Research and Innovation Partnership Fund through the Met Office Climate Science for Service Partnership China (CSSP) as part of the Newton Fund.
The authors thank the reviewers for constructive criticism and suggestions that improved the quality of the paper. The authors also thank the proxy and model data providers.
This research has been supported by the Belmont Forum, JPI-Climate (INTEGRATE grant). This open-access publication was funded by Justus Liebig University.
This paper was edited by Naresh Devineni and reviewed by Tine Nilsen and one anonymous referee.