This study introduces the Systematic Correlation Matrix Evaluation (SCoMaE)
method, a bottom–up approach which combines expert judgment and statistical
information to systematically select transparent, nonredundant indicators
for a comprehensive assessment of the state of the Earth system. The method
consists of two basic steps: (1) the calculation of a correlation matrix
among variables relevant for a given research question and (2) the systematic
evaluation of the matrix, to identify clusters of variables with similar
behavior and respective mutually independent indicators. Optional further
analysis steps include (3) the interpretation of the identified clusters,
enabling a learning effect from the selection of indicators, (4) testing the
robustness of identified clusters with respect to changes in forcing or
boundary conditions, (5) enabling a comparative assessment of varying
scenarios by constructing and evaluating a common correlation matrix, and
(6) the inclusion of expert judgment, for example, to prescribe indicators,
to allow for considerations other than statistical consistency. The example
application of the SCoMaE method to Earth system model output forced by
different CO
An indicator is a quantitative value, measured or calculated, that describes
relevant aspects of the state of a defined system. A useful indicator should
fulfill certain characteristics that depend on the purpose of the indicator
For the assessment of ongoing climate change, models representing the
physical and biogeochemical processes of the Earth system and known as Earth
system models (ESMs) are one of the essential tools because the inertia of
the climate system to (carbon) perturbations requires projections of future
climate states. Early climate models applied simple zero- to two-dimensional
calculations to assess the effect of atmospheric CO
But as Earth system models and observational data sets continuously increase
in complexity, there are more and more variables available that could
potentially serve as indicators of the state of the climate system. Which
ones should we select for a fully comprehensive assessment of changes in the
climate system, ideally, without providing redundant information? A common
bottom–up approach for measuring complex systems is to start from a broad set
of (Earth system) variables and consecutively select more appropriate ones
depending on the research question (e.g.,
The selection of a limited number of indicators that support scientific or
political decision making is a major challenge for experts, who in this case
have to decide on the relative importance of a variable in relation to others
In this study we introduce a bottom–up indicator selection method that uses statistical information about variables in addition to expert judgement, thereby attempting to reduce bias in the selection process. Systematic Correlation Matrix Evaluation (SCoMaE) uses information on correlations between variables to identify “clusters” of variables that show similar behavior. We then systematically select scientifically consistent indicators to represent these clusters. The identified indicators are mutually independent and therefore do not provide redundant information. A set of independent indicators hence allows for a more comprehensive science-led assessment of the system under consideration than a set of correlated indicators. Furthermore, SCoMaE allows for a learning process by providing new information about correlations between the given variables, thereby improving our understanding of the system.
To illustrate the SCoMaE method, we select indicators to answer the following
example research question: “How are changes in the climate
system influenced by the sensitivity of the marine and terrestrial biological
system to temperature and CO
Before the SCoMaE method can be applied, it is crucial to identify and formulate the research question. For our example we chose to address the following research question: “How are changes in the climate system influenced by
the sensitivity of the marine and terrestrial biological system to
temperature and CO
Our application is comparable to a multi-model ensemble in which each
perturbed-parameter simulation is a slightly different version of the default model. We
could hence do the very same analysis as described in Sect.
To select indicators to answer the research question of which changes in the
simulated Earth system are robust throughout state-of-the-art Earth system
models, we could again use the CMIP5 data sets. Here, one would probably
want to calculate correlations between time series of different variables.
This would give information about similar frequencies of those variables,
which in turn suggests similar underlying processes. One could compare
correlation matrices of one model under different forcing scenarios, as
described in Sect.
In that sense, SCoMaE could also be applied to correlations between
observational time series. Since such data contain more noise, one could restrict the
research question to predefined timescales and filter the time series of the variables before applying the
SCoMaE method. The indicators would accordingly be selected to answer the
following underlying research question: “Which are the independent processes that I need
to study for a comprehensive assessment of changes in the climate system of a
given frequency band?”
Coming back to our example application of the SCoMaE method, we now want to
briefly explain the model setup and simulations.
This paper illustrates the SCoMaE method for the example of model simulations
performed with version 2.9 of the University of Victoria Earth System Climate
Model (UVic ESCM), an Earth system model of intermediate complexity
A list of the globally aggregated output variables is given in Table
For the default model simulation, the UVic ESCM was spun up with
preindustrial (year 1765) seasonal forcing for over 10 000 years. All
simulations were integrated from 850 until 2005 using historical fossil-fuel
emissions and land-use changes, as well as radiative forcing from solar
variability and volcanic activity following
For the sensitivity analysis, different model input parameters and parameterizations of the UVic ESCM were perturbed. For some of these perturbations, a new model spin-up was necessary to reach steady-state conditions again; apart from this, the forcing was the same for all simulations.
Illustration of the correlation matrix construction for the example case study, using the model output variables surface air temperature (SAT) and Northern Hemisphere (NH) sea ice area. First, the temporal differences between 2005–2015 and 2090–2100 are calculated for each simulation. Second, the changes in the variables induced by the parameter perturbations are correlated with each other. Last, this correlation information is entered into the correlation matrix as one of many entries.
List of perturbed model input parameters.
In the following sections, the single-parameter perturbation experiments,
which are used in the example and shown in Fig.
Small-scale physical mixing (vertical diffusivity or diapycnal mixing) in the
ocean has to be parameterized in all global models because their resolution is too coarse to resolve it.
Thus, this important process, which plays a key role in determining ocean
circulation and biogeochemical cycles as well as ocean-to-atmosphere heat and
carbon fluxes, is by necessity set as a single global value or several regional values that fall within the range of observational estimates of vertical
diffusivity. To test how this choice affects the model results, we varied this
parameterization by decreasing and increasing it by 50 % (Kv low and Kv high),
which is within the range of observational estimates
Although biological processes are known to be sensitive to temperature, there
is a significant amount of uncertainty in how biology will respond to warming
caused by climate change
No marine biological sensitivity to temperature: the results of this analysis
can be used to estimate a lower bound for how marine plankton, and their
effect on biogeochemical cycles, will respond directly to global warming
(no marine
No terrestrial vegetation sensitivity to temperature: the results of this
analysis can be used to estimate a lower bound for how terrestrial
vegetation and its effect on the carbon cycle will respond directly to global
warming (no terr.
To further investigate the sensitivity of terrestrial biology to temperature, we varied the vegetation and soil
Increasing atmospheric CO
Transpiration by plants is highly sensitive to increases in atmospheric
CO
Mesocosm studies that artificially increase the amount of CO
Throughout this study, a variable is defined as a model output or observational time series, whereas we refer to it as an indicator if a variable was selected to represent a certain aspect of the considered system. To obtain a comprehensive, nonredundant set of indicators to describe a given system, the first step is to construct a correlation matrix, i.e., a matrix containing the pairwise correlations between all Earth system variables relevant to the research question. The construction of the correlation matrix strongly depends on the research question and needs to be adjusted accordingly. Both the choice of which variables are relevant for the given research question, and hence included in the matrix, and the choice of how the correlations are calculated strongly affect the outcome of the study. Likewise, it is important to ensure a reasonable signal-to-noise ratio in the chosen data set. Correlations could, for example, be calculated between time series of variables or their derivatives, absolute temporal changes, or spatial patterns. Alternatively, output from ensemble simulations could be used to calculate correlations between changes in variables across the different ensemble members. The matrix is then evaluated based on the statistical significance of these correlations (see Step 2). Note that a certain level of expert judgement is needed both for this preselection of the potentially relevant variables and for the construction of the correlation information in the matrix.
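As a minimal sketch of this first step, the Python code below assumes that the change in each variable has already been computed for every ensemble member and builds a Pearson correlation matrix together with a significance mask. The function name `correlation_matrix`, the 5 % significance level, and the use of a Fisher z normal approximation for the p-values are assumptions made for illustration, not necessarily the test used in this study; the input numbers are hypothetical.

```python
import math

import numpy as np


def correlation_matrix(changes, alpha=0.05):
    """Pairwise Pearson correlations between variable changes across ensemble members.

    changes: dict mapping a variable name to a sequence of changes, one value per
             ensemble member (here: per parameter perturbation simulation).
    Returns (names, r, p, significant) with square arrays r, p and a boolean mask.
    """
    names = list(changes)
    data = np.array([np.asarray(changes[name], dtype=float) for name in names])
    n_members = data.shape[1]
    if n_members < 4:
        raise ValueError("the Fisher z approximation needs at least 4 members")

    r = np.corrcoef(data)  # each row of `data` is one variable

    # Approximate two-sided p-values via the Fisher z transform (assumption).
    # Clipping avoids infinite z where r = +/-1, e.g., on the diagonal.
    z = np.arctanh(np.clip(r, -0.999999, 0.999999)) * math.sqrt(n_members - 3)
    p = np.vectorize(lambda value: math.erfc(abs(value) / math.sqrt(2.0)))(z)

    significant = p < alpha
    np.fill_diagonal(significant, False)  # ignore self-correlations
    return names, r, p, significant


# Hypothetical changes (one value per perturbed-parameter simulation).
changes = {
    "A_sat": [2.9, 3.1, 3.4, 2.7, 3.8, 3.0, 3.3],
    "O_iceareaN": [-2.1, -2.4, -2.8, -1.9, -3.3, -2.3, -2.6],
    "O_phyt": [0.10, -0.20, 0.05, 0.30, -0.10, 0.00, 0.20],
}
names, r, p, significant = correlation_matrix(changes)
print(names)
print(np.round(r, 2))
print(significant)
```

For small ensembles, the exact t-test on the correlation coefficient provided by common statistics libraries would be preferable to the normal approximation used here.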
To illustrate the construction of the matrix based on our example
simulations, we show how the correlation between changes in global mean
“surface air temperature” (A_sat) and
“Northern Hemisphere sea ice area” (O_iceareaN) in the Representative
Concentration Pathway (RCP) 8.5 emission scenario
Assuming that the signal of interest resembles the difference in state
between the start and the end of a climate change simulation, we
start by calculating the temporal differences between 2005–2015 and 2090–2100
for each of the parameter perturbation simulations that serve as
our ensemble in this example (see Sect.
In our example, the changes in “surface air temperature” (A_sat) and “Northern Hemisphere sea ice area” (O_iceareaN) are negatively correlated. This illustrates that these model output variables respond in consistently opposite ways to the parameter perturbations, i.e., a perturbation that leads to a larger increase in surface air temperature also tends to lead to a larger decrease in Northern Hemisphere sea ice area. This information is then written into the correlation matrix. By studying the constructed correlation matrix and individual correlations of changes between model output variables, we can learn about basic processes within the simulated climate system and test whether these agree with our expectations. To simplify the visual analysis of our example, we sorted the variables in the matrices according to the strength of the correlation between their changes and the changes in the commonly used climate change indicator “surface air temperature” (A_sat) in the historical scenario.
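The two steps of this example, differencing each simulation between the two periods and then correlating the resulting changes across the ensemble, can be sketched in Python as follows. The helper `period_change`, the inclusive treatment of the period bounds, and the synthetic linear time series are assumptions chosen only to illustrate the calculation; they are not UVic ESCM output.

```python
import numpy as np


def period_change(years, values, early=(2005, 2015), late=(2090, 2100)):
    """Mean over the late period minus mean over the early period (bounds inclusive)."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    in_early = (years >= early[0]) & (years <= early[1])
    in_late = (years >= late[0]) & (years <= late[1])
    return values[in_late].mean() - values[in_early].mean()


years = np.arange(2005, 2101)

# Hypothetical ensemble: each member warms at its own rate (K per year) and
# loses NH sea ice with its own sensitivity (made-up area units per K of warming).
warming_rates = [0.030, 0.034, 0.038, 0.042, 0.046]
ice_sensitivities = [0.8, 0.9, 0.7, 0.85, 0.95]

sat_changes, ice_changes = [], []
for rate, sensitivity in zip(warming_rates, ice_sensitivities):
    sat = 14.0 + rate * (years - 2005)  # global mean surface air temperature
    ice = 10.0 - sensitivity * rate * (years - 2005)  # NH sea ice area
    sat_changes.append(period_change(years, sat))
    ice_changes.append(period_change(years, ice))

# One entry of the correlation matrix: correlation of the changes across members.
r_sat_ice = np.corrcoef(sat_changes, ice_changes)[0, 1]
print(round(r_sat_ice, 2))  # negative: more warming goes with more sea ice loss
```

Repeating the last step for every pair of variables fills the full correlation matrix.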
Illustration of the indicator selection process using the example of
the correlation matrix for the historical scenario (see
Fig.
To obtain a set of indicators for the assessment of changes in the system under consideration, we systematically evaluate the previously constructed correlation
matrix (see Fig.
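One way to implement such a systematic evaluation is sketched below in Python. The function `select_indicators` is hypothetical and assumes a greedy rule inferred from the example that follows, not a verbatim specification of SCoMaE: among the still unclustered variables, the one with the most significant correlations becomes the next indicator; it and all remaining variables significantly correlated with it form a cluster and are excluded; variables without any significant partner end up as single indicators. The `prescribed` argument, which forces given indicators to be used first, and the tie-breaking by input order are also assumptions of this sketch.

```python
import numpy as np


def select_indicators(names, significant, prescribed=()):
    """Greedy clustering of variables from a boolean significance matrix (sketch).

    Returns a list of (indicator, [clustered variables]) pairs; single
    indicators have an empty list. Prescribed indicators are taken first.
    """
    sig = np.array(significant, dtype=bool)
    np.fill_diagonal(sig, False)
    index = {name: i for i, name in enumerate(names)}
    remaining = set(range(len(names)))
    clusters = []

    def take(i):
        # Cluster variable i with all remaining variables significantly correlated to it.
        members = [j for j in sorted(remaining) if sig[i, j]]
        remaining.difference_update([i, *members])
        clusters.append((names[i], [names[j] for j in members]))

    for name in prescribed:
        if index[name] in remaining:  # skip if already clustered under an earlier one
            take(index[name])

    while remaining:
        # Count significant partners among the variables that are still unclustered.
        counts = {i: int(sig[i, sorted(remaining)].sum()) for i in remaining}
        best = max(sorted(counts), key=counts.get)  # ties: earliest variable wins
        if counts[best] == 0:
            break
        take(best)

    # Whatever is left has no significant partner: each is a single indicator.
    clusters.extend((names[i], []) for i in sorted(remaining))
    return clusters


names = ["a", "b", "c", "d", "e"]  # hypothetical variables
sig = np.zeros((5, 5), dtype=bool)
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    sig[i, j] = sig[j, i] = True

print(select_indicators(names, sig))  # [('a', ['b', 'c', 'd']), ('e', [])]
print(select_indicators(names, sig, prescribed=["d"]))
# [('d', ['a', 'e']), ('b', []), ('c', [])]
```

The two calls show how prescribing an indicator can change both the clusters and the number of single indicators, even though the significance matrix is the same.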
In our example, we applied the SCoMaE method to the correlation matrix
concerning 46 commonly used variables for the assessment of climatic changes
in the historical forcing scenario, simulated by the UVic ESCM (see
Sect.
“Surface albedo on land” (A_albsurL) is identified as the second indicator.
After all variables whose changes are correlated with those in “precipitation over ocean” (F_precipO) have been excluded, the changes in “surface albedo on land” due to the parameter perturbations are significantly correlated with changes in “net surface downward shortwave
radiation” (F_dnswr), “ocean oxygen” (O_o2), and “sea surface salinity” (O_salsur).
The third indicator is “ocean surface alkalinity” (O_alksur),
which shows the same response to the parameter perturbations as “ocean
surface phosphate concentrations” (O_po4sur). When excluding all variables
that are clustered under one of the three abovementioned indicators,
three variables remain unclustered: “mean ocean temperature” (O_temp),
“maximum meridional overturning” (O_motmax), and “ocean phytoplankton” (O_phyt).
These variables are hence single indicators, which are needed for
a comprehensive assessment of the system under consideration (Fig.
See Sect. 1 and Figs. S1 and S2 in the Supplement for the results of these analyses for the intermediate–high (RCP4.5) and the business-as-usual (RCP8.5) scenarios, respectively.
Indicators identified from the analysis of the RCP4.5 (blue) and RCP8.5 (red) correlation matrices with the precondition to use the historical indicators first. The indicators are as follows: “precipitation over ocean” (F_precipO), “land surface albedo” (A_albsurL), “ocean surface alkalinity” (O_alksur), “mean ocean temperature” (O_temp), “ocean phytoplankton” (O_phyt), “ocean overturning” (O_motmax), “net radiation at the top of the atmosphere” (F_netrad), “ocean surface dissolved inorganic carbon” (O_dicsur), and “downward shortwave radiation” (F_dnswr).
In order to learn how well the indicators identified for one scenario explain a different scenario with changed forcing, we prescribe the use of the previously identified indicator set. The SCoMaE method accordingly first uses these indicators and then analyzes whether, and which, additional indicators are needed for a fully comprehensive assessment of the new scenario.
For the example, we prescribed the indicators identified for the historical
scenario to assess the intermediate–high (RCP4.5) and the business-as-usual (RCP8.5) emission scenarios
(Fig.
Note that Earth system variables clustered under the prescribed indicators
differ among the different scenarios (compare Figs.
The differences between the correlation matrices for the RCP8.5 scenario
compared to the historical scenario are even larger (compare
Figs.
These differences in the correlation matrices for the different forcing scenarios indicate that the prevailing correlations between Earth system variables change with the imposed climate forcing. This illustrates that the chosen indicators may need to be reevaluated for a comprehensive assessment of different climate strategies that yield different climate states.
To advance this analysis such that changes in correlation matrices from different forcing scenarios can be taken into account, it is possible to create a correlation matrix representing only those correlations that are significant in all forcing scenarios; this is defined as a common correlation matrix. Applying the SCoMaE method to such a common correlation matrix identifies an indicator set that can be used to assess and also compare multiple scenarios and which hence differs from the previously identified sets for the individual correlation matrices.
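A common correlation matrix in this sense can be sketched as an element-wise logical AND of the per-scenario significance masks. The function `common_significance` and the small masks below are hypothetical. The sketch deliberately does not require the sign of a correlation to agree across scenarios, which is consistent with the example of the air-to-sea carbon flux and soil respiration discussed later, whose correlation changes sign between the historical and the RCP scenarios while the two variables remain clustered together.

```python
import numpy as np


def common_significance(masks):
    """Pairs of variables whose correlation is significant in every scenario."""
    stacked = np.array([np.asarray(mask, dtype=bool) for mask in masks])
    return np.logical_and.reduce(stacked, axis=0)


# Hypothetical significance masks for three variables in three scenarios.
historical = [[False, True, True], [True, False, False], [True, False, False]]
rcp45 = [[False, True, False], [True, False, True], [False, True, False]]
rcp85 = [[False, True, True], [True, False, True], [True, True, False]]

common = common_significance([historical, rcp45, rcp85])
print(common.astype(int))
```

Here only the pair of the first two variables is significant in all three scenarios; the resulting mask can then be evaluated in the same way as the mask of a single scenario.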
To obtain a common indicator set for the three example forcing scenarios
(historical, RCP4.5, and RCP8.5), we construct a correlation matrix in which
only correlations of variable changes that are significant in all these
scenarios are considered (Fig.
A first visual evaluation of the common correlation matrix shows more reddish
than bluish shading, which indicates that the correlation patterns for the
historical and RCP4.5 scenarios are more similar than for the historical and
RCP8.5 scenarios (Fig.
The first indicator obtained from the common SCoMaE analysis is “atmospheric
CO
The second indicator is “precipitation over land” (F_precipL), which is clustered with “terrestrial evapotranspiration” (F_evapL) and “net upward longwave radiation” (F_uplwr) (Fig. S6). This cluster accordingly represents changes in terrestrial moisture fluxes and the resulting surface upward fluxes of longwave radiation. The latter relate to the surface air temperature, which on land is strongly influenced by the amount of evapotranspiration and the resulting evaporative cooling. The fact that terrestrial moisture fluxes are clustered under a separate indicator suggests that these variables have a different sensitivity to the perturbed parameters than the global and oceanic moisture fluxes. Since these three variables show significant correlations of variable changes to each other in all three scenarios, any of them could serve as the indicator for this cluster. The same holds for the next indicators and their clusters, which are “air-to-sea carbon flux” (F_carba2o) and “soil respiration” (L_soilresp); “net top-of-atmosphere radiation” (F_netrad) and the “ocean surface heat flux” (F_heat); and “net surface downward shortwave radiation” (F_dnswr) and the “land surface albedo” (A_albsurL).
The remaining single indicators are “air-to-land carbon flux” (F_carba2l), “ocean surface nitrate” (O_no3sur), “top-of-atmosphere outgoing longwave radiation” (F_outlwr), “ocean oxygen” (O_o2), “ocean surface alkalinity” (O_alksur), “ocean phytoplankton” (O_phyt), “sea surface salinity” (O_salsur), “ocean surface phosphate” (O_po4sur), and “maximum ocean meridional overturning” (O_motmax).
If stakeholders or experts were to inform the indicator selection process, it would be possible to prescribe indicators and then use the SCoMaE analysis to identify additional uncorrelated variables that are needed to obtain a comprehensive assessment of the system. Also, instead of using global mean time series, one could use regional time series or derived variables, such as heat stress or cumulative emissions. This approach, in combination with the SCoMaE analysis, enables us to learn about variables that have previously been disregarded but potentially provide new information about the system, or to learn which of the previously considered indicators actually provide redundant information.
How would the common indicator set from our example change if we were to
include the condition that surface air temperature should be the first
indicator, instead of atmospheric CO
Prescribing “surface air temperature” (A_sat) as the first indicator for the common correlation matrix leads to the replacement of “precipitation
over land” (F_precipL) by “precipitation over ocean” (F_precipO) as the
second indicator (Fig.
The third and fourth indicators are “net top-of-atmosphere radiation” (F_netrad) and “net surface downward shortwave radiation” (F_dnswr), which were found with the same underlying clusters in the default analysis (compare Figs. S8 and S9). Finally, eight of the nine previously identified single indicators remain unclustered and hence are still single indicators.
Although the total number of indicators has not changed, the identified clusters and their meaning differ: in the default analysis, the first indicator represented changes in temperatures, carbon fluxes, and global and oceanic moisture fluxes. If “surface air temperature” (A_sat) is prescribed, the global and oceanic moisture fluxes are moved to the second cluster, which in addition incorporates some Earth system variables from the previously identified second and third indicators. This is one example showing how the SCoMaE method allows for the inclusion of expert judgment or preconditions, is able to account for changes in correlation patterns, and allows one to determine which indicators are needed for a comprehensive and nonredundant assessment. (For further discussion, see Sect. 2 and Fig. S3 in the Supplement.)
As illustrated above, the SCoMaE method statistically evaluates the
correlations between changes in model output variables and uses this
information to cluster variables, while selecting a representative indicator
for each cluster. The example analyses of the individual scenarios illustrate
the dependence of the indicator selection on the imposed forcing scenario.
These results demonstrate that for our model, it is insufficient to apply the
historical indicator set to the future scenarios with either higher CO
We demonstrate one possible approach for selecting a more comprehensive indicator set by constructing a common correlation matrix to identify indicators that can be used for the assessment of all three scenarios. For the clusters of variables of the common indicator set, the correlations of variable changes remain significant even under different atmospheric carbon or land-use forcing.
However, one should always ask whether the identified clusters and indicators
are scientifically meaningful. For the common correlation matrix (as well as
the RCP8.5 scenario), the first indicator, “atmospheric CO
The second indicator, “precipitation over land” (F_precipL), represents
the variability of moisture fluxes on land and the associated cooling effect.
The fact that these processes are clustered under an indicator that is
distinct from global and oceanic moisture fluxes indicates different
underlying processes for these moisture fluxes, namely the influence of
biological transpiration. This process is directly affected by the parameter
perturbations concerning the sensitivity of transpiration to CO
Another identified cluster consists of “net top-of-atmosphere radiation” (F_netrad) and “ocean surface heat flux” (F_heat), which are directly linked in the model. Furthermore, “net surface downward shortwave radiation” (F_dnswr) and “land surface albedo” (A_albsurL) are clustered, since changes in vegetation on land induced by the parameter perturbations influence both the surface albedo on land and the incoming shortwave radiation at the surface.
The “air-to-sea carbon flux” (F_carba2o) and “soil
respiration” (L_soilresp) are clustered together for all three scenarios
but show a negative correlation of variable changes in the historical
scenario and positive correlations of variable changes in the two RCP
scenarios, indicating a dependency on the atmospheric carbon concentrations.
The predominant parameterization for those correlations of variable changes
is one that affects the CO
In contrast, in the future, high-CO
Two clusters are identified in both future emission scenarios, namely “ocean
phytoplankton” (O_phyt), which is clustered with “ocean surface
phosphate” (O_po4sur) and “ocean surface nitrate” (O_no3sur), and “ocean
oxygen” (O_o2), which is clustered with “ocean surface alkalinity” (O_alksur)
(compare Figs. S6 and S7). These two clusters are only identified when
atmospheric CO
For our case study, we chose to assess the uncertainty of the biological
system towards increasing temperature and CO
It is important to stress the fact that the Earth system variables used in
our example are annual global integrals or means between two fixed points in
time. While our approach was sufficient to demonstrate the SCoMaE method, it
is important to mention that global integrals and means are not always
positively correlated with regional changes and, therefore, may misrepresent
regional responses. Furthermore, we are not assessing the detailed temporal
development of the model variables' response to changes in the climate state.
Instead, we investigate changes in the final simulated climate state imposed
by parameter perturbations, which are sensitive to CO
The construction of an individual or a common correlation matrix can be a useful tool for assessing the state of complex systems. Individual correlation matrices allow one to obtain an initial overview of relationships between the different system variables, whereas a common correlation matrix shows how changes in the state of a system, imposed by, e.g., varying forcing scenarios, influence these relationships. The SCoMaE method then allows us to cluster the variables, based on statistical considerations, to obtain a nonredundant indicator set to guide more detailed analysis.
However, for this to be useful, one must carefully select what information to include in the correlation matrix, which in turn strongly depends on the given research question. This can be illustrated by the implicit choices made for our example case study, where we considered correlations between changes in globally averaged model output variables across various parameter perturbations. The first choice in this case study was to use global aggregates of the model output. However, if the research focus were on, e.g., regional phenomena, the correlations for the matrix could instead be constructed either between regional aggregates or based on the correlation strength for a given spatial pattern.
The second choice for the case study was to consider correlations between changes in model output variables based on their response to a parameter perturbation under changing climate forcing. Instead of using the raw model output, it is also possible to further process the data and calculate derived variables, such as heat stress or cumulative time series. Furthermore, with a model that has higher internal variability, it would also be possible to consider temporal correlations of Earth system variables over a chosen time period. In contrast to the purely process-based parameter perturbations that we considered in the case study, this would provide information about the timescales and temporal development of the model output variables, which, in turn, could indicate common underlying processes in the model. Additionally, if the considered time series showed higher internal variability, it might be conceivable to apply a specific temporal filter to the data before calculating the correlation matrix. This could allow a distinction between important processes on different timescales, from daily and seasonal to interannual or decadal.
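As one hedged illustration of such filtering, the Python sketch below applies a simple running-mean low-pass filter to two synthetic series before correlating them. The series, the window length, and the helper names `running_mean` and `filtered_correlation` are assumptions; in practice, the filter type (e.g., a running mean or a Butterworth filter) and its cutoff would follow from the timescale of interest. The example is constructed so that the two series share the same slow signal but have opposite fast variability: their unfiltered correlation is close to zero, while after low-pass filtering they are almost perfectly correlated.

```python
import numpy as np


def running_mean(series, window):
    """Mean over `window` consecutive values; incomplete edges are dropped."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")


def filtered_correlation(x, y, window):
    """Pearson correlation of two equally long series after low-pass filtering."""
    return np.corrcoef(running_mean(x, window), running_mean(y, window))[0, 1]


t = np.arange(200)                  # e.g., years
slow = np.sin(2 * np.pi * t / 100)  # shared slow (centennial-like) signal
fast = np.sin(2 * np.pi * t / 4)    # fast variability with a 4-step period
x = slow + fast
y = slow - fast                     # opposite fast variability

print(round(np.corrcoef(x, y)[0, 1], 3))        # raw series
print(round(filtered_correlation(x, y, 4), 3))  # after a 4-step running mean
```

Subtracting the running mean from the original series would instead isolate the fast variability, so the same pair of variables can be correlated separately on different timescales.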
In the following we want to discuss the contribution of the SCoMaE method to
achieve the three characteristics for indicator selection as introduced by
In our example the SCoMaE method is based on model data and hence does not
account for information about the statistical measurability of the identified
indicators. This makes it difficult to directly translate a model-based
indicator set to a “real-world” application. This is the case, for example, for
the first indicator in the historical scenario: “precipitation over ocean” (F_precipO).
The lack of long-term historical precipitation measurements
over the ocean
The third characteristic mentioned by
In this study we introduced a bottom–up, correlation-based approach for systematically identifying indicator sets for the assessment of complex systems. To demonstrate the SCoMaE method, we applied it to correlation matrices constructed with changes in Earth system variables of an intermediate-complexity Earth system model, with which we simulated three forcing scenarios. We were able to identify indicator sets for an assessment of the historical as well as for an intermediate–high and a business-as-usual future emission scenario. The comparison of the three correlation matrices yielded the opportunity to assess changes in correlations between changes in Earth system variables introduced by the imposed forcing. These changes in the correlation patterns also motivated a reevaluation of the selected indicator sets for the different scenarios. We show that it is not sufficient to apply the indicator set identified for the historical scenario to either the intermediate–high or the business-as-usual future emission scenario. This result suggests that the classical procedure of choosing indicators ad hoc, such as surface air temperature, may work well for certain environmental conditions or scenarios but possibly not as well for others. That is, the subjective choice of indicators may lead to unintended preferences in the interpretation of different scenarios. By combining the three scenarios into a common correlation matrix, we could identify correlations between changes in Earth system variables that are robust across the three forcing scenarios. Considering only these correlations enabled us to identify a common indicator set, which was scientifically consistent and would allow us to comparatively assess the three considered scenarios.
This case study is one example out of many possible applications of the correlation matrix and SCoMaE method. The construction of the correlation matrix can be adjusted to the respective research question, which makes the SCoMaE method a generic and flexible tool. An iterative application of the SCoMaE method offers the user the chance to comprehensively assess complex systems such as the Earth system, while including political, ethical, and economic considerations, as well as measurability constraints.
The model data used to generate the figures will be made
available at
List of globally aggregated model output variables considered in this study.
The supplement related to this article is available online at:
NM, AO, and DPK conceived of and designed the experiments. DPK and NM implemented and performed the experiments. NM analyzed the data and wrote the manuscript with contributions from DPK and AO.
The authors declare that they have no conflict of interest.
The authors thank Wilfried Rickels, Martin Quaas, and Christian Baatz for their helpful comments, as well as the participants of the Metrics Workshop of the SPP 1689 in Hamburg in March 2015 for their thoughts on metrics and indicators. This work was funded by the DFG Priority Program “Climate Engineering: Risks, Challenges, Opportunities?” (SPP 1689). Edited by: Ben Kravitz Reviewed by: two anonymous referees