To manage the uncertainty in the data and rules, ECliPS uses confidence factors (CFs). A CF is a numerical measure of the confidence one has in the validity of a given piece of evidence or rule. It varies from 0 (no confidence at all) to 100 (complete confidence).
In the case of data sets, a CF is similar to the degree of membership introduced by Zadeh (1965) for fuzzy sets. To illustrate the concept of fuzzy sets, consider the statement "winter is cold." A crisp set description requires an arbitrary decision as to what constitutes a cold winter. Is it one when the mean winter temperature drops below 5C, 10C or 20C? There are winters that everybody agrees are cold, and there are winters about which one cannot be so certain. The figure below shows one possible membership function for "cold winter".
If temperature drops below the point marked as A, then the membership function equals 1, i.e., there is a consensus that the winter is cold. If temperature rises above point B, then there is no one who would call the winter cold. And, finally, when temperature is between A and B, the opinion is mixed. In ECliPS, if the direct measurements are available, point A often coincides with 0.7 standard deviation of temperature, point B is its long-term mean or normal value, and between A and B the value of the membership function is determined by a linear interpolation.
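The piecewise-linear membership function described above can be sketched in a few lines of Python. This is only an illustration, not ECliPS code: the function name, the parameter names `point_a` and `point_b`, and the temperatures in the usage loop are hypothetical.

```python
def cold_winter_membership(temperature, point_a, point_b):
    """Degree of membership in "cold winter" for a mean winter temperature.

    point_a: temperature at or below which the winter is cold by consensus.
    point_b: temperature at or above which nobody calls the winter cold.
    Both anchors are supplied by the caller (hypothetical values below).
    """
    if point_a >= point_b:
        raise ValueError("point_a must be lower than point_b")
    if temperature <= point_a:
        return 1.0
    if temperature >= point_b:
        return 0.0
    # Linear interpolation between the two anchor points.
    return (point_b - temperature) / (point_b - point_a)


# Hypothetical anchors: A = -7.0, B = -5.0 (degrees C).
for t in (-8.0, -6.0, -5.5, -4.0):
    print(t, cold_winter_membership(t, point_a=-7.0, point_b=-5.0))
```

Multiplying the result by 100 puts it on the same 0 to 100 scale as the CFs.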
In reality, however, the situation is somewhat more complex. For example, there are different indices that measure the strength of El Niño events, and the membership function may vary depending on which index is used. Moreover, there may be no direct measurements for a climate variable, just some proxy information to characterize it. This brings an additional element of uncertainty regarding our statements about such variables. Therefore, we interpret the degree of membership for our data as CFs that reflect not only their deviation from normal but also how certain our information about them is.
Confidence factors are also used to measure the degree of uncertainty in rules. Most of the rules in ECliPS are based on information that comes from empirical studies, which provide a large body of circumstantial evidence about the climate system. Many of those works are case studies involving a combination of synoptic, statistical, and conceptual physical reasoning. This knowledge is often presented in the form of production (IF-THEN) rules that link evidence (e) and a hypothesis (h):
IF e THEN h, CF = [0…100].
Typically, the hypothesis is a statement regarding a category of the forecast (or target) variable (e.g., h: temperature = above normal). The CF in this case represents a subjective probability, or "guesstimate," that if e is true, then h is true as well. The numerical value of the CF is based upon experience and available information about the relationship between the climatic processes presented in the rule. If quantitative data are available, conventional statistical techniques (for example, correlation analysis) may also be used to determine the strength of a relationship between climate variables, and thereby provide an estimate of the CF.
In assigning CFs to the rules, it is very important to maintain consistency in their values throughout the knowledge base. If the correlation analysis is used, the correlation coefficient (r) is converted to the CF as follows:
CF = k (r – CL),
where k is the correction coefficient based on the number of observations, and CL is the 95% confidence level determined using the Student’s t-test. The value for k is taken from the normal cumulative distribution with the mean of and standard deviation of 6 as in this figure. For example, in a sample of 15 pairs of data and r = 0.8, CF = 69 * (0.80 - 0.51) = 20.
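As a rough sketch, the conversion from a correlation coefficient to a CF can be written as a small Python function. The values of k and CL are taken as inputs here, since they come from the figure and from the t-test respectively. The function name is hypothetical, and clamping the result to the 0 to 100 range is an assumption of this sketch, not something stated above.

```python
def cf_from_correlation(r, k, cl):
    """Convert a correlation coefficient to a confidence factor.

    Implements CF = k * (r - CL), where k is the correction coefficient
    for the sample size and CL is the 95% confidence level for r.
    """
    cf = k * (r - cl)
    # Assumption of this sketch: keep the result on the 0..100 CF scale.
    return max(0.0, min(100.0, cf))


# The worked example from the text: 15 pairs of data, r = 0.8.
print(round(cf_from_correlation(0.80, k=69, cl=0.51)))  # 20
```

A correlation that does not exceed CL yields CF = 0 under this clamping, i.e., no confidence in the rule.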
The CF value should be reduced if there is serial correlation in the data and/or the distribution is not normal. The CF may also reflect the quality of the data, the accuracy of the statistical analysis, how convincing its interpretation is, and maybe just the gut feeling of the users. In the case of categorical variables, the probabilities or odds may be converted into CFs as discussed in the section CFs and Probabilities.
The CF for hypothesis h in a rule is a product of CFs for the evidence and rule divided by 100. Now, suppose we have two rules for the same hypothesis h, with the confidence factors CF1 = 70 and CF2 = 30, respectively. We also assume that the information sources for the evidence in these rules are independent. Then the combined confidence factor (CFcomb) is calculated as follows:
CFcomb = CF1 + CF2 – (CF1*CF2)/100 = 79.
When another rule becomes available that is related to the same hypothesis, with, say, CF3 = 30, the CF for the hypothesis is recalculated:
CFcombnew = CF3 + CFcomb – (CF3 * CFcomb)/100 = 85.
This is continued until all the evidence for h is combined.
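The propagation and combination steps above can be sketched in Python as follows. The function names are hypothetical; the arithmetic follows the formulas in the text, and the usage lines reuse the values CF1 = 70, CF2 = 30, and CF3 = 30.

```python
from functools import reduce


def rule_cf(cf_evidence, cf_rule):
    """CF of the hypothesis from one rule: product of the two CFs over 100."""
    return cf_evidence * cf_rule / 100.0


def combine_two(cf_a, cf_b):
    """Combine the CFs of two independent rules for the same hypothesis."""
    return cf_a + cf_b - (cf_a * cf_b) / 100.0


def combine_all(cfs):
    """Fold any number of CFs into one, starting from no confidence (0)."""
    return reduce(combine_two, cfs, 0.0)


print(combine_all([70, 30]))             # 79.0
print(round(combine_all([70, 30, 30])))  # 85
```

Because the combination formula is symmetric and associative, the order in which the rules arrive does not change the final CF.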
Generally speaking, the CF is the difference between belief and disbelief:
CF(h, e) = MB(h, e) - MD(h, e), expressed in per cent.
Here CF is the confidence factor in the hypothesis h due to evidence e, MB is the measure of increased belief in h due to e, and MD is the measure of increased disbelief in h due to e.
In most cases disbelief is simply the opposite of belief, and therefore using CFs is equivalent to using probabilities, for which the following is always true:
P(h) = 1 - P(¬h),
where P(¬h) is the probability of any hypothesis other than h.
For the case of a posterior hypothesis that relies on evidence e
P(h| e) = 1 - P(¬h| e).
The fundamental problem here is that while P(h| e) implies a cause-and-effect relationship between e and h, there may be no cause-and-effect relationship between e and ¬h. For example, it is known that El Niño events tend to be associated with anomalously warm winters in the Great Lakes region (Rodionov and Assel, 2003). The probability of a cold winter, however, does not appear to be affected by the processes in the equatorial Pacific, so that the above equation is not true. This type of nonlinear, or asymmetric, relationship between climatic variables is attracting more and more attention in climate research (e.g., Wu and Hsieh, 2004).
When many pieces of evidence are combined, the sum of the CFs for all categories of the forecast variable is not necessarily equal to 100. Therefore, CFs are not probabilities, although in most cases for individual rules they can be converted into probabilities, and vice versa.
The measures of belief and disbelief defined in terms of probabilities also include the prior, or climatological, probability P(h). If P(h) = 1, then MB(h, e) = 1, and if P(h) = 0, then MD(h, e) = 1; otherwise
MB(h, e) = (max [P(h| e), P(h)] - P(h))/ (1 - P(h)),
MD(h, e)= (min [P(h| e), P(h)] - P(h))/ (- P(h)).
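A minimal Python sketch of the two general formulas above is given here. The function names are hypothetical, and the degenerate priors P(h) = 0 and P(h) = 1 are deliberately left out: the sketch simply rejects them.

```python
def _check_prior(p_h):
    # This sketch covers only the general case 0 < P(h) < 1.
    if not 0.0 < p_h < 1.0:
        raise ValueError("prior probability must be strictly between 0 and 1")


def measure_of_belief(p_h_given_e, p_h):
    """MB(h, e) = (max[P(h|e), P(h)] - P(h)) / (1 - P(h))."""
    _check_prior(p_h)
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def measure_of_disbelief(p_h_given_e, p_h):
    """MD(h, e) = (min[P(h|e), P(h)] - P(h)) / (-P(h))."""
    _check_prior(p_h)
    # Same quotient with both signs flipped, which avoids returning -0.0.
    return (p_h - min(p_h_given_e, p_h)) / p_h


def confidence_factor(p_h_given_e, p_h):
    """CF(h, e) = MB(h, e) - MD(h, e), expressed in per cent."""
    mb = measure_of_belief(p_h_given_e, p_h)
    md = measure_of_disbelief(p_h_given_e, p_h)
    return 100.0 * (mb - md)


# A prior of 0.5 raised to 0.75 by the evidence gives CF = 50.
print(confidence_factor(0.75, 0.5))  # 50.0
```

At most one of MB and MD is nonzero for a given pair of probabilities, so the sign of the CF shows whether the evidence raises or lowers the probability of h relative to climatology.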
For example, let’s assume that in some region the climatological probabilities of a cold and warm winter are P(cold) = 0.4 and P(warm) = 0.6. Suppose there is a factor e that may or may not affect the probability of occurrence of cold or warm winters. The data show that out of 16 winters when e was observed, 8 were warm and 8 cold, so that P(warm| e) = 0.5. Then, according to the above formulas,
MB(warm, e) = (0.6 – 0.6)/0.4 = 0,
MD(warm, e) = (0.5 – 0.6)/(-0.6) ≈ 0.17, and
CF ≈ -17.
In this example, the even split between warm and cold winters under e increases our disbelief in h: winter = warm, because warm winters are climatologically more likely. In our analysis, however, we do not use negative CFs. Instead, we calculate CFs for the opposite event, i.e., CF (cold, e) ≈ 17.
In the next example, let’s assume that the climatological probabilities of a cold or warm winter are the same, i.e., P(cold) = P (warm) = 0.5. Suppose that the data shows that out of 16 winters when e was observed, 12 were warm and 4 cold. Then
MB(warm, e) = (0.75 – 0.5)/0.5 = 0.5,
MD(warm, e) = 0, and
CF = 50.
Since the evidence e does not support the hypothesis “winter = cold”, CF (cold, e) = 0. The value for MB(warm, e) in this example can also be obtained as
MB(warm, e) = P(h| e) – P(¬h| e) = 0.75 - 0.25 = 0.5.
The data also shows that the odds (O) of warm versus cold winter are 12:4 or 3:1. Odds can be converted into probability using the following formula:
P(warm| e) = O/(1+O) = 0.75.
In their forecasts, the U.S. Climate Prediction Center uses the so-called "probability of exceedance" (Pexc), which in our example will be
Pexc = P(warm| e) – P(warm) = 0.25.
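The two conversions just described can be checked with a short Python sketch. The function name is hypothetical; the counts (12 warm and 4 cold winters) and the climatological probability of 0.5 are those of the example.

```python
def odds_to_probability(odds):
    """Convert odds O in favor of an event into a probability O / (1 + O)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


warm_winters, cold_winters = 12, 4
odds_warm = warm_winters / cold_winters          # 3.0, i.e., 3:1
p_warm_given_e = odds_to_probability(odds_warm)  # 0.75
p_exc = p_warm_given_e - 0.5                     # minus climatological P(warm)

print(odds_warm, p_warm_given_e, p_exc)  # 3.0 0.75 0.25
```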
All this gives us a better understanding of what CF = 50 means in more familiar probabilistic measures of uncertainty. This value of CF, however, needs to be adjusted for the number of observations in the same manner as discussed above for the CF values based on the correlation. It may be further adjusted to reflect the quality of data, reliability of sources of information, accuracy of the analysis, etc. All in all, the CF reflects both the objective and subjective confidence in h.
After all the evidence is collected, the hypothesis (i.e., the category of the target variable) with the highest CF is forecast. Most often the CF for this hypothesis is treated as MB, while the CF for the opposite hypothesis as MD, and the final CF is the difference between the two. In some cases, however, all the CFs are shown in the forecast.
The forecast is also accompanied by the odds to give the user some sense of how much money we are willing to bet in favor of the forecast category. The odds are calculated as
O = (100 + CF)/(100 - CF).
The odds may also be adjusted for some subjective reason. The relationship between O and CF is shown in the figure below.
With these odds the user will have equal chances of winning or losing money. Therefore, when placing a bet, the user is advised to be a bit more conservative and bet, say, 2:1 instead of 3:1.
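The relationship between the final CF and the odds can be sketched in Python as follows. The function names are hypothetical. The inverse function is not given in the text; it is obtained here by solving the odds formula for CF. Excluding CF = 100 (where the odds are unbounded) is a choice made for this sketch.

```python
def odds_from_cf(cf):
    """Odds in favor of the forecast category: O = (100 + CF) / (100 - CF)."""
    if not 0 <= cf < 100:
        raise ValueError("cf must satisfy 0 <= cf < 100")
    return (100.0 + cf) / (100.0 - cf)


def cf_from_odds(odds):
    """Inverse of odds_from_cf: CF = 100 * (O - 1) / (O + 1)."""
    if odds < 1:
        raise ValueError("odds must be at least 1 (i.e., CF >= 0)")
    return 100.0 * (odds - 1.0) / (odds + 1.0)


print(odds_from_cf(50))   # 3.0, i.e., 3:1
print(cf_from_odds(3.0))  # 50.0
```

A CF of 0 corresponds to even odds of 1:1, and the odds grow without bound as the CF approaches 100.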
As shown in the previous sections, CFs are a simple and convenient way to manage the uncertainty in the climate system. Still, there is also some uncertainty (or imprecision) in the assignment of CFs to evidence and rules. In practice, however, it turns out that the knowledge content of rules is much more important than the algebra of confidences that holds the system together.
In conclusion, we would like to underscore that the forecasts we produce are categorical and the CFs reflect our confidence that the forecast category of the target variable will occur. Nevertheless, in many cases (but not always), a high CF also means a high magnitude of the anomaly. This is particularly common for two opposite categories, such as warm-cold or dry-wet. For this and other reasons (short time series, state of our knowledge about the climate system, etc.) we prefer to work with two or three categories of climate variables.
Rodionov, S. and R. Assel, 2003: Winter severity in the Great Lakes region: a tale of two oscillations, Climate Research, 24, 19-31.
Wu, A. and W. W. Hsieh, 2004: The nonlinear Northern Hemisphere winter atmospheric response to ENSO, Geophys. Res. Lett., 31, L02203, doi:10.1029/2003GL018885.
Zadeh, L. A., 1965: Fuzzy sets, Information and Control, 8, 338-353.