Steve McIntyre on NAS report


[CSPP Note: McIntyre and McKitrick are the two who started the whole controversy over the accuracy and claims of the so-called temperature “hockey stick.”]


http://www.climateaudit.org/?p=715#more-715

NAS Panel Report

The early rumors on the NAS Panel were that it was “two handed” – on the one hand, …, on the other hand, … with something for everyone. I’d characterize it more as schizophrenic. It’s got two completely distinct personalities. On the one hand, they pretty much concede that every criticism of MBH is correct. They disown MBH claims to statistical skill for individual decades and especially individual years.

However, they nevertheless conclude that it is “plausible” – whatever that means – that the “Northern Hemisphere was warmer during the last few decades of the 20th century than during any comparable period over the preceding millennium”. Here, the devil is in the details, as the other studies relied on for this conclusion themselves suffer from the methodological and data problems conceded by the panel. The panel recommendations on methodology are very important; when applied to MBH and the other studies (as they will be in short order), it is my view that they will have major impact and little will be left standing from the cited multiproxy studies. 

Update: Eduardo Zorita’s take posted up below was:

In my opinion the Panel adopted the most critical position to MBH nowadays possible. I agree with you that it is in many parts ambivalent and some parts are inconsistent with others. It would have been unrealistic to expect a report with a summary stating that MBH98 and MBH99 were wrong (and therefore the IPCC TAR had serious problems) when the Fourth Report is in the making. I was indeed surprised by the extensive and deep criticism of the MBH methodology in Chapters 9 and 11.

I thought that the tone of the question period showed that some reporters were pretty unsettled - there were questions about the "over-selling" of MBH, with the panel taking pains to suggest that IPCC would be responsible rather than MBH (conveniently omitting that Mann was section author of the section promoting MBH and, in his capacity as IPCC author, ratcheted up the statistical claims); there was discussion of what "plausible" meant, with a reporter wondering if this was "damning with faint praise".

You can find the report and the recorded briefing here: http://www.nationalacademies.org/morenews/20060622.html

Overview 

In the preface, North summarizes the criticisms: 

Critics of the original papers have argued that the statistical methods were flawed, that the choice of data was biased, and that the data and procedures used were not shared so others could verify the work. (ix) 

He left out the criticism that concerned the Barton Committee and launched the entire matter – that adverse results were withheld or even misrepresented. In its text, the panel concedes every one of our criticisms of the statistical methods, providing some useful new guidelines. However, they do not apply these guidelines either to MBH or to other studies.

They do not clearly discuss biased data selection, but concede that strip-bark samples, such as bristlecones, which we had strongly criticized, “should be avoided in temperature reconstructions”. However, they then proceed to rely on studies that rely on strip-bark bristlecones (and foxtails) and even the criticized MBH PC1 (which is even illustrated in an alter ego in Figure 11-2.)

They do not grasp the nettle of reporting on previous data and method availability, but do endorse the principle that sharing data and methods is a good thing in paleoclimate. Schizophrenically, their graphics and conclusions rely heavily on studies where data and/or methods are not available. 

They stay well away from grasping the nettle of providing an opinion on whether adverse MBH results were withheld or misrepresented. However, they report factual findings that MBH failed cross-validation tests and was not robust to presence/absence of all dendroclimatic indicators, contrary to prior claims of Mann et al. 

Flawed Statistical Methods 

On p 107, the panel reports our two principal criticisms of MBH statistical methods, finding 

“Some of these criticisms are more relevant than others, but taken together, they are an important aspect of a more general finding of this committee, which is that uncertainties of the published reconstructions have been underestimated. Methods for evaluation of uncertainties are discussed in Chapter 9.” 

Chapter 9 then sets out some important guidelines, dealing with several critical issues that we raised in our presentation: that it is inadequate to just consider one statistic in assessing a statistical model; that confidence interval calculations should use verification period residuals rather than calibration period residuals; that autocorrelation should be considered in calculating confidence intervals. 
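
To make the second and third guidelines concrete, here is a minimal sketch in Python. It is not the panel's calculation: the residual values are hypothetical, the "2 x RMSE" half-width assumes roughly normal residuals, and the effective-sample-size formula is the common AR(1) rule of thumb rather than anything specified in the report.

```python
import math

def rmse(residuals):
    """Root-mean-square error of a list of residuals."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

def lag1_autocorr(residuals):
    """Lag-1 autocorrelation of the residuals about their mean."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    return num / denom

def effective_sample_size(n, r1):
    """AR(1) rule of thumb (an assumption here): n_eff = n * (1 - r1) / (1 + r1)."""
    return n * (1.0 - r1) / (1.0 + r1)

# Hypothetical residuals (observed minus reconstructed), in deg C.
calibration_resid = [0.05, -0.04, 0.06, -0.05, 0.03, -0.06, 0.04, -0.03]
verification_resid = [0.20, 0.25, 0.15, 0.10, -0.05, -0.15, -0.20, -0.10]

# Approximate 95% half-widths, assuming roughly normal residuals.
half_width_cal = 2 * rmse(calibration_resid)
half_width_ver = 2 * rmse(verification_resid)

# Persistent residuals carry fewer independent pieces of information.
r1 = lag1_autocorr(verification_resid)
n_eff = effective_sample_size(len(verification_resid), r1)

print(f"half-width from calibration residuals:  {half_width_cal:.3f}")
print(f"half-width from verification residuals: {half_width_ver:.3f}")
print(f"lag-1 autocorrelation: {r1:.2f}, effective sample size: {n_eff:.1f}")
```

In this made-up case the model fits the calibration period tightly but misses in verification, so intervals built from calibration residuals would be far too narrow, and the strong persistence in the verification residuals means the eight values behave like barely more than one independent observation.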

The panel’s schizophrenia is very evident here, because, having set out these methods, they do not apply them to the models in front of them. D’Arrigo et al 2006 report that their model does not verify after 1985, during the period of warming of most direct interest. The panel was aware of this (the matter came up in presentations), but did not directly report or discuss it.

The panel recommends the use of a Durbin-Watson statistic for calibration, but does not report the failure of the various models under this statistic, even though they were aware of this failure. (We presented this information to them in our presentation.)
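
For readers unfamiliar with the Durbin-Watson statistic, the sketch below computes it from its standard definition, the sum of squared successive differences of the residuals divided by the sum of squared residuals. Values near 2 indicate no first-order autocorrelation, and values well below 2 indicate positive autocorrelation. The two residual series are hypothetical and are not taken from any of the reconstructions discussed here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]            # negatively autocorrelated
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # positively autocorrelated

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)

print(dw_alternating)  # above 2
print(dw_persistent)   # well below 2
```

A calibration with strongly persistent residuals, like the second series, is the situation the statistic is meant to flag.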

Choice of Data 

If you try to trace how the panel considered criticisms about biases in the choice of data by searching the word “bias”, you will find that the panel simply did not report on the matter. The closest is a mention on p. 106 of our criticism of the “selection of proxies, especially the bristlecone pine data, used in some of the original temperature reconstruction studies”, which they promise to “explore briefly in the remainder of the chapter”, a promise which is unfulfilled.

They state (p. 107) that 

“several recent research efforts have explored how the selection of proxies affects surface temperature reconstructions”

but then, in the next sentence, go on to discuss a totally unrelated study and never return to the issue. They agree (p 107) that 

“the Mann et al. (1999) reconstruction that uses this particular principal component analysis technique is strongly dependent on data from the Great Basin region in the western United States.”

(“Data from the Great Basin region” is code here for bristlecones) 

They go on to state that 

“…such issues of robustness need to be taken into account in estimates of statistical uncertainties.” 

However, in their statistical chapter, they do not address how such robustness issues should be “taken into account” in the estimation of statistical uncertainty, although we may presume that it would increase them. 

We note that the panel had previously (p. 86) agreed entirely with our criticism of Mann’s principal components method, concluding that the “baseline with respect to which anomalies are calculated can influence principal components in unanticipated ways” and stated (p. 106) that the Mann method is “not recommended”. 

They linked this criticism (p 107) to the overweighting of bristlecones and noted that “the more important aspect of this criticism is the issue of robustness with respect to the choice of proxies used in the reconstruction”, a point with which we agree. Indeed, it is a point specifically made in McIntyre and McKitrick [E&E 2005], a publication not cited by the panel. 
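
The mechanism behind the baseline issue can be shown with a toy calculation. This is a sketch of the arithmetic only, not a principal components analysis and not the MBH code: the series and the 4-value "recent" baseline are hypothetical. The point is that when deviations are measured from the mean of a recent sub-period instead of the full-period mean, a series whose recent level departs from its long-term level gets an inflated sum of squares, and a method that maximizes sums of squares will then weight it more heavily.

```python
def sum_sq_about(series, baseline):
    """Sum of squared deviations of the whole series about the mean of `baseline`."""
    center = sum(baseline) / len(baseline)
    return sum((x - center) ** 2 for x in series)

def short_centering_inflation(series, n_recent):
    """Ratio of the sum of squares when centering on the last `n_recent`
    values to the sum of squares when centering on the full series."""
    full = sum_sq_about(series, series)
    short = sum_sq_about(series, series[-n_recent:])
    return short / full

# Hypothetical 20-value proxy series, for illustration only.
stationary = [0.1, -0.1] * 10                      # no change in level
upturn = [0.1, -0.1] * 8 + [1.1, 0.9, 1.1, 0.9]    # level shifts at the end

ratio_stationary = short_centering_inflation(stationary, 4)
ratio_upturn = short_centering_inflation(upturn, 4)

print(f"stationary series: {ratio_stationary:.2f}")
print(f"series with late upturn: {ratio_upturn:.2f}")
```

The stationary series is unaffected by the choice of baseline, while the series with a late upturn has its sum of squares inflated several times over. Since the sum of squares about any center equals the sum of squares about the mean plus n times the squared offset of the center from the mean, the inflation can never be less than 1.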

In their section on tree rings (p. 50), they had previously discussed bristlecones, citing many references that we provided them. They report that strip-bark bristlecones are “sensitive to higher atmospheric CO2 concentrations” and state that “‘strip-bark’ samples should be avoided for temperature reconstructions” – an even stronger position than calling for non-robustness to be “taken into account” in some method that they do not describe. 

Here’s where their typical schizophrenia sets in. 

In order to decide on the “plausibility” of late 20th century uniqueness, their key graphics (Figure O-5, S-1, 11-1, 11-2) use reconstructions from Mann and Jones 2003, Hegerl et al 2006, Esper et al 2002 and Moberg et al 2005 and a collection of proxies from Osborn and Briffa 2006. 

Remarkably, Mann’s “not recommended” PC1 using the strip-bark bristlecones that “should not be used” crops up at the very top of Figure 11-2, innocuously labeled “W.USA (regional)”. For good measure, another strip-bark foxtail (interrelated to bristlecones) series occurs in Figure 11-2 labeled this time “W.USA (Boreal/Upperwright)”. Thus the most disputed series make up 2 of the 14 series in Osborn and Briffa 2006 and, not coincidentally, the two most HS-shaped series. In the graphics illustrating a supposed generic similarity, Mann and Jones 2003, of course, uses Mann’s PC1. 

At this point, I do not know for sure what sites are in Hegerl et al 2006, as the article itself does not disclose this information. Last fall, when I asked the authors of the then unpublished paper to identify the sites in connection with IPCC peer review, the authors refused and IPCC threatened to expel me as a reviewer if I made a further attempt to obtain data or information from an author of unpublished studies. However, I have reasons to believe that, like Osborn and Briffa, Hegerl et al 2006 will include both Mann’s “not recommended” PC1 and a strip-bark foxtail series, probably the identical series used in Osborn and Briffa 2006. Esper et al 2002 includes two strip-bark foxtail series: the Boreal and Upper Wright sites, which are combined as one average in Osborn and Briffa 2006 (and probably Hegerl et al 2006). Moberg et al 2005 includes 3 different bristlecone sites (although, in this case, unlike the others, they are not the “active ingredient”). 

Thus, 3 of the 4 illustrated reconstructions as well as the key Figure 11-2 all schizophrenically rely on proxies not meeting criteria set out elsewhere in the report. 

Yamal Substitution 

While the panel mentions our concern over biased choice of series, they did not discuss an egregious situation that we presented to them. In 1998, updated sampling at the Polar Urals site resulted in a series with elevated MWP values, as opposed to previous results used in earlier multiproxy studies which had low MWP values at this site. At about the same time, a site about 100 miles away (Yamal) was developed which had a very pronounced HS-shape. This site was substituted for the Polar Urals site in all but one multiproxy reconstruction. 

It is used, for example, in Osborn and Briffa 2006 (see Figure 11-2, where it is labeled “NW Russia (Yamal)”). If the Polar Urals update (in the Esper et al 2002 version) is used instead of the Yamal series, then the conclusions about the relative levels of the MWP and modern periods are altered in the Briffa 2000 reconstruction (www.climateaudit.org) and almost certainly in the D’Arrigo et al 2006 reconstruction, which has a virtually identical roster.

The panel noted in passing that results were sensitive in these small subsets, but did not squarely address the issue and inconsistently stated the opposite on one occasion in the report. 

Data Availability 

Despite a specific request from the Boehlert committee to comment on availability of data, we predicted that the panel would merely espouse generalities here and this has proved to be the case. 

They did not discuss either problems with Mann et al or other studies and merely espoused a few sentences of platitudes encouraging data sharing, platitudes previously expressed by a previous NAS Panel in 1995, in a section entitled: What Comments Can Be Made On The Value Of Exchanging Information And Data? 

They stated platitudinously: 

Our view is that all research benefits from full and open access to published datasets and that a clear explanation of analytical methods is mandatory. Peers should have access to the information needed to reproduce published results, so that increased confidence in the outcome of the study can be generated inside and outside the scientific community. Other committees and organizations have produced an extensive body of literature on the importance of open access to scientific data and on the related guidelines for data archiving and data access (e.g., NRC 1995). Paleoclimate research would benefit if individual researchers, professional societies, journal editors, and funding agencies continued 

Obviously, we do not disagree with this. In this context, we observe that we have attempted to obtain data for both D’Arrigo et al 2006 and Hegerl et al 2006, both cited and relied on by the panel, and were unsuccessful.

In the last few months, a limited amount of measurement data has been archived by the D’Arrigo group, but only a fraction. 

With respect to Osborn and Briffa, Science has refused to require the authors to disclose the measurement data for Yamal, Tornetrask, Taimyr and Alberta sites on the grounds that the earlier data was from Briffa (2000). Osborn and Briffa have refused to provide the earlier measurement data. Even the identity of the sites used in the Briffa density study remains undisclosed and the authors have refused several requests to identify the sites.

The panel cites ice core data from Thompson. Prior to our initiative in 2004, no information had been archived from any of the Himalayan sites, including core drilled in 1987. A cursory archive was provided in 2004, in which no chemical information was provided and no sample details. 

Since several different, inconsistent and unreconciled “grey” versions of the Dunde series were floating around, there is a pressing need to examine individual sample information. Again, we have been unsuccessful in attempts with Science magazine to get access to detailed information. 

We will of course be asking the National Academy of Sciences to provide us with the data which they have relied on and which has been previously refused to us. 

Withholding Adverse Results 

The questions that launched the inquiry were the original Barton questions to Mann et al. about the withholding of adverse results, and, in particular, the withholding of adverse verification statistics (the r2 statistic) and the misrepresentation of robustness to the presence/absence of “all” dendroclimatic indicators, let alone bristlecones. 

One of the panelists asked Mann about the verification r2 statistic: Mann said that “he did not calculate the verification r2 statistic – that would be a silly and incorrect thing to do”. The statement that the verification r2 was not calculated was untrue and the panel had evidence of this. The panel did not agree that this would be “silly and incorrect” since they endorsed the use of the closely related CE statistic for estimation of confidence intervals. 

However, they did not grasp the nettle of whether the adverse verification r2 statistics had been calculated and withheld. They did report that the MBH reconstruction had extremely low r2 and CE values prior to the instrumental period, citing Wahl and Ammann (but not us, although we had originally made the point.) They also acknowledged that the MBH reconstruction was “strongly dependent” on bristlecones, a point obviously inconsistent with previous MBH claims of “robustness” to the presence/absence of all dendroclimatic indicators. 

Again they do not grasp the nettle of trying to reconcile the facts to the prior claims. 
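
Since so much turns on the verification r2, CE and RE statistics, a small sketch of how they are commonly defined may help. The data below are hypothetical (they are not MBH values), and the definitions are the ones usually given in the dendroclimatic literature: r2 is the squared correlation between observed and reconstructed values over the verification period, CE compares the squared errors with those of a constant equal to the verification-period mean, and RE compares them with those of a constant equal to the calibration-period mean.

```python
def verification_r2(obs, est):
    """Squared Pearson correlation between observed and estimated values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_est = sum(est) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_est = sum((e - mean_est) ** 2 for e in est)
    return cov * cov / (var_obs * var_est)

def coefficient_of_efficiency(obs, est):
    """CE: 1 - SSE / (squared deviations of obs from the verification mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)

def reduction_of_error(obs, est, calibration_mean):
    """RE: 1 - SSE / (squared deviations of obs from the calibration mean)."""
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    return 1.0 - sse / sum((o - calibration_mean) ** 2 for o in obs)

# Hypothetical verification-period temperatures (deg C anomalies) and a
# reconstruction that gets the average level right but not the variations.
observed = [-0.3, -0.1, -0.4, -0.2, -0.3, -0.2]
reconstructed = [-0.2, -0.3, -0.3, -0.2, -0.3, -0.2]
calibration_mean = 0.1  # assumed mean of a warmer calibration period

r2 = verification_r2(observed, reconstructed)
ce = coefficient_of_efficiency(observed, reconstructed)
re = reduction_of_error(observed, reconstructed, calibration_mean)

print(f"r2 = {r2:.3f}, CE = {ce:.3f}, RE = {re:.3f}")
```

In this made-up case RE is high simply because the verification period is colder than the calibration period, while r2 is close to zero and CE is negative. That is why reporting one statistic without the others can give a misleading picture of skill.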

Other Points 

The report is long and interesting, and obviously this is a very quick comment. Here are a few other points that caught my eye.

We commended Naurzbaev et al 2004 (including MBH coauthor Hughes) as an excellent example of a strategy for millennial climate reconstruction. 

The panel commended the strategy of this study as follows: 

An especially suitable strategy to minimize confounding effects is to sample sites along ecological gradients, such as elevation or latitude (Fritts and Swetnam 1989, Bugmann 1996). For example, (Naurzbaev et al. 2004) selected sites along latitudinal (from 55 to 72°N) and elevational (from 1120 to 2350 m above sea level) transects, and used the parameters of the Regional Curve Standardization to infer climatic influences and past temperature variability. 

However, they didn’t report that Naurzbaev’s conclusion from this “especially suitable strategy” was that the MWP was 2-3 deg C warmer than the 20th century, or make any attempt to reconcile this finding with the Yamal reconstruction that they display in Figure 11-2.

The "Divergence Problem" falls into the category of a “main area of uncertainty”. It arises because the majority of temperature-sensitive tree ring width "site chronologies" go down in the last half of the 20th century. Cuffey asked the $64 question to D’Arrigo: if tree rings are not picking up late 20th century warmth, how can you be certain that they might not have had a similar response to a comparable warm period in the past (e.g. in the Medieval Warm Period)? The NAS panel acknowledges the “divergence problem” (p 47, 110), but doesn’t summarize it as one of the “major areas of uncertainty”.

They report: 

(111) The observed discrepancy between some tree ring variables that are thought to be sensitive to temperature and the temperature changes observed in the late 20th century (Jacoby and D’Arrigo 1995, Briffa et al. 1998) reduces confidence that the correlation between these proxies and temperature has been consistent over time. Future work is needed to understand the cause of this “divergence,” which for now is considered unique to the 20th century and to areas north of 55°N (Cook et al. 2004)… also that the difference between northern and southern sites found after about 1950 is unprecedented since at least A.D. 900. 

Relying on the “southern sites” for firm ground here is going to be difficult as these “southern sites” in Cook et al 2004 include the bristlecones. 

Here Bunn et al 2003 (as Graybill and Idso 1993 had before them) found a different “divergence” problem – between strip-bark and full-bark sites, which is considered to be related to fertilization. If one peels beneath the surface of Cook et al 2004, one will find that strip-bark sites - which are “not to be used” as a temperature proxy – have been relied upon to supposedly reconcile the “divergence problem”.

More frustratingly, the panel failed to report the impact of the “divergence problem” on proxy reconstructions in the warm 1980s, including the validation failure of D’Arrigo et al. 

Conclusion 

It will take a while to assess the impact of this study. It’s long and interesting. One thing that appears certain: far from ending the controversy over millennial climate studies, it looks to me like it is merely one more step.