Does Hansen's Error "Matter"?

 

ClimateAudit.org, August 11, 2007

http://www.climateaudit.org/?p=1885

 

By Steve McIntyre

 

There's been quite a bit of publicity about Hansen's Y2K error and the change in the U.S. leaderboard (by which 1934 is the new warmest U.S. year) in the right-wing blogosphere. In contrast, realclimate has dismissed it as a triviality and the climate blogosphere is doing its best to ignore the matter entirely.

 

My own view has been that the matter is certainly not the triviality that Gavin Schmidt would have you believe, but neither is it any magic bullet. I think that the point is significant for reasons that have mostly eluded commentators on both sides.

 

Station Data

First, let's start with the impact of Hansen's error on individual station histories (my examination of this matter arose from examination of individual station histories, not from the global record). GISS provides an excellent and popular online service for plotting temperature histories of individual stations. Many such histories have been posted up in connection with the ongoing examination of surface station quality at surfacestations.org. Here's an example of this type of graphic:

 

 

Figure 1. Plot of Detroit Lakes MN using GISS software

 

But it's presumably not just Anthony Watts and surfacestations.org readers that have used these GISS station plots; presumably scientists and other members of the public have used this GISS information. The Hansen error is far from trivial at the level of individual stations. Grand Canyon was one of the stations previously discussed at climateaudit.org in connection with the Tucson urban heat island. In this case, the Hansen error was about 0.5 deg C. Some discrepancies are 1 deg C or higher.

 

Figure 2. Grand Canyon Adjustments

 

Not all station errors lead to positive steps. There is a bimodal distribution of errors reported earlier at CA here, with many stations having negative steps. There is a positive skew, so that the net impact of the step error on the U.S. average is about 0.15 deg C according to Hansen. However, as you can see from the distribution, the impact on the majority of individual stations is substantially larger than 0.15 deg C in absolute terms. For users of information regarding individual stations, the changes may be highly relevant.
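The gap between the average and the typical station can be illustrated with a toy calculation. The sketch below uses made-up step values (two clusters at -0.5 and +0.5 deg C, weighted toward the positive one), not the actual GISS step errors. It only shows how a bimodal distribution can have a mean of 0.15 deg C while the step at every individual station is much larger.

```python
from statistics import mean, median

# Hypothetical step errors in deg C, NOT the actual GISS values:
# 35 stations stepping down by 0.5 and 65 stations stepping up by 0.5.
steps = [-0.5] * 35 + [0.5] * 65

# Average over all stations (the kind of figure quoted for the U.S. mean).
net_impact = mean(steps)

# Size of the step at a typical station, ignoring its sign.
typical_size = median([abs(s) for s in steps])

print(f"net impact on the average: {net_impact:+.2f} deg C")
print(f"typical step at a station: {typical_size:.2f} deg C")
```

In this toy case the positive and negative steps largely cancel in the average, which is why a small average says little about what a user of a single station record would see.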

 

GISS recognized that the error had a significant impact on individual stations and took rapid steps to revise their station data (indeed, the form of their revision seems far from ideal, indicating the haste with which it was made). However, GISS failed to provide any explicit notice or warning on their station data webpage that the data had been changed, or an explicit notice to users who had downloaded data or graphs in the past that there had been significant changes to many U.S. series. This obligation existed regardless of any impact on world totals.

 

Figure 3. Distribution of Step Errors

 

 

GISS has emphasized recently that the U.S. constitutes only 2% of global land surface, arguing that the impact of the error is negligible on the global average. While this may be so for users of the GISS global average, U.S. HCN stations constitute about 50% of active (with values in 2004 or later) stations in the GISS network (as shown below). The sharp downward step in station counts after March 2006 in the right panel shows the last month in which USHCN data is presently included in the GISS system. The Hansen error affects all the USHCN stations and, to the extent that users of the GISS system are interested in individual stations, the number of affected stations is far from insignificant, regardless of the impact on global averages.

 

Figure 4. Number of Time Series in GISS Network. This includes all versions in the GISS network and exaggerates the population in the 1980s as several different (and usually similar) versions of the same data are often included.
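The contrast between the two ways of counting can be put in a few lines of arithmetic. The sketch below takes the round figures from the text (2% of area, about 0.15 deg C, about 50% of active stations) and assumes a simple area-weighted average; that weighting is an assumption for illustration, not a description of the actual GISS gridding procedure.

```python
# Figures taken from the text above; the simple area weighting is an
# assumption for illustration, not the actual GISS gridding procedure.
us_area_fraction = 0.02      # U.S. share of the area being averaged
us_mean_shift = 0.15         # approximate change in the U.S. average, deg C
us_station_fraction = 0.50   # approximate U.S. share of active GISS stations

# Under pure area weighting, the U.S. shift is diluted by its area share.
global_shift = us_area_fraction * us_mean_shift

print(f"shift in an area-weighted average: {global_shift:.3f} deg C")
print(f"share of active stations affected: {us_station_fraction:.0%}")
```

The same error is thus negligible by one measure and very widespread by the other, depending on whether the user cares about the global average or about station records.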

 

 

U.S. Temperature History

The Hansen error also has a significant impact on the GISS estimate of U.S. temperature history, with estimates for 2000 and later being lowered by about 0.15 deg C (2006 by 0.10 deg C). Again GISS moved quickly to revise their online information, changing their US temperature data on Aug 7, 2007. Even though Gavin Schmidt of GISS and realclimate said that changes of 0.1 deg C in individual years were "significant", GISS did not explicitly announce these changes or alert readers that a "significant" change had occurred for values from 2000-2006. Obviously they would have been entitled to observe that the changes in the U.S. record did not have a material impact on the world record, but it would have been appropriate for them to have provided explicit notice of the changes to the U.S. record given that the changes resulted from an error.

 

The changes in the U.S. history were not brought to the attention of readers by GISS itself, but in this post at climateaudit. As a result of the GISS revisions, there was a change in the "leader board": 1934 emerged as the warmest U.S. year, and more warm years were in the top ten from the 1930s than from the past 10 years. This has been widely discussed in the right-wing blogosphere and has been acknowledged at realclimate as follows:

 

The net effect of the change was to reduce mean US anomalies by about 0.15 °C for the years 2000-2006. There were some very minor knock on effects in earlier years due to the GISTEMP adjustments for rural vs. urban trends. In the global or hemispheric mean, the differences were imperceptible (since the US is only a small fraction of the global area).

 

There were however some very minor re-arrangements in the various rankings (see data). Specifically, where 1998 (1.24 °C anomaly compared to 1951-1980) had previously just beaten out 1934 (1.23 °C) for the top US year, it now just misses: 1934 1.25 °C vs. 1998 1.23 °C. None of these differences are statistically significant.
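The re-arrangement described in the quotation can be checked directly from the four numbers it gives. This is only a sketch covering the two years quoted; the rest of the leader board is not reproduced here.

```python
# U.S. annual anomalies in deg C (relative to 1951-1980), as quoted above
# from realclimate. Only the two years mentioned in the quote are included.
before = {1998: 1.24, 1934: 1.23}
after = {1934: 1.25, 1998: 1.23}

def ranking(anomalies):
    """Return the years ordered from warmest to coolest."""
    return sorted(anomalies, key=anomalies.get, reverse=True)

print("before correction:", ranking(before))
print("after correction: ", ranking(after))
```

The margins are 0.01 and 0.02 deg C respectively, which is why the ordering flips on such a small revision.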

 

In my opinion, it would have been more appropriate for Gavin Schmidt of GISS (who was copied on the GISS correspondence to me) to ensure that a statement like this was on the caption to the U.S. temperature history on the GISS webpage, rather than after the fact at realclimate.

 

Obviously much of the blogosphere delight in the leader board changes is a reaction to many fevered press releases and news stories about year x being the "warmest year". For example, on Jan 7, 2007, NOAA announced that:

 

The 2006 average annual temperature for the contiguous U.S. was the warmest on record.

 

This press release was widely covered, as you can determine by googling "warmest year 2006 united states". Now NOAA and NASA are different organizations and NOAA, not NASA, made the above press release, but members of the public can surely be forgiven for not making fine distinctions between different alphabet soups. I think that NASA might reasonably have foreseen that the change in rankings would catch the interest of the public and, had they made a proper report on their webpage, they might have forestalled much subsequent criticism.

 

In addition, while Schmidt describes the changes atop the leader board as "very minor re-arrangements", many followers of the climate debate are aware of intense battles over 0.1 or 0.2 degree (consider the satellite battles). Readers might perform a little thought experiment: suppose that Spencer and Christy had published a temperature history in which they claimed that 1934 was the warmest U.S. year on record and then it turned out that there had been a computer programming error opposite to the one that Hansen made, that Wentz and Mears discovered there was an error of 0.15 deg C in the Spencer and Christy results and, after fixing this error, it turned out that 2006 was the warmest year on record. Would realclimate simply describe this as a "very minor re-arrangement"?

 

So while the Hansen error did not have a material impact on world temperatures, it did have a very substantial impact on U.S. station data and a "significant" impact on the U.S. average. Both of these surely "matter" and both deserved formal notice from Hansen and GISS.

 

Can GISS Adjustments "Fix" Bad Data?

Now my original interest in GISS adjustments did not arise abstractly, but in the context of surface station quality. Climatological stations are supposed to meet a variety of quality standards, including the relatively undemanding requirement of being 100 feet (30 meters) from paved surfaces. Anthony Watts and volunteers of surfacestations.org have documented one defective site after another, including a weather station in a parking lot at the University of Arizona where MBH coauthor Malcolm Hughes is employed, shown below.

 

Figure 5. Tucson University of Arizona Weather Station

 

These revelations resulted in a variety of aggressive counter-attacks in the climate blogosphere, many of which argued that, while these individual sites may be contaminated, the "expert" software at GISS and NOAA could fix these problems, as, for example, here:

 

they [NOAA and/or GISS] can "fix" the problem with math and adjustments to the temperature record.

 

or here:

 

This assumes that contaminating influences can't be and aren't being removed analytically.. I haven't seen anyone saying such influences shouldn't be removed from the analysis. However I do see professionals saying "we've done it"

 

"Fixing" bad data with software is by no means an easy thing to do (as witness Mann's unreported modification of principal components methodology on tree ring networks). The GISS adjustment schemes (despite protestations from Schmidt that they are "clearly outlined") are not at all easy to replicate using the existing opaque descriptions. For example, there is nothing in the methodological description that hints at the change in data provenance before and after 2000 that caused the Hansen error. Because many sites are affected by climate change, a general urban heat island effect and local microsite changes, adjustment for heat island effects and local microsite changes raises some complicated statistical questions that are nowhere discussed in the underlying references (Hansen et al 1999, 2001). In particular, the adjustment methods are not techniques that can be looked up in statistical literature, where their properties and biases might be discerned. They are rather ad hoc and local techniques that may or may not be equal to the task of "fixing" the bad data.

 

Making readers run the gauntlet of trying to guess the precise data sets and precise methodologies obviously makes it very difficult to achieve any assessment of the statistical properties. In order to test the GISS adjustments, I requested that GISS provide me with details on their adjustment code. They refused. Nevertheless, there are enough different versions of U.S. station data (USHCN raw, USHCN time-of-observation adjusted, USHCN adjusted, GHCN raw, GHCN adjusted) that one can compare GISS raw and GISS adjusted data to other versions to get some idea of what they did.
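A minimal sketch of this kind of version comparison follows. It takes two versions of the same station series, forms the difference (adjusted minus raw) year by year, and measures how much that difference shifts at a chosen break year. The station values are invented for illustration and are not GISS or USHCN data, and the helper `step_at` is a hypothetical name, not part of any GISS software.

```python
from statistics import mean

def step_at(diff_by_year, break_year):
    """Mean of (adjusted - raw) from break_year on, minus the mean before it."""
    before = [d for y, d in diff_by_year.items() if y < break_year]
    after = [d for y, d in diff_by_year.items() if y >= break_year]
    return mean(after) - mean(before)

# Hypothetical annual means (deg C) for one station, NOT real GISS or USHCN values.
raw = {1997: 10.0, 1998: 10.5, 1999: 10.0, 2000: 10.0, 2001: 10.5, 2002: 10.0}
adjusted = {1997: 10.0, 1998: 10.5, 1999: 10.0, 2000: 10.5, 2001: 11.0, 2002: 10.5}

# Difference between the two versions, year by year.
diff = {year: adjusted[year] - raw[year] for year in raw}

print("step in adjustment at 2000:", step_at(diff, 2000), "deg C")
```

A step in the difference series that coincides with a change in data source, rather than with anything in the station's own history, is the sort of pattern that would flag a problem in the processing rather than in the climate.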

 

In the course of reviewing quality problems at various surface sites, among other things, I compared these different versions of station data, including a comparison of the Tucson weather station shown above to the Grand Canyon weather station, which is presumably less affected by urban problems. This comparison demonstrated a very odd pattern discussed here. The adjustments reduced the trend at the problematic Tucson site, but the Grand Canyon data was also adjusted, so that, instead of the 1930s being warmer than the present as in the raw data, the 2000s became warmer than the 1930s, with a sharp increase in the 2000s.

 

Figure 6. Comparison of Tucson and Grand Canyon Versions

 

 

Now some portion of the post-2000 jump in adjusted Grand Canyon values shown here is due to Hansen's Y2K error, but it only accounts for a 0.5 deg C jump after 2000 and does not explain why Grand Canyon values should have been adjusted so much. In this case, the adjustments are primarily at the USHCN stage. The USHCN station history adjustments appear particularly troublesome to me, not just here but at other sites (e.g. Orland CA). They end up making material changes to sites identified as "good" sites and my impression is that the USHCN adjustment procedures may be adjusting some of the very "best" sites (in terms of appearance and reported history) to better fit histories from sites that are clearly non-compliant with WMO standards (e.g. Marysville, Tucson). There are some real and interesting statistical issues with the USHCN station history adjustment procedure and it is ridiculous that the source code for these adjustments (and the subsequent GISS adjustments - see bottom panel) is not available.

 

Closing the circle: my original interest in GISS adjustment procedures was not an abstract interest, but a specific interest in whether GISS adjustment procedures were equal to the challenge of "fixing" bad data. If one views the above assessment as a type of limited software audit (limited by lack of access to source code and operating manuals), one can say firmly that the GISS software had not only failed to pick up and correct fictitious steps of up to 1 deg C, but that GISS actually introduced this error in the course of their programming.

 

According to any reasonable audit standards, one would conclude that the GISS software had failed this particular test. While GISS can patch (and has patched) the particular error that I reported to them, their patching hardly proves the merit of the GISS (and USHCN) adjustment procedures. These need to be carefully examined. This was a crying need prior to the identification of the Hansen error and would have been a crying need even without the Hansen error.

 

One practical effect of the error is that it surely becomes much harder for GISS to continue the obstruction of detailed examination of their source code and methodologies after the embarrassment of this particular incident. GISS itself has no policy against placing source code online and, indeed, a huge amount of code for their climate model is online. So it's hard to understand their present stubbornness.

 

The U.S. and the Rest of the World

Schmidt observed that the U.S. accounts for only 2% of the world's land surface and that the correction of this error in the U.S. has "minimal impact on the world data", which he illustrated by comparing the U.S. index to the global index. I've re-plotted this from original data on a common scale. Even without the recent changes, the U.S. history contrasts with the global history: the U.S. history has a rather minimal trend, if any, since the 1930s, while the rest of the world (ROW) has a very pronounced trend since the 1930s.

 

Re-plotted from GISS Fig A and Fig D data.

 

These differences are attributed to "regional" differences and it is quite possible that this is a complete explanation. However, this conclusion is complicated by a number of important methodological differences between the U.S. and the ROW. In the U.S., despite the criticisms being rendered at surfacestations.org, there are many rural stations that have been in existence over a relatively long period of time; while one may cavil at how NOAA and/or GISS have carried out adjustments, they have collected metadata for many stations and made a concerted effort to adjust for such metadata. On the other hand, many of the stations in China, Indonesia, Brazil and elsewhere are in urban areas (such as Shanghai or Beijing). In some of the major indexes (CRU, NOAA), there appears to be no attempt whatever to adjust for urbanization. GISS does report an effort to adjust for urbanization in some cases, but their ability to do so depends on the existence of nearby rural stations, which are not always available. Thus, there is a real concern that the need for urban adjustment is most severe in the very areas where adjustments are either not made or not accurately made.

 

In its consideration of possible urbanization and/or microsite effects, IPCC has taken the position that urban effects are negligible, relying on a very few studies (Jones et al 1990, Peterson et al 2003, Parker 2005, 2006), each of which has been discussed at length at this site. In my opinion, none of these studies can be relied on for concluding that urbanization impacts have been avoided in the ROW sites contributing to the overall history.

 

One more story to conclude. Non-compliant surface stations were reported in the formal academic literature by Pielke and Davey (2005), who described a number of non-compliant sites in eastern Colorado. In NOAA's official response to this criticism, Vose et al (2005) said in effect:

 

It doesn't matter. It's only eastern Colorado. You haven't proved that there are problems anywhere else in the United States.

 

In most businesses, the identification of glaring problems, even in a restricted region like eastern Colorado, would prompt an immediate evaluation to ensure that the same problems did not exist elsewhere. However, that does not appear to have taken place and matters rested until Anthony Watts and the volunteers at surfacestations.org launched a concerted effort to evaluate stations in other parts of the country and determined that the problems were not only just as bad as in eastern Colorado, but in some cases were much worse.

 

Now, in response to problems with both station quality and adjustment software, Schmidt and Hansen say in effect, as NOAA did before them:

 

It doesn't matter. It's only the United States. You haven't proved that there are problems anywhere else in the world.