Does Hansen's Error "Matter"?
ClimateAudit.org, August 11, 2007
http://www.climateaudit.org/?p=1885
By Steve McIntyre
There's been quite a bit of publicity about Hansen's Y2K error and
the change in the U.S. leaderboard (by which 1934 is the new warmest U.S. year)
in the right-wing blogosphere. In contrast, realclimate
has dismissed it as a triviality and the climate blogosphere is doing its best to
ignore the matter entirely.
My own view has been that the matter is certainly not the triviality
that Gavin Schmidt would have you believe, but neither is it any magic bullet.
I think that the point is significant for reasons that have mostly eluded
commentators on both sides.
Station Data
First, let's start with the impact of Hansen's error on individual
station histories (and my examination of this matter arose from examination of
individual station histories and not because of the global record). GISS
provides an excellent and popular online service for
plotting temperature histories of individual stations. Many such histories have
been posted up in connection with the ongoing examination of surface station
quality at surfacestations.org. Here's an example of this type of graphic:

Figure 1. Plot of Detroit Lakes MN using GISS software
But it's presumably not just Anthony Watts and surfacestations.org
readers who have used these GISS station plots; presumably scientists and
other members of the public have used this GISS information. The Hansen error
is far from trivial at the level of individual stations. Grand Canyon was one
of the stations previously discussed at climateaudit.org in connection with
Tucson urban heat island. In this case, the Hansen error was about 0.5 deg C.
Some discrepancies are 1 deg C or higher.

Figure 2. Grand Canyon Adjustments
Not all station errors lead to positive steps. There is a bimodal
distribution of errors reported earlier at CA here, with many stations
having negative steps. There is a positive skew, so that the net impact of the step
error is about 0.15 deg C according to Hansen. However, as you can see from the
distribution, the impact on the majority of stations is substantially higher
than 0.15 deg. For users of information regarding individual stations, the
changes may be highly relevant.
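The gap between a small net effect and large station-level effects can be sketched numerically. The snippet below is only an illustration: the step values are made up for the example (they are not the GISS step errors), chosen so that a bimodal set with positive skew averages 0.15 deg C while every individual step is larger in magnitude than that.

```python
from statistics import mean, median

# Hypothetical step errors (deg C) for ten stations. These are NOT actual
# GISS values; they only mimic a bimodal distribution with positive skew.
steps = [-0.6, -0.5, -0.4, -0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]


def summarize(values):
    """Return (net mean step, median absolute step, share of stations
    whose absolute step exceeds the absolute net mean)."""
    net = mean(values)
    typical = median(abs(v) for v in values)
    share = sum(abs(v) > abs(net) for v in values) / len(values)
    return net, typical, share


net, typical, share = summarize(steps)
print(f"net step: {net:.2f} deg C")
print(f"median absolute step: {typical:.2f} deg C")
print(f"share of stations with |step| > |net|: {share:.0%}")
```

With these invented numbers the negative and positive steps largely cancel in the average, which is why an average can look small while a user of any single station sees a much larger change.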
GISS recognized that the error had a significant impact on
individual stations and took rapid steps to revise their station data (and
indeed the form of their revision seems far from ideal indicating the haste of
their revision.) GISS failed to provide any explicit notice or warning on their
station data webpage that the data had been changed, or an explicit notice to
users who had downloaded data or graphs in the past that there had been
significant changes to many U.S. series. This obligation existed regardless of
any impact on world totals.

Figure 3. Distribution of Step Errors
GISS has emphasized recently that the U.S. constitutes only 2% of
global land surface, arguing that the impact of the error is negligible on the
global average. While this may be so for users of the GISS global average,
U.S. HCN stations constitute about 50% of active (with values in 2004 or later)
stations in the GISS network (as shown below). The sharp downward step in
station counts after March 2006 in the right panel shows the last month in
which USHCN data is presently included in the GISS system. The Hansen error
affects all the USHCN stations and, to the extent that users of the GISS system
are interested in individual stations, the number of affected stations is far
from insignificant, regardless of the impact on global averages.

Figure 4. Number of Time Series in GISS Network. This includes all
versions in the GISS network and exaggerates the population in the 1980s as
several different (and usually similar) versions of the same data are often included.
U.S. Temperature History
The Hansen error also has a significant impact on the GISS
estimate of U.S. temperature history with estimates for 2000 and later being
lowered by about 0.15 deg C (2006 by 0.10 deg C). Again GISS moved quickly to
revise their online information, changing their US temperature data
on Aug 7, 2007. Even though Gavin Schmidt of GISS and realclimate said
that changes of 0.1 deg C in individual years were "significant", GISS did
not explicitly announce these changes or alert readers that a "significant"
change had occurred for values from 2000-2006. Obviously they would have been
entitled to observe that the changes in the U.S. record did not have a material
impact on the world record, but it would have been appropriate for them to have
provided explicit notice of the changes to the U.S. record given that the
changes resulted from an error.
The changes in the U.S. history were not brought to the attention
of readers by GISS itself, but in this
post at climateaudit. As a result of the GISS revisions, there was a change
in the "leader board" and 1934 emerged as the warmest U.S. year and more warm
years were in the top ten from the 1930s than from the past 10 years. This has
been widely discussed in the right-wing blogosphere and has been acknowledged
at realclimate
as follows:
The net effect of the change was to reduce
mean US anomalies by about 0.15 °C for the years 2000-2006. There were some
very minor knock on effects in earlier years due to the GISTEMP adjustments for
rural vs. urban trends. In the global or hemispheric mean, the differences were
imperceptible (since the US is only a small fraction of the global area).
There were however some very minor
re-arrangements in the various rankings (see data). Specifically, where 1998
(1.24 °C anomaly compared to 1951-1980) had previously just beaten out 1934
(1.23 °C) for the top US year, it now just misses: 1934 1.25°C vs. 1998 1.23°C. None of these differences are statistically
significant.
In my opinion, it would have been more appropriate for Gavin
Schmidt of GISS (who was copied on the GISS correspondence to me) to ensure
that a statement like this was on the caption to the U.S. temperature history
on the GISS webpage, rather than after the fact at realclimate.
Obviously much of the blogosphere delight in the leader board
changes is a reaction to many fevered press releases and news stories about
year x being the "warmest year". For example, on Jan 7, 2007, NOAA announced that:
The 2006 average annual temperature for the
contiguous U.S. was the warmest on record.
This press release was widely covered as you can determine by
googling "warmest year 2006 united states". Now NOAA and NASA are different
organizations and NOAA, not NASA, made the above press release, but members of
the public can surely be forgiven for not making fine distinctions between
different alphabet soups. I think that NASA might reasonably have foreseen that
the change in rankings would catch the interest of the public and, had they
made a proper report on their webpage, they might have forestalled much
subsequent criticism.
In addition, while Schmidt describes the changes atop the leader
board as "very minor re-arrangements", many followers of the climate debate are
aware of intense battles over 0.1 or 0.2 degree (consider the satellite
battles.) Readers might perform a little thought experiment: suppose that
Spencer and Christy had published a temperature history in which they claimed
that 1934 was the warmest U.S. year on record and then it turned out that there
had been a computer programming error opposite to the one that Hansen made,
that Wentz and Mears discovered there was an error of 0.15 deg C in the Spencer
and Christy results and, after fixing this error, it turned out that 2006 was
the warmest year on record. Would realclimate simply describe this as a "very
minor re-arrangement"?
So while the Hansen error did not have a material impact on world
temperatures, it did have a very substantial impact on U.S. station data and a
"significant" impact on the U.S. average. Both of these surely "matter" and
both deserved formal notice from Hansen and GISS.
Can GISS Adjustments "Fix" Bad Data?
Now my original interest in GISS adjustments did not arise
abstractly, but in the context of surface station quality. Climatological
stations are supposed to meet a variety of quality standards, including the
relatively undemanding requirement of being 100 feet (30 meters) from paved
surfaces. Anthony Watts and volunteers of surfacestations.org have documented
one defective site after another, including a weather station in a parking lot
at the University of Arizona where MBH coauthor Malcolm Hughes is employed,
shown below.

Figure 5. Tucson University of Arizona Weather Station
These revelations resulted in a variety of aggressive
counter-attacks in the climate blogosphere, many of which argued that, while
these individual sites may be contaminated, the "expert" software at GISS and
NOAA could fix these problems, as, for example, here:
they [NOAA and/or GISS] can "fix" the problem
with math and adjustments to the temperature record.
This assumes that contaminating influences
can't be and aren't being removed analytically.. I haven't seen anyone saying
such influences shouldn't be removed from the analysis. However I do see
professionals saying "we've done it"
"Fixing" bad data with software is by no means an easy thing to do
(as witness Mann's unreported modification of principal components methodology
on tree ring networks). The GISS adjustment schemes (despite protestations from
Schmidt that they are "clearly outlined") are not at all easy to replicate
using the existing opaque descriptions. For example, there is nothing in the
methodological description that hints at the change in data provenance before
and after 2000 that caused the Hansen error. Because many sites are affected by
climate change, a general urban heat island effect and local microsite changes,
adjustment for heat island effects and local microsite changes raises some
complicated statistical questions that are nowhere discussed in the underlying
references (Hansen et al 1999, 2001). In particular, the adjustment methods are
not techniques that can be looked up in statistical literature, where their
properties and biases might be discerned. They are rather ad hoc and local
techniques that may or may not be equal to the task of "fixing" the bad data.
Making readers run the gauntlet of trying to guess the precise
data sets and precise methodologies obviously makes it very difficult to
achieve any assessment of the statistical properties. In order to test the GISS
adjustments, I requested that GISS provide me with details on their adjustment
code. They refused. Nevertheless, there are enough different versions of U.S.
station data (USHCN raw, USHCN time-of-observation adjusted, USHCN adjusted,
GHCN raw, GHCN adjusted) that one can compare GISS raw and GISS adjusted data
to other versions to get some idea of what they did.
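As a rough sketch of what such a version-to-version comparison can look like, one can difference two versions of the same station series and compare the mean difference before and after a suspected break year. Everything here is an assumption for illustration: the function name `estimate_step`, the dictionary layout, and the temperature values are hypothetical, and this is not the GISS or USHCN procedure.

```python
from statistics import mean


def estimate_step(series_a, series_b, break_year):
    """Estimate a step between two versions of one station's annual series.

    Each series maps year -> annual mean temperature (deg C). The result is
    the mean of (a - b) from break_year onward minus the mean of (a - b)
    before break_year, using only years present in both versions.
    """
    years = sorted(set(series_a) & set(series_b))
    before = [series_a[y] - series_b[y] for y in years if y < break_year]
    after = [series_a[y] - series_b[y] for y in years if y >= break_year]
    if not before or not after:
        raise ValueError("need overlapping years on both sides of break_year")
    return mean(after) - mean(before)


# Hypothetical annual means (deg C) for one station; not real data.
reference = {1996: 10.0, 1997: 10.2, 1998: 10.9, 1999: 10.1,
             2000: 10.3, 2001: 10.4, 2002: 10.5}
# A second version that matches until 2000, then carries a 0.5 deg C offset.
other = {y: t + (0.5 if y >= 2000 else 0.0) for y, t in reference.items()}

step = estimate_step(other, reference, 2000)
print(f"estimated step at 2000: {step:.2f} deg C")
```

Differencing the two versions removes the climate signal they share, so what remains is whatever one version's processing added relative to the other. It cannot say which version is right, only where they part ways.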
In the course of reviewing quality problems at various surface
sites, among other things, I compared these different versions of station data,
including a comparison of the Tucson weather station shown above to the Grand
Canyon weather station, which is presumably less affected by urban problems.
This comparison demonstrated a very odd pattern discussed here. The adjustments show that
the trend in the problematic Tucson site was reduced in the course of the
adjustments, but they also showed that the Grand Canyon data was also adjusted,
so that, instead of the 1930s being warmer than the present as in the raw data,
the 2000s were warmer than the 1930s, with a sharp increase in the 2000s.


Figure 6. Comparison of Tucson and Grand Canyon Versions
Now some portion of the post-2000 jump in adjusted Grand Canyon
values shown here is due to Hansen's Y2K error, but it only accounts for a 0.5
deg C jump after 2000 and does not explain why Grand Canyon values should have
been adjusted so much. In this case, the adjustments are primarily at the USHCN
stage. The USHCN station history adjustments appear particularly troublesome to
me, not just here but at other sites (e.g. Orland CA). They end up making
material changes to sites identified as "good" sites and my impression is that
the USHCN adjustment procedures may be adjusting some of the very "best" sites
(in terms of appearance and reported history) to better fit histories from
sites that are clearly non-compliant with WMO standards (e.g. Marysville,
Tucson). There are some real and interesting statistical issues with the USHCN
station history adjustment procedure and it is ridiculous that the source code
for these adjustments (and the subsequent GISS adjustments - see bottom panel)
is not available.
Closing the circle: my original interest in GISS adjustment
procedures was not an abstract interest, but a specific interest in whether
GISS adjustment procedures were equal to the challenge of "fixing" bad data. If
one views the above assessment as a type of limited software audit (limited by
lack of access to source code and operating manuals), one can say firmly that
the GISS software had not only failed to pick up and correct fictitious steps
of up to 1 deg C, but that GISS actually introduced this error in the course of
their programming.
According to any reasonable audit standards, one would conclude
that the GISS software had failed this particular test. While GISS can patch
(and has patched) the particular error that I reported to them, their patching
hardly proves the merit of the GISS (and USHCN) adjustment procedures. These
need to be carefully examined. This was a crying need prior to the
identification of the Hansen error and would have been a crying need even
without the Hansen error.
One practical effect of the error is that it surely becomes much
harder for GISS to continue the obstruction of detailed examination of their
source code and methodologies after the embarrassment of this particular
incident. GISS itself has no policy against placing source code online and,
indeed, a huge amount of code for their climate model is online. So it's hard
to understand their present stubbornness.
The U.S. and the Rest of the World
Schmidt observed that the U.S. accounts for only 2% of the world's
land surface and that the correction of this error in the U.S. has "minimal
impact on the world data", which he illustrated by comparing the U.S. index to
the global index. I've re-plotted this from original data on a common scale.
Even without the recent changes, the U.S. history contrasts with the global
history: the U.S. history has a rather minimal trend if any since the 1930s,
while the ROW has a very pronounced trend since the 1930s.

Re-plotted from GISS Fig A and GFig D data.
These differences are attributed to "regional" differences and it
is quite possible that this is a complete explanation. However, this conclusion
is complicated by a number of important methodological differences between the
U.S. and the ROW. In the U.S., despite the criticisms being rendered at
surfacestations.org, there are many rural stations that have been in existence
over a relatively long period of time; while one may cavil at how NOAA and/or
GISS have carried out adjustments, they have collected metadata for many
stations and made a concerted effort to adjust for such metadata. On the other
hand, many of the stations in China, Indonesia, Brazil and elsewhere are in
urban areas (such as Shanghai or Beijing). In some of the major indexes
(CRU, NOAA), there appears to be no attempt whatever to adjust for urbanization.
GISS does report an effort to adjust for urbanization in some cases, but their
ability to do so depends on the existence of nearby rural stations, which are
not always available. Thus, there is a real concern that the need for urban
adjustment is most severe in the very areas where adjustments are either not
made or not accurately made.
In its consideration of possible urbanization and/or microsite
effects, IPCC has taken the position that urban effects are negligible, relying
on a very few studies (Jones et al 1990, Peterson et al 2003, Parker 2005,
2006), each of which has been discussed at length at this site. In my opinion,
none of these studies can be relied on for concluding that urbanization impacts
have been avoided in the ROW sites contributing to the overall history.
One more story to conclude. Non-compliant surface stations were
reported in the formal academic literature by Pielke and Davey (2005) who
described a number of non-compliant sites in eastern Colorado. In NOAA's
official response to this criticism, Vose et al (2005) said in effect -
it doesn't matter. It's only eastern Colorado.
You haven't proved that there are problems anywhere else in the United States.
In most businesses, the identification of glaring problems, even
in a restricted region like eastern Colorado, would prompt an immediate
evaluation to ensure that problems did not actually exist. However, that does
not appear to have taken place and matters rested until Anthony Watts and the
volunteers at surfacestations.org launched a concerted effort to evaluate
stations in other parts of the country and determined that the problems were
not only just as bad as eastern Colorado, but in some cases were much worse.
Now in response to problems with both station quality and
adjustment software, Schmidt and Hansen say in effect, as NOAA did before them
-
it doesn't matter. It's only the United
States. You haven't proved that there are problems anywhere else in the world.