L'Ombre de l'Olivier

The Shadow of the Olive Tree

being the maunderings of an Englishman on the Côte d'Azur

27 February 2008

Lies, Damned Lies, Statistics and Climate Change Statistics

That famous dictum about lies, attributed to either Mark Twain or Benjamin Disraeli depending on who you trust, probably needs to be extended. The more I read, the more I feel that the statistics underpinning Climate Change/Global Warming etc. look remarkably ropy and the models that are constructed with their help are, if anything, worse.

We should, I think, skip lightly over the "hockey stick", as that particular bit of statistical analysis now looks about as convincing as Al Gore professing an abiding love of Florida election technology. However, that is far from the only example of science where the conclusion seems to have helped select the data used or the manipulations required. For example, this post by tamino explains:

In the same time period (1958-1988) during which Puerto Maldon raw data show more than 2 deg.C cooling, its neighbors show about 1 deg.C warming. To make the Puerto Maldon trend match its rural neighbors (which will include a lot more than I've shown here), it needs an adjustment amounting to about 3 deg.C during that time period.

Which I might translate as: this station gives the wrong answer for what we want unless we change it, so since we know the answer is right, we change the station to fit. This is better known as working back from the answer, and when inky schoolboys do it we call it cheating. I do understand the rationale behind the adjustment; it's just that if you make that adjustment you are creating a dependent variable from an independent one, thereby breaking some fairly basic statistics/probability rules and making the whole calculation extraordinarily iffy, if not flat-out wrong.
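To make the arithmetic concrete, here is a rough sketch of a trend-matching adjustment. It assumes, purely for illustration, that the correction is a straight-line ramp chosen so that the station's least-squares trend equals the average trend of its neighbours; the station and neighbour series are made-up straight lines and the function names are my own hypothetical ones. Real homogenisation procedures are considerably more elaborate than this.

```python
# Hypothetical sketch: adjust a station so its trend matches its neighbours'.
# Assumption: the correction is a linear ramp starting at the first year.

def linear_trend(years, values):
    """Ordinary least-squares slope, in degrees per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def adjust_to_neighbours(years, station, neighbours):
    """Add a linear ramp so the station's trend equals the mean neighbour trend.

    Returns the adjusted series and the total shift applied by the last year.
    """
    target = sum(linear_trend(years, nb) for nb in neighbours) / len(neighbours)
    correction = target - linear_trend(years, station)  # degrees per year
    start = years[0]
    adjusted = [v + correction * (y - start) for y, v in zip(years, station)]
    return adjusted, correction * (years[-1] - start)


# Made-up data: station cools 2 deg.C over 1958-1988, neighbours warm 1 deg.C.
years = list(range(1958, 1989))
station = [-2.0 * (y - 1958) / 30 for y in years]
neighbours = [[1.0 * (y - 1958) / 30 + offset for y in years] for offset in (0.0, 0.4)]

adjusted, total_shift = adjust_to_neighbours(years, station, neighbours)
print(f"total adjustment: {total_shift:.2f} deg.C")  # about 3 deg.C
print(f"adjusted trend: {linear_trend(years, adjusted) * 30:.2f} deg.C per 30 years")
```

With a station cooling by 2 deg.C and neighbours warming by 1 deg.C, the ramp comes to about 3 deg.C, in line with the figure in the quote. Note that after adjustment the station's trend is simply the neighbours' trend by construction, which is exactly the dependence I am grumbling about.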

Anthony Watts and his team of volunteers at Surface Stations are busily demonstrating that the raw data used by some of the climate change folks is, well, to put it politely, not exactly high quality. Since this data is used to create all sorts of alarmist models, the raw data quality would seem to be critical. As we all know from computer science, "Garbage In, Garbage Out" is a basic rule, so if the raw data is crud then the results are going to be basically meaningless. This, one would have thought, should be well understood by climate scientists, since it is well known that the earth's climate is one of those complicated chaotic things where minute changes in initial conditions lead to enormous changes in conditions a few years later.
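For anyone who has not met sensitive dependence on initial conditions, the logistic map is the standard toy example. The sketch below is only that, a toy: it is not a climate model, and the parameter r = 4 and the starting value 0.2 are my own arbitrary choices. Two runs whose starting values differ in the tenth decimal place track each other for a while and then part company entirely.

```python
# Toy illustration of sensitive dependence on initial conditions using the
# logistic map x -> r * x * (1 - x) with r = 4 (a standard chaotic case).
# This is NOT a climate model; it only shows how a tiny input error grows.

def logistic_orbit(x0, r=4.0, steps=60):
    """Return the sequence x0, x1, ..., x_steps of the logistic map."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)  # "measurement error" in the tenth decimal place

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: difference {abs(a[step] - b[step]):.3e}")
```

If the system really behaves like this, an error in the starting data does not stay small, which is why the quality of that data matters so much.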

Apparently this is not the case. Over at Climate Audit, Steve McIntyre has looked at some of the data used (e.g. in Peru) and found very definite oddities that would seem to spring from poor initial data or sloppy use thereof.

The fact that climate change scientists can then say that both warming and cooling observations fit with the model, without going into detail, does raise an eyebrow. It is possible that the models predict both warming and cooling as CO2 increases (or whatever the current scare du jour is), but you do have to wonder. Recall the claims about hurricanes? Well, the evidence that hurricanes are increasing is at best mixed and at worst disproven. The world's weather is currently severely affected by a monster La Niña, which meant that January saw a major drop in global temperatures. The analysis of this drop is interesting.

Even more interesting is a comparison of four series of global temperature changes, which sheds light on the quality of climate change statistics. Anthony Watts and/or another studious amateur created a combined data series listing the monthly global temperature variation from "normal" for each of the four major data sets from Jan 1979 to Jan 2008. Mr Watts then created four histograms from those sets and looked at them. Personally I find it more interesting just to overlay them, as I have done below:
Overlain histograms for global temperature
The green and yellow lines are the two satellite data sets. Since they use the same base satellite sensor, it is not surprising that they largely agree with each other, although there are two clear differences: RSS (green) has nearly 20 more months than UAH in the range 0.3 to 0.4, and a similar number fewer in the range -0.2 to -0.1. However, when you look at statistical measures such as the mean and standard deviation (as well as the max and min), you see very close agreement (means of 0.7 and 0.8, standard deviations of 0.21 and 0.23).
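For anyone wanting to repeat the bin-by-bin comparison, here is a minimal sketch. It assumes 0.1 deg.C bins with the lower edge included, and the two short series are invented placeholders rather than the real RSS and UAH numbers; the function names are hypothetical.

```python
import math
from collections import Counter


def bin_counts(anomalies, width=0.1):
    """Count monthly anomalies per bin [k*width, (k+1)*width), keyed by k.

    Rounding before floor() stops a value such as 0.3 slipping into the bin
    below, because 0.3 / 0.1 is not exactly 3 in floating point.
    """
    return Counter(math.floor(round(a / width, 9)) for a in anomalies)


def compare_bins(series_a, series_b, width=0.1):
    """Return count(a) - count(b) for every bin that either series occupies."""
    ca, cb = bin_counts(series_a, width), bin_counts(series_b, width)
    return {k: ca[k] - cb[k] for k in sorted(set(ca) | set(cb))}


# Invented placeholder values, not the real satellite series.
rss = [0.31, 0.35, 0.12, -0.15]
uah = [0.25, 0.38, 0.10, 0.05]

for k, diff in compare_bins(rss, uah).items():
    print(f"{k * 0.1:+.1f} to {(k + 1) * 0.1:+.1f}: {diff:+d}")
```

A positive number means the first series has more months in that bin, which is how a gap like "nearly 20 more months in the range 0.3 to 0.4" would show up.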

On the other hand, the ground data sets clearly differ not only from each other but also from the satellite sets. The red (HadCRUT) corresponds more closely with the satellite data than the blue (GISS), although it is clearly warmer on average than the two satellite sets. Interestingly, and I'm not sure if this says anything sensible, the standard deviations are similar to but slightly smaller than the satellite ones (0.19 for HadCRUT, 0.20 for GISS).
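The summary figures can be reproduced with nothing more than Python's standard statistics module. This is a sketch under two assumptions of mine: that "standard deviation" means the sample (n - 1) version, and that each series is a plain list of monthly anomalies. The numbers below are invented placeholders standing in for two of the four data sets, not the real data.

```python
import statistics


def summarise(anomalies):
    """Mean, sample standard deviation, min and max of a list of anomalies."""
    return {
        "mean": statistics.mean(anomalies),
        "stdev": statistics.stdev(anomalies),  # n - 1 denominator (assumed)
        "min": min(anomalies),
        "max": max(anomalies),
    }


# Invented placeholder series, not the real HadCRUT or GISS numbers.
series = {
    "HadCRUT": [0.25, 0.40, 0.31, 0.18, 0.52],
    "GISS": [0.30, 0.45, 0.28, 0.22, 0.60],
}

for name, values in series.items():
    s = summarise(values)
    print(f"{name:8s} mean {s['mean']:.2f}  sd {s['stdev']:.2f}  "
          f"min {s['min']:.2f}  max {s['max']:.2f}")
```

Swapping statistics.stdev for statistics.pstdev gives the population version, which makes only a tiny difference over 349 months.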

Of rather greater interest is that, just as RSS and UAH share the same base satellite data, HadCRUT and GISS have considerable overlap in surface stations. It's a bit of a worry when one is consistently warmer than the other, and it reminds me of the advice: "Never go to sea with 2 chronometers - take one or three". We could use a third opinion for the ground data, I think, and ideally one that doesn't "use statistics the way a drunk uses a lamp post - for support rather than illumination".
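The chronometer advice is really a point about voting. With three independent readings the median is a sensible consensus and an outlier can be singled out; with two you can only say that they disagree, not which one is wrong. Here is a small sketch of that, with made-up readings and a made-up tolerance (the function names are hypothetical), plus a helper for measuring how consistently one series runs warmer than another.

```python
import statistics


def mean_offset(series_a, series_b):
    """Average month-by-month difference a - b; positive means a runs warmer."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same months")
    return statistics.mean(a - b for a, b in zip(series_a, series_b))


def suspects(readings, tolerance):
    """Readings that differ from the median by more than `tolerance`."""
    centre = statistics.median(readings)
    return [r for r in readings if abs(r - centre) > tolerance]


# Three "instruments": the odd one out can be identified.
print(suspects([0.30, 0.32, 0.45], tolerance=0.05))  # [0.45]
# Two "instruments": both sit equally far from their median, so both are suspect.
print(suspects([0.30, 0.45], tolerance=0.05))  # [0.3, 0.45]
# Made-up monthly values: a positive offset means the first series is warmer.
print(mean_offset([0.42, 0.38, 0.51], [0.35, 0.30, 0.47]))
```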