The HARRY_READ_ME.txt file

Part 28

28. With huge reluctance, I have dived into 'anomdtb' - and already I have
that familiar Twilight Zone sensation.

I have found that the WMO Code gets set to -999 if *both* lon and lat are
missing. However, the following points are relevant:

* LoadCTS multiplies non-missing lons by 0.1, so they range from -18 to +18
with missing value codes passing through AS LONG AS THEY ARE -9999. If they
are -999 they will be processed and become -99.9. It is not clear why lats
are not treated in the same way!

* The subroutine 'Anomalise' in anomdtb checks lon and lat against a simple
'MissVal', which is defined as -999. This will catch lats of -999 but not
lons of -9999 (see the toy restatement after these points).

* This still does not explain how we get so many -999 codes.. unless we
don't, and it's just one or two?

And the real baffler:

* If the code is -999 because lat and lon are both missing - how the bloody
hell does it know there's a duplication within 8km?!!!
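
Before going further, here is the lon missing-value mismatch restated as a
toy - mine, NOT the real LoadCTS/Anomalise, just the two checks from the
first bullets side by side:

program missdemo
  implicit none
  real, parameter :: MissVal = -999.0    ! what Anomalise screens against
  real :: lon(3)
  integer :: i
  lon = (/ 123.0, -9999.0, -999.0 /)     ! valid, missing, mis-coded missing
  do i = 1, 3                            ! LoadCTS-style scaling: only a
     if (lon(i) /= -9999.0) lon(i) = lon(i) * 0.1   ! -9999 passes through
  end do
  do i = 1, 3
     print '(f9.1,a,l2)', lon(i), '  caught by MissVal test:', lon(i) == MissVal
  end do
end program missdemo

The valid lon scales to 12.3; the properly-coded missing lon sails through
as -9999.0 and dodges the -999 test; the mis-coded -999 gets scaled to
-99.9 and looks like a real (if impossible) longitude. Neither missing
case is caught.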

.. ah, OK. Well, for a start, the last point above does not apply - not one
case of the code being set to -999 because of lat/lon missing. In fact, I
hate to admit it, but it is *sort of* clever - the code is set to -999 to
prevent it being used again, because the distance/duplication checker will
not make a distance comparison if either code is -999. So HOW COME loads of
the duplicates have a code of -999?!!!

The plot thickens.. I changed the exclusion tests in the duplication loops
from:
if (AStn(XAStn).NE.MissVal) then
to:
if (int(AStn(XAStn)).NE.-999) then

This made NO DIFFERENCE. So, having tested to ensure that the first of the
pair hasn't already been used - we then use it anyway! What's more, I've
noticed that it's usually the one 'incorporated' in the previous iteration!
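
My best guess at the control flow behind this - a minimal sketch, mine and
NOT the real anomdtb (the station codes are real, the coordinates and the
km/merge details are invented) - is that the 'used' test on the first
station of the pair is only evaluated on entry to the inner loop, so
flagging it -999 mid-loop changes nothing:

program dupechain
  implicit none
  integer, parameter :: n = 4
  real :: code(n), lat(n), lon(n)
  integer :: i, j
  code = (/ 67700.0, 160660.0, 160707.0, 160811.0 /)
  lat  = (/ 4.60, 4.62, 4.61, 4.58 /)      ! invented; all pairs within 8km
  lon  = (/ -0.90, -0.91, -0.89, -0.90 /)
  do i = 1, n-1
     if (int(code(i)) /= -999) then        ! 'used' test: evaluated here only
        do j = i+1, n
           if (int(code(j)) /= -999 .and. km(i,j) < 8.0) then
              print '(f8.0,a,f8.0)', code(i), ' -> merged into ', code(j)
              code(i) = -999.0             ! flagged used - but the inner loop
           end if                          !  carries on comparing station i
        end do
     end if
  end do
contains
  real function km(a, b)                   ! crude flat-earth separation
    integer, intent(in) :: a, b
    km = 111.3 * sqrt((lat(a)-lat(b))**2 + (lon(a)-lon(b))**2)
  end function km
end program dupechain

Run that and it reproduces the pattern in the real log extracts below: the
first pair merges, then the flagged '-999' goes on being compared against,
and merged into, every subsequent neighbour.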

Consider (columns are: code1 vs. code2: lat1 lon1 lat2 lon2 -> separation):

67700 vs. 160660: 4.6 -0.9 4.6 -0.9 -> 5.4km
-999 vs. 160707: 4.6 -0.9 4.6 -0.9 -> 2.2km
-999 vs. 160800: 4.6 -0.9 4.5 -0.9 -> 7.3km
-999 vs. 160811: 4.6 -0.9 4.6 -0.9 -> 5.8km

Here we can see (check the first set of lat/lons) that, after being
incorporated into 160660, 67700 goes on to also be incorporated into
160707, 160800 and 160811! So the same data could end up in three
other stations. It gets worse!! Because later on, we find:

160660 vs. 160707: 4.6 -0.9 4.6 -0.9 -> 7.9km
-999 vs. 160800: 4.6 -0.9 4.5 -0.9 -> 7.0km
-999 vs. 160811: 4.6 -0.9 4.6 -0.9 -> 5.8km
160707 vs. 160800: 4.6 -0.9 4.5 -0.9 -> 7.9km
-999 vs. 160811: 4.6 -0.9 4.6 -0.9 -> 6.6km
160800 vs. 160811: 4.5 -0.9 4.6 -0.9 -> 2.2km

So three of those recipients have gone on to be incorporated into one
of them (160811). But although in this case 67700 is within 8km of
160811, there is no guarantee! Indeed, with this system, the 'chosen'
station may hop all over the place in <8km steps, collecting data as
it goes. In a densely-packed area this could drastically reduce the
number of stations (a toy illustration of the hopping below).
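
To see how far data can travel that way, a toy chain - invented positions,
mine not CRU's: four stations in a line, 7km apart, so every adjacent pair
passes the 8km test even though the ends are 21km apart:

program hopdemo
  implicit none
  real    :: x(4)
  integer :: owner(4), i, k
  x = (/ 0.0, 7.0, 14.0, 21.0 /)      ! positions along a line, in km
  owner = (/ 1, 2, 3, 4 /)            ! each station initially owns itself
  do i = 1, 3
     if (x(i+1) - x(i) < 8.0) owner(i) = i + 1   ! subsumed into neighbour
  end do
  k = 1                               ! follow station 1's data downstream
  do while (owner(k) /= k)
     k = owner(k)
  end do
  print '(a,i2,a,f5.1,a)', ' station 1 data lands in station', k, ' -', &
        x(k) - x(1), ' km from home'
end program hopdemo

Station 1's data lands 21km from home, despite no single step exceeding
8km.

Then there are these: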

85997 vs. 390000: -10.0 -20.0 -10.0 -20.0 -> 0.0km
-999 vs. 685807: -10.0 -20.0 -10.0 -20.0 -> 0.0km
-999 vs. 688607: -10.0 -20.0 -10.0 -20.0 -> 0.0km
-999 vs. 967811: -10.0 -20.0 -10.0 -20.0 -> 0.0km
-999 vs. 968531: -10.0 -20.0 -10.0 -20.0 -> 0.0km

as might be guessed, they all end up incorporated into 968531 - but
no surprise seeing as their lats & lons are rubbish!!! Oh Tim what
have you done, man? [actually - what he's done is to let missing
lats & lons through. The missing lon code is -1999, not -9999, so with
the factor-100 scaling these figures are just the roundings:
-999/100 = -9.99 -> -10.0, and -1999/100 = -19.99 -> -20.0]

All that said, the biggest worry is still the lats & lons themselves.
They just don't look realistic. Lats appear to have been reduced by
a factor of 10 too, even though I can't find the code for that. And
(from the top example) is 67700 really 5.4km from 160660?

67700 460 -90 273 LUGANO SWITZERLAND 1864 2006 101864 -999.00
160660 456 -87 -999 MILANO MALPENSA ITALY 1961 1970 101961 -999.00

Of course not! It's just over 50km. I do not understand why the lats
& lons have been scaled, when the stated distance threshold has not.
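
A quick sanity check - my own throwaway sketch, haversine on a spherical
earth, fed with the two header values above:

program distcheck
  implicit none
  real, parameter :: R = 6371.0, d2r = 3.14159265 / 180.0
  ! header lat/lon for 67700 (LUGANO) and 160660 (MILANO MALPENSA)
  real, parameter :: la1 = 460.0, lo1 = -90.0, la2 = 456.0, lo2 = -87.0
  print '(a,f5.1,a)', ' scaled by 1/10 (true):  ', &
        km(la1/10.0, lo1/10.0, la2/10.0, lo2/10.0), ' km'
  print '(a,f5.1,a)', ' scaled by 1/100 (bug):  ', &
        km(la1/100.0, lo1/100.0, la2/100.0, lo2/100.0), ' km'
contains
  real function km(p1, l1, p2, l2)    ! great-circle (haversine) distance
    real, intent(in) :: p1, l1, p2, l2
    real :: a
    a = sin(d2r*(p2-p1)/2.0)**2 + &
        cos(d2r*p1) * cos(d2r*p2) * sin(d2r*(l2-l1)/2.0)**2
    km = 2.0 * R * asin(sqrt(a))
  end function km
end program distcheck

The headers put the pair just over 50km apart; divide them by 100 instead
of 10 and the same two stations appear only ~5km apart - comfortably
inside the 8km radius.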

At least I've found *where* they are scaled, in LoadCTS (crutsfiles.f90):

if (StnInfo(XStn,2).NE.LatMissVal) Lat (XStn) = real(StnInfo(XStn,2)) / real(LatFactor)
if (StnInfo(XStn,3).NE.LonMissVal) Lon (XStn) = real(StnInfo(XStn,3)) / real(LonFactor)

Looking at how LoadCTS is called from anomdtb..

subroutine LoadCTS (StnInfo,StnLocal,StnName,StnCty,Code,Lat,Lon,Elv,OldCode,Data,YearAD,&
NmlData,DtbNormals,CallFile,Hulme,Legacy,HeadOnly,HeadForm,LongType,Silent,Extra,PhilJ, &
YearADMin,YearADMax,Source,SrcCode,SrcSuffix,SrcDate, &
LatMV,LonMV,ElvMV,DataMV,LatF,LonF,ElvF,NmlYr0,NmlYr1,NmlSrc,NmlInc)

call LoadCTS (StnInfoA,StnLocalA,StnNameA,StnCtyA,Code=AStn,OldCode=AStnOld, &
Lat=ALat,Lon=ALon,Elv=AElv,DtbNormals=DtbNormalsA, &
Data=DataA,YearAD=AYearAD,CallFile=LoadFileA,silent=1) ! get .dtb file

.. we see that Legacy is not passed. This means that.. (from LoadCTS):

LatFactor=100 ; LonFactor=100 ; ElvFactor=1 ! usual/hulme hdr factors
if (present(Legacy)) then
LatFactor=10 ; LonFactor=10 ; ElvFactor=1 ! legacy hdr factors
end if
if (present(LatF)) LatFactor = LatF ! custom hdr factors
if (present(LonF)) LonFactor = LonF
if (present(ElvF)) ElvFactor = ElvF

..LatFactor and LonFactor are set to 100.
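
That is the F90 optional-argument trap in miniature. A standalone toy
(mine, not LoadCTS) showing how the factor falls out of each style of call:

program optdemo
  implicit none
  call load()              ! as anomdtb calls it: no Legacy, no LatF -> 100
  call load(Legacy=1)      ! a legacy-header caller                  -> 10
  call load(LatF=10)       ! explicit custom factor                  -> 10
contains
  subroutine load(Legacy, LatF)
    integer, intent(in), optional :: Legacy, LatF
    integer :: LatFactor
    LatFactor = 100                          ! usual/hulme default
    if (present(Legacy)) LatFactor = 10      ! legacy headers
    if (present(LatF))   LatFactor = LatF    ! custom factor wins
    print *, 'LatFactor =', LatFactor
  end subroutine load
end program optdemo

Since anomdtb passes neither Legacy nor LatF/LonF, the factors silently
default to 100 - and every lat and lon comes out ten times too small.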

So I added a specific pair of arguments, LatF=10, LonF=10, and got:

> NORMALS MEAN percent STDEV percent
> .dtb 3323823 81.3
made it to here
> .cts 91963 2.2 3415786 83.5
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 675037 16.5 16.5
> out-of-range 744 0.0 0.0
> duplicated 53553 1.3 1.6
> accepted 3361489 82.2
> Dumping years 1901-2006 to .txt files...

Hurrah! Looking at the log, it is still ignoring the -999 Code and re-integrating stations..
but not to any extent worth worrying about. Not when duplications are down to 1.3% :-)))

Then got a mail from PJ to say we shouldn't be excluding stations inside 8km anyway - yet
that 8km exclusion is in IJC - Mitchell & Jones 2005! So there you go. Ran again with 0km
as the distance:

> NORMALS MEAN percent STDEV percent
> .dtb 3323823 81.3
made it to here
> .cts 91963 2.2 3415786 83.5
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 675037 16.5 16.5
> out-of-range 744 0.0 0.0
> accepted 3415042 83.5
> Dumping years 1901-2006 to .txt files...

Which hasn't saved much as it turns out. In fact, I must conclude that an inquiring mind is
a very dangerous thing - I decided to see what difference it made, turning off the proximity
duplicate detection and elimination:

crua6[/cru/cruts/version_3_0/primaries/temp] wc -l */*1962.12.txt
2773 oldtxt/old.1962.12.txt
3269 tmptxt0km/tmp.1962.12.txt
3308 tmptxt8km/tmp.1962.12.txt

So.. 'oldtxt' is before I fixed the lat/lon scaling problem. But look at the last two - I
got MORE results when I used an elimination radius! Whaaaaaaaaat?!!!

/goes home in a huff

/gets out of huff and goes into house, checks things and thinks hard

Okay, I guess if we don't do the roll-duplicates-together thing, then we could lose data:
the 'rolled' station (ie the one subsumed into its neighbour) might have useful years but
no normals of its own, so on its own it falls at the 'no normal' hurdle - whereas merged
into a neighbour that does have normals, those years survive into the output?
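
If that guess is right, the arithmetic is simple enough (a toy sketch,
mine, with invented numbers):

program normdemo
  implicit none
  ! invented numbers: station A has 120 monthly values but no normals;
  ! station B, within 8km, has normals of its own
  integer, parameter :: nA = 120
  logical, parameter :: A_has_normals = .false.
  integer :: kept
  kept = 0                          ! merging OFF: A stands alone and
  if (A_has_normals) kept = nA      !  falls at the 'no normal' hurdle
  print *, 'merging off, A values kept:', kept
  kept = nA                         ! merging ON: A's values ride along
  print *, 'merging on,  A values kept:', kept   !  under B's normals
end program normdemo

Which would square with the 8km run dumping more lines than the 0km run.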

