FOIA HARRY_READ

27. Well, enough excuses - time to remember how to do the anomalising and
gridding things! Firstly, ran 'addnormline' just to ensure all normals are
up to date. The result was 8 new sets of normals, so well worth doing. The
database is now:

tmp.0704292158.dtb

Ran 'anomdtb' - got caught out by the requirement for a companion '.dts'
file again, ran 'falsedts.for' and carried on.. would still be nice to be
sure that it's not something meaningful **sigh**.

Output:

crua6[/cru/cruts/version_3_0/primaries/temp] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.tmp
> Select the .cts or .dtb file to load:
tmp.0704292158.dtb
tmp.0704292158.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal: 23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
tmp.txt
> Select the first,last years AD to save:
1901,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts

> Failed to find file.
> Enter the file, with suffix: .dts
tmp.0704292158.dts
tmp.0704292158.dts

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts

> NORMALS MEAN percent STDEV percent
> .dtb 3330007 81.3
made it to here
> .cts 92803 2.3 3422810 83.6
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 671592 16.4 16.4
> out-of-range 744 0.0 0.0
> duplicated 4102723 100.2 119.9
> accepted -680657 -16.6
> Dumping years 1901-2006 to .txt files...

crua6[/cru/cruts/version_3_0/primaries/temp]

.. which is a trifle worrying! And looking at the .txt files, they look
rather odd as well - for instance, tmp.1953.03.txt starts like this:

7.09 0.87 10.0 0.10000 10010
7.83 -1.55 28.0 -4.80000 10080
6.97 -1.89 10.0 0.90000 -999
6.97 -1.89 100.0 0.50000 10260
7.45 -1.90 16.0 -3.10000 10280
6.95 -2.55 129.0 3.70000 10650
7.04 -3.11 14.0 0.00000 10980
6.60 -0.20 0.0 1.20000 11000
6.73 -1.44 13.0 1.60000 -999
6.68 -1.40 39.0 2.20000 11530

Now, do those first two columns look like lat & lon to you? Me neither,
here's what the old version of the same file looks like:

60.00 -20.00 -999.0 0.40000-990007
62.00 -33.00 -999.0 -0.40000-990002
56.50 -51.00 0.0 -0.50000-990000
6.90 122.06 6.0 -0.60000 -999
13.13 123.73 17.0 0.20000 -999
14.52 121.00 15.0 0.60000 -999
18.37 121.63 4.0 1.10000 -999
6.90 122.00 6.0 -0.60000 -999
10.70 122.50 14.0 -0.10000 -999
13.13 123.73 19.0 0.10000 -999

In fact, the first two columns never get outside of +/- 30. Oh bugger.
What the HELL is going on?!

Decided to pursue that worrying (and impossible) 'duplicates' figure.
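
The summary table can be checked by hand before digging further. A rough
sketch in Python, assuming (from the printed figures only, not from the
anomdtb source) that 'accepted' is the count of values with a normal, minus
out-of-range, minus duplicated. The function name is made up for illustration.

```python
def decision_summary(with_normal, no_normal, out_of_range, duplicated):
    """Rebuild parts of the PROCESS DECISION table from the raw counts.

    Assumed relationships, inferred from the printed table only:
      percent           = count / (with_normal + no_normal)
      %of-chk (dupes)   = duplicated / (with_normal - out_of_range)
      accepted          = with_normal - out_of_range - duplicated
    """
    total = with_normal + no_normal
    checked_for_dupes = with_normal - out_of_range
    accepted = with_normal - out_of_range - duplicated
    return {
        "duplicated_pct": round(100.0 * duplicated / total, 1),
        "duplicated_pct_of_checked": round(100.0 * duplicated / checked_for_dupes, 1),
        "accepted": accepted,
        "accepted_pct": round(100.0 * accepted / total, 1),
    }
```

Fed the figures above (3422810 values with a normal, 671592 without, 744
out of range, 4102723 duplicated) this gives accepted = -680657. More values
are flagged as duplicates than were ever checked, hence 'impossible'.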

The function 'sort' was used to sort the database so that any duplicate
lines would be together - then 'uniq' was used to pull out duplicates.
There were quite a few dupes, and one or two triples too, like these:

crua6[/cru/cruts/version_3_0/primaries/temp] grep -n '1984 \-83 \-46 22 55 126 154 222 215 159 63 32 \-62' tmp.0704292158.dtb
195789:1984 -83 -46 22 55 126 154 222 215 159 63 32 -62
254265:1984 -83 -46 22 55 126 154 222 215 159 63 32 -62
254380:1984 -83 -46 22 55 126 154 222 215 159 63 32 -62

These are from the following stations:
720344 408 1158 1539 ELKO-FAA-AP---------USA--------- 1870 1996 301870 -999.00
725837 408 1158 1549 NV ELKO FAA AP 1930 1990 101930 -999.00
725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00

The last two are consecutive stations.

Looking at the last two.. it seems that 725910 has 725837's data!

1977 71 124 118 184 167 275 283 280 230 190 126 99
1978 107 114 149 144 208 248 289 282 232 220 118 72
1979 85 99 139 150 218 256 282 258 253 189 117 94
1980 99 121 119 156 192 216 275 262 241 196 128 102
1981 14 19 49 90 123 196 233 227 164 71 47 11
1982 -49 -14 32 57 114 164 206 214 148 74 11 -23
1983 -9 -1 54 59 114 167 204 223 170 104 25 -19
1984 -83 -46 22 55 126 154 222 215 159 63 32 -62

As can be seen, 1981 sees a complete change in range, especially for
Autumn/Winter. In fact, from 1981 to 1990, 725910 is a copy of
725837! It then reverts to the original range for the rest of the run.
So.. did the merging program do this? Unfortunately, yes. Check dates:

crua6[/cru/cruts/version_3_0/db/testmergedb] grep -n 'RED BLUFF' tmp.0*.*
tmp.0612081519.dat:28595: 725910 401 1223 103 RED BLUFF USA 1991 2006 101991 -999.00
tmp.0702091122.dtb:171674: 725910 401 1223 103 RED BLUFF USA 1878 1980 101878 -999.00
tmp.0704251819.dtb:200331: 725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
tmp.0704271015.dtb:254272: 725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
tmp.0704292158.dtb:254272: 725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
crua6[/cru/cruts/version_3_0/db/testmergedb]

The first file is the 1991-2006 update file. The second is the original
temperature database - note that the station ends in 1980.

It has *inherited* data from the previous station, where it had -9999
before! I thought I'd fixed that?!!!

/goes off muttering to fix mergedb.for for the five hundredth time

Miraculously, despite being dog-tired at nearly midnight on a Sunday, I
did find the problem. I was clearing the data array but not close enough
to the action - when stations were being passed through (ie no data to
add to them) they were not being cleaned off the array afterwards. Meh.
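
The shape of that bug, as a minimal Python sketch. This is NOT the
mergedb.for code: the function, its arguments and the single shared buffer
are all assumptions, used only to show how a pass-through station can
pick up the previous station's values when the array is not cleared.

```python
MISSING = -9999

def merge_pass(stations, first_year, last_year, clear_each_station):
    """Write every station out through one shared year-by-month buffer.

    stations: list of (station_id, {year: [12 monthly values]}) pairs.
    Returns {station_id: {year: [12 values]}} as it would be written out.
    """
    n_years = last_year - first_year + 1
    buffer = [[MISSING] * 12 for _ in range(n_years)]
    written = {}
    for station_id, rows in stations:
        if clear_each_station:
            # The fix: wipe the buffer before every station, including
            # those that are only being passed through.
            for row in buffer:
                row[:] = [MISSING] * 12
        for year, values in rows.items():
            buffer[year - first_year][:] = values
        # Without the wipe, any year this station has no data for still
        # holds whatever the previous station left behind.
        written[station_id] = {
            first_year + i: list(row) for i, row in enumerate(buffer)
        }
    return written
```

With clear_each_station=False, a station holding no 1984 data that follows
one with 1984 data comes out carrying the earlier station's 1984 line. With
True it comes out as -9999 throughout, as it should.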

Wrote a specific routine to clear halves of the data array, and back to
square one. Re-ran the ACT file to merge the x-1990 and 1991-2006 files.
Created an output file exactly the same size as the last time (phew!)
but with..

crua6[/cru/cruts/version_3_0/db/testmergedb] comm -12 tmp.0704292355.dtb tmp.0704251819.dtb |wc -l
285516
crua6[/cru/cruts/version_3_0/db/testmergedb] wc -l tmp.0704292355.dtb
285829 tmp.0704292355.dtb

.. 313 lines different. Typically:

14881,14886c14881,14886
< 1965-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1966-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1967-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1968-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1969-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1970-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
---
> 1965 -221 -177 -234 -182 -5 6 24 36 -15 -91 -100 -221
> 1966 -272 -194 -248 -192 -66 10 27 45 -12 -75 -139 -228
> 1967 -201 -243 -196 -158 -26 1 40 30 -18 -89 -183 -172
> 1968 -253 -256 -253 -107 -42 10 46 33 -21 -64 -134 -195
> 1969 -177 -202 -248 -165 -33 8 42 50 -1 -89 -157 -204
> 1970 -237 -192 -217 -160 -87 6 30 25 -5 -55 -143 -222

ie, what should have been missing data is now missing data again:

200436,200445c200436,200445
< 1981-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1982-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1983-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1984-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1990-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
---
> 1981 14 19 49 90 123 196 233 227 164 71 47 11
> 1982 -49 -14 32 57 114 164 206 214 148 74 11 -23
> 1983 -9 -1 54 59 114 167 204 223 170 104 25 -19
> 1984 -83 -46 22 55 126 154 222 215 159 63 32 -62
> 1985 -57 -29 17 89 122 181 244 188 121 79 -11 -50
> 1986 2 31 66 72 113 187 194 214 116 78 11 -39
> 1987 -59 -5 30 97 131 177 193 192 153 101 21 -35
> 1988 -65 -15 29 80 108 184 222 198 138 116 8 -57
> 1989 -113 -54 53 94 113 164 215 186 143 78 8 -24
> 1990 -24 -30 49 100 100 166 214 194 177 77 9 -97

Hurrah!

So the interim database file is tmp.0704292355.dtb. Now to re-add
the US station dataset with simpleaddnew.for.

crua6[/cru/cruts/version_3_0/db/testmergedb] ./simpleaddnew

SIMPLYADDNEW - add stations to a database
This program assumes the two databases have
NO COMMON STATIONS and will fail (stop) if
any are found.

Please enter the main database: tmp.0704292355.dtb

Please enter the new database: tmp.0704251654.dat
Please enter a 3-character parameter code: tmp
Output database is: tmp.0704300053.dtb
crua6[/cru/cruts/version_3_0/db/testmergedb]

So now we have the combined database again, a bit quicker than
last time: tmp.0704300053.dtb. Pity we slid into May: I was hoping
to only be FIVE MONTHS late.

What's worse - there are STILL duplicate non-missing lines, 210 of
them. The first example is this:

1835 92 73 141 187 260 279 281 288 241 195 183 106

Which belongs to this in the original database (tmp.0702091122.dtb):

722080 329 800 15 CHARLESTON, S. CAROL UNITED STATES 1823 1990 101823 -999.00
6190 84 100 142 180 224 257 274 270 245 191 145 104

..and to this in the US database (tmp.0704251654.dat):

720467 328 799 3 CHARLESTON-CITY-----USA--------- 1835 1996 301835 -999.00
6190 91 106 144 186 227 260 277 272 249 199 154 112

These two stations obviously have a lot in common - though not
everything, as their normals (shown) differ. In fact, on examination
the US database record is a poor copy of the main database one, it
has more missing data and so forth. By 1870 they have diverged, so
in this case it's probably OK.. but what about the others? I just do
not have the time to follow up everything. We'll have to take 210
year repetitions as 'one of those things'.

..actually, I decided in the end to follow up all 210 of them. The
likelihood is that the number is far greater, since the filtering
that gave the 210 figure excluded any lines with two or more
consecutive missing values (to avoid hundreds of just-missing-value
lines). Also I spotted some instances where data lines would be
identical but for one or more missing values in one of the stations.
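
For the record, the sort/uniq style of check with that filter can be
sketched in Python as below. The parsing rules are assumptions based on the
lines quoted in these notes (a data line is a year plus twelve integers, a
header contains letters, a normals line starts with 6190); none of the
function names come from the existing program suite.

```python
import re
from collections import defaultdict

MISSING = -9999

def parse_data_line(line):
    """Return (year, [12 values]) for a data line, otherwise None."""
    if re.search(r"[A-Za-z]", line):
        return None  # station header
    numbers = [int(tok) for tok in re.findall(r"-?\d+", line)]
    if len(numbers) != 13 or not 1000 <= numbers[0] <= 2100:
        return None  # normals line (6190 ...) or something unexpected
    return numbers[0], numbers[1:]

def has_missing_run(values, run_length=2):
    """True if run_length or more consecutive values are MISSING."""
    run = 0
    for value in values:
        run = run + 1 if value == MISSING else 0
        if run >= run_length:
            return True
    return False

def repeated_lines(lines):
    """Map (year, values) to the 1-based line numbers where it recurs."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        parsed = parse_data_line(line)
        if parsed is None or has_missing_run(parsed[1]):
            continue
        seen[(parsed[0], tuple(parsed[1]))].append(lineno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}
```

This only catches exact repeats. Lines that match apart from the odd
missing value, as noted above, would need a looser comparison.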

After checking, I found that the majority of the duplications were
between the original database and the US database, with just a couple
of 'linked' stations within the original database, and half a dozen
in the 1991-2006 update file. One surprise was that stations I'm sure
I rejected ended up marked as 'addnew' in the .act file - quite
unsettling!

Rather foolishly, perhaps, I decided to have a go at interactively
incorporating the US data rather than using 'simplyaddnew'. However,
progress was so slow (because of the high number of 'near matches')
that this approach was abandoned.

Tried 'anomdtb' with the fixed final file (tmp.0704300053.dtb)...
no better! The crucial bits:

> NORMALS MEAN percent STDEV percent
> .dtb 3323823 81.3
made it to here
> .cts 91963 2.2 3415786 83.5
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 675037 16.5 16.5
> out-of-range 744 0.0 0.0
> duplicated 4100117 100.2 120.1
> accepted -685075 -16.7
> Dumping years 1901-2006 to .txt files...
> Failed to create file. Try again.
> Enter the file, with suffix: .ann
tmp.ann
> Failed to create file. Try again.
> Enter the file, with suffix: .ann
h.ann

crua6[/cru/cruts/version_3_0/primaries/temp]

So the 'duplicated' figure is slightly lower.. but what's this
error with the '.ann' file?! Never seen before. Oh GOD if I
could start this project again and actually argue the case for
junking the inherited program suite!!

OK.. the '.ann' error was simply that it refuses to overwrite any
existing file. Meh. It's happy to overwrite the log file of
course - nice bit of logic there.

And the duplicates? Well I inserted a debug line where the
decision is made. Here's an example:

712600 vs. 727340: 4.7 8.4 4.7 8.4 -> 0.0km

Here the two WMO codes look OK (though others are -999 which
seems unlikely) but the two lat/lon pairs? Ooops. Here are the
actual headers:

712600 465 845 187 Sault Ste Marie A CANADA 1945 2006 361945 -999.00
727340 465 844 220 SAULT-STE-MARIE----- USA--------- 1888 2006 101888 -999.00

So, uhhhh.. what in tarnation is going on? Just how off-beam
are these datasets?!!

Not sure why the lats & lons are a factor of 10 too low - may
be intentional though it wasn't happening before.
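
To get a feel for what a factor of 10 does to the distance test, here is
a small Python sketch. It assumes the header lat/lon are stored as degrees
times 10 (465 -> 46.5), which fits the stations quoted here, and it uses a
plain haversine formula, which may well not be what anomdtb does. The
function names and the divisor argument are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def header_separation_km(header_a, header_b, divisor):
    """header_a, header_b: (lat, lon) integers as stored in the header.

    divisor=10 is the assumed correct scaling; divisor=100 mimics the
    'factor of 10 too low' values seen in the debug line.
    """
    (lat_a, lon_a), (lat_b, lon_b) = header_a, header_b
    return great_circle_km(lat_a / divisor, lon_a / divisor,
                           lat_b / divisor, lon_b / divisor)
```

For the two Sault Ste Marie headers, (465, 845) and (465, 844), divisor=10
gives about 7.7 km and divisor=100 about 1.1 km. Neither matches the 0.0km
in the debug line, so this only shows the direction of the effect: every
separation shrinks roughly tenfold against the 8 km duplicate range.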

Ran with the original database:

> NORMALS MEAN percent STDEV percent
> .dtb 2113609 81.7
made it to here
> .cts 0 0.0 2113608 81.7
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 474422 18.3 18.3
> out-of-range 68179 2.6 3.2
> duplicated 923258 35.7 45.1
> accepted 1122172 43.4
> Dumping years 1901-1990 to .txt files...

The lats & lons look the same.. but far fewer duplicates!

WHY? Well, it could just be those pesky US stations.. so
why not compare the two bespoke log files (as excerpted above)?

Immediately, another baffler: the log file from the run of
the 'final' database has lots of 'DEBUG DETAIL' information,
but the log file from the run of the original database does not!
So cropping those away with a judicious 'tail'.. I ran comm:

crua6[/cru/cruts/version_3_0/primaries/temp] comm -23 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat |wc -l
200
crua6[/cru/cruts/version_3_0/primaries/temp] comm -13 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat | wc -l
2572
crua6[/cru/cruts/version_3_0/primaries/temp] comm -12 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat | wc -l
1809

So 200 duplication events are unique to the older database,
and 2572 are unique to the new database - with 1809 common
to both. A quick look at the 2572 'new' ones showed a majority
of those with the first WMO as -999: this is the key. The
databases do not have any records with WMO=-999 as far as I know,
so something is going on..
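
The same three-way split can be done without comm, which expects sorted
input. A minimal Python sketch, assuming each log line starts with the
first WMO code as in the debug line quoted earlier ('712600 vs. 727340:
...'). The function names are made up for illustration.

```python
from collections import Counter

def comm_split(lines_a, lines_b):
    """Return (only_in_a, only_in_b, in_both) as Counters of lines.

    Mirrors comm -23, comm -13 and comm -12 on sorted files, with
    repeated lines paired off one for one.
    """
    a = Counter(line.rstrip("\n") for line in lines_a)
    b = Counter(line.rstrip("\n") for line in lines_b)
    return a - b, b - a, a & b

def count_lines(counter):
    """Total number of lines held in a Counter (what wc -l would print)."""
    return sum(counter.values())

def count_with_first_wmo(counter, wmo="-999"):
    """Count lines whose first whitespace-separated field equals wmo."""
    return sum(n for line, n in counter.items() if line.split()[:1] == [wmo])
```

Running count_with_first_wmo on the 'only in the new log' set would put a
number on that 'majority' of -999 entries.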
