The HARRY_READ_ME.txt file

Part 35d

So, anyway, back on Earth I wrote wmocmp.for, a program to - you guessed it - compare WMO codes from
a given set of databases. Results were, ah.. 'interesting':

<BEGIN QUOTE>
REPORT:

Database Title                Exact Match  Close Match  Vague Match  Awful Match  Codes Added  WMO = 0
../db/pre/pre.0612181221.dtb          n/a          n/a          n/a          n/a        14397     1540
../db/dtr/tmn.0708071548.dtb         1865         3389           57           77         5747     2519
../db/tmp/tmp.0705101334.dtb            0            4           28          106         4927        0
<END QUOTE>

So the largest database, precip, contained 14397 stations with usable WMO codes (and 1540 without).
The TMin database (TMax and DTR were also tested, then excluded as they matched TMin 100%) agreed
perfectly with precip for only 1865 stations, with 3389 close, 57 vague (believable) and 77 awful (worrying)
matches. TMean fared worse, with NO exact matches (WMO misformatting again) and over 100 worrying ones.

The big story is the need to fix the tmean WMO codes. For instance:

10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00

is illegal, and needs to become one of:
01001 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0001001 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0100100 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00

I favour the first as it's technically accurate. Alternatively we seem to have widely adopted the third, which
at least has the virtue of being consistent. Of course it's the only one that will match the precip:

100100 7093 -867 10 JAN MAYEN NORWAY 1921 2006 -999 -999.00

..which itself should be either:

0100100 7093 -867 10 JAN MAYEN NORWAY 1921 2006 -999 -999.00

or:

01001 7093 -867 10 JAN MAYEN NORWAY 1921 2006 -999 -999.00

Aaaaarrrggghhhh!!!!
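
To make the three candidate spellings concrete, here is a minimal sketch in Python (not the Fortran used for the real programs). The function names are hypothetical, and it assumes the code is held as a plain integer (01001 read as 1001); the 7-digit zero-filled form mirrors what a Fortran (i7.7) edit descriptor writes.

```python
def candidate_forms(wmo5):
    """Return the three spellings discussed above for a 5-digit WMO number."""
    return (
        f"{wmo5:05d}",        # 5 digits, leading zero kept
        f"{wmo5:07d}",        # 7 digits, zero-padded on the left
        f"{wmo5 * 100:07d}",  # 7 digits, code*100 (the form that lines up with precip)
    )


def as_i7(code):
    """Format an integer code as 7 zero-filled digits, like Fortran (i7.7)."""
    return f"{code:07d}"


jan_mayen = 1001                      # WMO 01001 read as an integer
print(candidate_forms(jan_mayen))     # ('01001', '0001001', '0100100')
print(as_i7(100100))                  # precip's 100100 -> '0100100'
print(as_i7(10010 * 10))              # tmean's 10010, times 10 -> '0100100'
```

Only the third form coincides with the precip code once that is zero-filled, which is the point made above.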

And the reason this is so important is that the incoming updates will rely PRIMARILY on matching the WMO codes!
In fact CLIMAT bulletins carry no other identification, of course. Clearly I am going to need a reference set
of 'genuine WMO codes'.. and wouldn't you know it, I've found four!

Location                                                N. Stations  Notes
http://weather.noaa.gov/data/nsd_bbsss.txt              11548        Full country names, ';' delim
http://www.htw-dresden.de/~kleist/wx_stations_ct.html   13000+       *10, leading zeros kept, fmt probs
From Dave Lister                                        13080        *10 and leading zeros lost, country codes
From Philip Brohan                                      11894        2+3, No countries

The strategy is to use Dave Lister's list, grabbing country names from the Dresden list. Wrote
getcountrycodes.for and extracted an imperfect but useful-as-a-reference list. Hopefully in the main the country
will not need fixing or referring to!!

Wrote 'fixwmos.for' - probably not for the first time, but it's the first prog of that name in my repository so I'll
have to hope for the best. After an unreasonable amount of teething troubles (due to my forgetting that the tmp
database stores lats & lons in degs*10 not degs*100, and also to the presence of a '-99999' as the lon for GUATEMALA
in the reference set) I managed to sort-of fix the tmp database:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] ./fixwmos

FIXWMOS - Fix WMO Codes in a Database

Enter the database to be fixed: tmp.0705101334.dtb

The operation completed successfully.

2263 WMO Codes were 'fixed' and all were rewritten as (i7.7)

The output database is tmp.0709281456.dtb

crua6[/cru/cruts/version_3_0/db/tmp]
<END QUOTE>

The first records have changed as follows:

crua6[/cru/cruts/version_3_0/db/tmp] diff tmp.0705101334.dtb tmp.0709281456.dtb |head -30
1c1
< 10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
---
> 0100100 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00

So far so good.. but records that weren't matched with the reference set didn't fare so well:

89c89
< 10050 780 142 9 ISFJORD RADIO NORWAY 1912 1979 101912 -999.00
---
> 0010050 780 142 9 ISFJORD RADIO NORWAY 1912 1979 101912 -999.00

This is misleading because, although there probably won't BE any incoming updates for ISFJORD RADIO, we can't say for
certain that there will never be updates for any station outside the current reference set. In fact, we can say with
confidence that there will be!

So, what to do? Do we assume a particular factor to adjust ALL codes by, based on the matches? Or do we attempt (note
careful use of verb) to use the country codes database to work out the most significant 'real' digits of these codes?

Well, I fancy the first one. We'll make two passes through the data, the first pass changes nothing but saves counts of
the successful factors in bins: *0.01, *0.1, *1, *10, *100 should do it. I sure hope all the results are in one bin!

It worked. An initial 'verbose' run showed a consistent choice of factor, though it'll exit with an error code if multiple
factors are registered in one database.
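
The two-pass idea can be sketched as follows. This is Python rather than the Fortran of fixwmos.for (whose source is not shown here), so every name below is hypothetical. It assumes codes and reference codes are integers, and it keeps the factors as (numerator, denominator) pairs so that *0.01 and *0.1 stay in whole-number arithmetic.

```python
from collections import Counter

# Candidate factors as (numerator, denominator): *0.01, *0.1, *1, *10, *100.
FACTORS = [(1, 100), (1, 10), (1, 1), (10, 1), (100, 1)]


def scale(code, factor):
    """Apply a factor to a code; return None if the result is not a whole number."""
    num, den = factor
    scaled = code * num
    return scaled // den if scaled % den == 0 else None


def choose_factor(codes, reference):
    """Pass 1: change nothing, just count which factors produce reference hits."""
    bins = Counter()
    for code in codes:
        for factor in FACTORS:
            if scale(code, factor) in reference:
                bins[factor] += 1
    if len(bins) != 1:
        # Zero or several successful factors: refuse to guess.
        raise ValueError(f"expected one successful factor, got {dict(bins)}")
    return next(iter(bins))


def fix_codes(codes, reference):
    """Pass 2: apply the single agreed factor to ALL codes, matched or not."""
    factor = choose_factor(codes, reference)
    fixed = []
    for code in codes:
        scaled = scale(code, factor)
        if scaled is None:
            raise ValueError(f"factor {factor} gives no whole code for {code}")
        fixed.append(f"{scaled:07d}")
    return fixed


# Toy data: ISFJORD RADIO (10050) is deliberately absent from the reference.
tmp_codes = [10010, 10050, 10080]
reference = {100100, 100800}
print(fix_codes(tmp_codes, reference))   # ['0100100', '0100500', '0100800']
```

The design choice is that unmatched stations inherit the factor voted for by the matched ones, so ISFJORD RADIO comes out as 0100500 rather than 0010050.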

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] ./fixwmos

FIXWMOS - Fix WMO Codes in a Database

Enter the database to be fixed: tmp.0705101334.dtb
locfac set to: 10
First ref: 0100100

The operation completed successfully.

2263 WMO Codes were 'matched'
All codes were modified with a factor of 10
Lons/lats were modified with a factor of 10

The output database is tmp.0710011359.dtb

crua6[/cru/cruts/version_3_0/db/tmp]
<END QUOTE>

Example results:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] diff tmp.0705101334.dtb tmp.0710011359.dtb | head -12
1c1
< 10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
---
> 0100100 7090 -870 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
89c89
< 10050 780 142 9 ISFJORD RADIO NORWAY 1912 1979 101912 -999.00
---
> 0100500 7800 1420 9 ISFJORD RADIO NORWAY 1912 1979 101912 -999.00
159c159
< 10080 783 155 28 Svalbard Lufthavn NORWAY 1911 2006 341911 -999.00
---
> 0100800 7830 1550 28 Svalbard Lufthavn NORWAY 1911 2006 341911 -999.00
<END QUOTE>

Then.. attacked the wet database! And immediately found this beauty:

0 -9999 -99999 -999 UNKNOWN UNKNOWN 1994 2003 -999 0
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1994 500 800 600 400 600 100 0 100 200 400 1000 1300
1995 400 100 1100 900 1200 800 200 100 200 400 800 500
1996 500 1100 1500 600 900-9999 0 300 400 700 0 1100
1997 800 1000 700 1000 1000 1000 200 200 400 700 200 1000
1998 700 700 1000 1000-9999 800 100 100 0 200 400 700
1999 300 1000 800-9999 700 800 0 200-9999 600 400 200
2000 1100 600 900 900 1000 400-9999 100 200 300 0 400
2001 0 800 300 500 1200 0 0 0 200 200 500 800
2002 800 300 600 1300 800 500 400 100 300 400 400 600
2003 300-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

Gotta love the system! Like this is ever going to be a blind bit of use. Modified the code to
leave such stations unmolested, but identified in a separate file so they can be 'cleansed', it
being a little too risky to auto-cleanse such things.

Hopefully the final attack on 'wet':

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/rd0] ./fixwmos

FIXWMOS - Fix WMO Codes in a Database

Enter the database to be fixed: wet.0311061611.dtb

The operation completed successfully.

1920 WMO Codes were 'matched'
All codes were modified with a factor of 10
Lons/lats were modified with a factor of 1

The output database is wet.0710021341.dtb


IMPORTANT: the following WMO codes were not altered:
False codes (wmo<0): 2917
Illegal codes (0<=wmo<1000): 1
(illegals written to wet.0311061611.bad)
crua6[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

I then removed the sole illegal (see above) from wet.0710021341.dtb, which becomes the 'new old'
wet/rd0 database.
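
The screening reported in that summary can be sketched like this. The thresholds are the ones printed above (wmo<0 and 0<=wmo<1000); everything else, including the function names and the (wmo, header) pairing, is an assumption made for illustration in Python.

```python
def classify_wmo(wmo):
    """Sort a WMO code into the categories printed in the fixwmos summary."""
    if wmo < 0:
        return "false"      # wmo < 0
    if wmo < 1000:
        return "illegal"    # 0 <= wmo < 1000
    return "ok"


def partition(stations):
    """Split (wmo, header) pairs into fixable ones and those left unaltered."""
    fixable, untouched, bad = [], [], []
    for wmo, header in stations:
        kind = classify_wmo(wmo)
        if kind == "ok":
            fixable.append((wmo, header))
        else:
            untouched.append((wmo, header))    # never rewritten
            if kind == "illegal":
                bad.append((wmo, header))      # would be listed in the .bad file
    return fixable, untouched, bad


stations = [(100100, "JAN MAYEN"), (0, "UNKNOWN"), (-999, "NO CODE")]
fixable, untouched, bad = partition(stations)
print(len(fixable), len(untouched), len(bad))   # 1 2 1
```

Nothing is deleted automatically here; the illegal station is only reported, which matches the decision to cleanse such records by hand.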

So.. to incorporate the updates! Finally. First, the MCDW, metadata-rich ones:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710021341.dtb
Please enter the Update Database name: rdy.0709111032.dtb

Reading in both databases..
Master database stations: 4987
Update database stations: 2407

Looking for WMO code matches..
* new header 0100100 7056 -840 9 JAN MAYEN NORWAY 1990 2007 -999 -999 *
2 reject(s) from update process 0710041559

Writing wet.0710041559.dtb

OUTPUT(S) WRITTEN

New master database: wet.0710041559.dtb

Update database stations: 2407
> Matched with Master stations: 1556
(automatically: 1556)
(by operator: 0)
> Added as new Master stations: 0
> Rejected: 2
Rejects file: rdy.0709111032.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact; Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

(also knocked up rrstats.for at this stage, to analyse replication rates by
latitude band for a given database - needs a Matlab prog to drive really)

[a bit of debugging here as the last records weren't being written properly,
filenames adjusted above accordingly]


Then, the CLIMAT, nothing-but-the-code ones:

*WARNING: ignore this, the CLIMAT bulletins were later improved with metadata and newmergedb rerun*

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0709101706.dtb

Reading in both databases..
Master database stations: 5836
Update database stations: 2876

Looking for WMO code matches..
378 reject(s) from update process 0710081508

Writing wet.0710081508.dtb

OUTPUT(S) WRITTEN

New master database: wet.0710081508.dtb

Update database stations: 2876
> Matched with Master stations: 2498
(automatically: 2498)
(by operator: 0)
> Added as new Master stations: 0
> Rejected: 378
Rejects file: rdy.0709101706.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact; Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

Now of course, we can't add any of the CLIMAT bulletin stations as 'new' stations
because we don't have any metadata! so.. is it worth using the lookup table? Because
although I'm thrilled at the high match rate (87%!), it does seem worse when you
realise that you lost the rest..
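
For reference, the 'blind' option as described in the prompt text boils down to something like the sketch below. It is Python, not newmergedb itself, and the data layout (dicts keyed by WMO code, a 'name' of None standing for 'no metadata') is purely an assumption. How the real program combines matched data is not shown here; the sketch simply lets update values overwrite or extend the master's.

```python
def blind_merge(master, updates):
    """Match updates to master on WMO code alone; no data/metadata checks.

    master:  dict mapping WMO code -> station dict
    updates: list of station dicts with keys 'wmo', 'name', 'data'
             ('name' is None when the bulletin carried no metadata)
    Returns (matched, added, rejects); master is updated in place.
    """
    matched = added = 0
    rejects = []
    for stn in updates:
        wmo = stn["wmo"]
        if wmo in master:
            # Assumption: update values overwrite/extend the master series.
            master[wmo]["data"].update(stn["data"])
            matched += 1
        elif stn["name"] is not None:
            master[wmo] = stn          # metadata permitting: new station
            added += 1
        else:
            rejects.append(stn)        # no metadata, so no header can be built
    return matched, added, rejects


master = {"0100100": {"wmo": "0100100", "name": "JAN MAYEN", "data": {1990: 9}}}
updates = [
    {"wmo": "0100100", "name": None, "data": {2007: 7}},      # matches
    {"wmo": "0100500", "name": "ISFJORD RADIO", "data": {}},  # new, has metadata
    {"wmo": "0100800", "name": None, "data": {}},             # rejected
]
matched, added, rejects = blind_merge(master, updates)
print(matched, added, len(rejects))   # 1 1 1
```

With metadata-free bulletins every unmatched station falls into the last branch, which is exactly the 'Added as new Master stations: 0' seen in the run above.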

* see below, CLIMAT metadata fixed! *

At this stage I knocked up rrstats.for and the visualisation companion tool, cmprr.m. A simple process
to show station counts against time for each 10-degree latitude band (with 20-degree bands at the
North and South extremities). A bit basic and needs more work - but good for a quick & dirty check.
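
A rough sketch of that banding, in Python rather than the Fortran/Matlab pair. The band edges (20-degree caps poleward of 70 degrees, 10-degree bands in between, 16 bands in all) follow the description above, but the exact edge handling is an assumption, as is counting a station for every year between its first and last header years.

```python
import math
from collections import Counter


def lat_band(lat):
    """Band index 0..15, south to north: 20-degree caps, 10-degree bands between."""
    if lat < -70:
        return 0                      # 90S to 70S
    if lat >= 70:
        return 15                     # 70N to 90N
    return 1 + math.floor((lat + 70) / 10)


def counts_by_band_and_year(stations):
    """stations: iterable of (lat_degrees, first_year, last_year)."""
    counts = Counter()
    for lat, first, last in stations:
        band = lat_band(lat)
        for year in range(first, last + 1):
            counts[(band, year)] += 1
    return counts


# The three Norwegian headers quoted earlier, lats converted to degrees.
stations = [(70.9, 1921, 2006), (78.0, 1912, 1979), (78.3, 1911, 2006)]
counts = counts_by_band_and_year(stations)
print(counts[(15, 1950)], counts[(15, 2000)])   # 3 2
```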

Wrote dllist2headers.for to convert the 'Dave Lister' WMO list to CRU header format - the main difficulty
being the accurate conversion of the two-character 'country codes' - especially since many are actually
state codes for the US! Ended up with wmo.0710151633.dat as our reference WMO set.

Incorporated the reference WMO set into climat2cru.for. Successfully reprocessed the CLIMAT bulletins
into databases with at least SOME metadata:

pre.0710151817.dtb
rdy.0710151817.dtb
sun.0710151817.dtb
tmn.0710151817.dtb
tmp.0710151817.dtb
tmx.0710151817.dtb
vap.0710151817.dtb

In fact, it was far more successful than I expected - only 11 stations out of 2878 without metadata!

Re-ran newmergedb:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0710151817.dtb

Reading in both databases..
Master database stations: 5836
Update database stations: 2876

Looking for WMO code matches..
71 reject(s) from update process 0710161148

Writing wet.0710161148.dtb

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

OUTPUT(S) WRITTEN

New master database: wet.0710161148.dtb

Update database stations: 2876
> Matched with Master stations: 2498
(automatically: 2498)
(by operator: 0)
> Added as new Master stations: 307
> Rejected: 71
Rejects file: rdy.0710151817.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact; Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

307 stations rescued! and they'll be there in future of course, for metadata-free CLIMAT bulletins
to match with.
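
A quick arithmetic check on the two CLIMAT runs, using only the counts printed in the newmergedb summaries above (the variable names are just for illustration):

```python
def percent(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


updates = 2876
first_run = {"matched": 2498, "added": 0, "rejected": 378}
rerun = {"matched": 2498, "added": 307, "rejected": 71}

# Every update station should be accounted for in both runs.
for run in (first_run, rerun):
    assert sum(run.values()) == updates

print(percent(first_run["matched"], updates))     # 86.9 (the '87%' match rate)
print(percent(first_run["rejected"], updates))    # 13.1 lost without metadata
print(percent(rerun["rejected"], updates))        # 2.5 still rejected after the fix
print(first_run["rejected"] - rerun["rejected"])  # 307 rescued
```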

