So, anyway, back on Earth I wrote wmocmp.for, a program to - you guessed it - compare WMO codes from
a given set of databases. Results were, ah.. 'interesting':
<BEGIN QUOTE>
REPORT:
Database Title                Exact Match  Close Match  Vague Match  Awful Match  Codes Added  WMO = 0
../db/pre/pre.0612181221.dtb          n/a          n/a          n/a          n/a        14397     1540
../db/dtr/tmn.0708071548.dtb         1865         3389           57           77         5747     2519
../db/tmp/tmp.0705101334.dtb            0            4           28          106         4927        0
<END QUOTE>
So the largest database, precip, contained 14397 stations with usable WMO codes (and 1540 without).
The TMin database (along with TMax and DTR, which were tested then excluded as they matched TMin 100%) only agreed
perfectly with precip for 1865 stations; 3389 were nearby, 57 believable and 77 worrying. TMean fared worse, with NO
exact matches (WMO misformatting again) and over 100 worrying ones.
The big story is the need to fix the tmean WMO codes. For instance, the existing tmean header for Jan Mayen
is illegal, and needs to become one of:
01001 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0001001 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0100100 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
I favour the first as it's technically accurate. Alternatively we seem to have widely adopted the third, which
at least has the virtue of being consistent. Of course it's the only one that will match the precip database.
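To make the three candidate forms concrete, here is a minimal sketch in Python (used purely for illustration; the
function name wmo_forms is hypothetical and is not one of the programs in this log). It assumes the 'genuine' code is
the 5-digit block+station number, i.e. 01001 for Jan Mayen.

```python
def wmo_forms(wmo5: int) -> dict:
    """Return the three candidate header forms for a 5-digit WMO code."""
    if not 0 <= wmo5 <= 99999:
        raise ValueError("expected a WMO code of at most five digits")
    return {
        "five_digit": f"{wmo5:05d}",              # first option: technically accurate
        "seven_padded": f"{wmo5:07d}",            # second option: zero-padded to seven digits
        "seven_times_100": f"{wmo5 * 100:07d}",   # third option: code*100, the widely adopted form
    }

forms = wmo_forms(1001)  # Jan Mayen
for label, text in forms.items():
    print(f"{label:16s} {text}")
```

Whichever form is chosen, the point is that every database has to use the same one, otherwise code matching fails.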
And the reason this is so important is that the incoming updates will rely PRIMARILY on matching the WMO codes!
In fact CLIMAT bulletins carry no other identification, of course. Clearly I am going to need a reference set
of 'genuine WMO codes'.. and wouldn't you know it, I've found four!
Location                                                 N. Stations  Notes
http://weather.noaa.gov/data/nsd_bbsss.txt               11548        Full country names, ';' delim
http://www.htw-dresden.de/~kleist/wx_stations_ct.html    13000+       *10, leading zeros kept, fmt probs
From Dave Lister                                         13080        *10 and leading zeros lost, country codes
From Philip Brohan                                       11894        2+3, No countries
The strategy is to use Dave Lister's list, grabbing country names from the Dresden list. Wrote
getcountrycodes.for and extracted an imperfect but useful-as-a-reference list. Hopefully in the main the country
will not need fixing or referring to!!
Wrote 'fixwmos.for' - probably not for the first time, but it's the first prog of that name in my repository so I'll
have to hope for the best. After an unreasonable amount of teething troubles (due to my forgetting that the tmp
database stores lats & lons in degs*10 not degs*100, and also to the presence of a '-99999' as the lon for GUATEMALA
in the reference set) I managed to sort-of fix the tmp database:
This is misleading because, although there probably won't BE any incoming updates for ISFJORD RADIO, we can't say for
certain that there will never be updates for any station outside the current reference set. In fact, we can say with
confidence that there will be!
So, what to do? Do we assume a particular factor to adjust ALL codes by, based on the matches? Or do we attempt (note
careful use of verb) to use the country codes database to work out the most significant 'real' digits of these codes?
Well, I fancy the first one. We'll make two passes through the data, the first pass changes nothing but saves counts of
the successful factors in bins: *0.01, *0.1, *1, *10, *100 should do it. I sure hope all the results are in one bin!
It worked. An initial 'verbose' run showed a consistent choice of factor, though it'll exit with an error code if multiple
factors are registered in one database.
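The two-pass idea can be sketched roughly as follows. This is Python for illustration only, not the fixwmos.for
source; the names FACTORS, count_factor_matches and fix_codes are hypothetical, and all sample codes except Jan Mayen's
are invented. Factors are held as integer (multiplier, divisor) pairs to avoid floating-point surprises.

```python
from collections import Counter

# Candidate factors as (multiplier, divisor) pairs so the arithmetic stays in integers.
FACTORS = {"*0.01": (1, 100), "*0.1": (1, 10), "*1": (1, 1), "*10": (10, 1), "*100": (100, 1)}

def count_factor_matches(codes, reference):
    """Pass 1: change nothing, just count which factors map codes onto the reference set."""
    bins = Counter()
    for code in codes:
        if code <= 0:
            continue  # missing or false codes cannot be matched
        for name, (mul, div) in FACTORS.items():
            if (code * mul) % div == 0 and (code * mul) // div in reference:
                bins[name] += 1
    return bins

def fix_codes(codes, reference):
    """Pass 2: apply the single winning factor, or fail if the bins disagree."""
    bins = count_factor_matches(codes, reference)
    if len(bins) != 1:
        raise ValueError(f"expected exactly one factor, got {dict(bins)}")
    (name,) = bins
    mul, div = FACTORS[name]
    # Integer division here; a real tool must decide what to do with codes
    # that do not divide exactly.
    return name, [c * mul // div if c > 0 else c for c in codes]

reference = {100100, 100500, 102800}   # invented reference set, 7-digit style
codes = [10010, 10050, 10280, -999]    # invented database codes, one of them missing
name, fixed = fix_codes(codes, reference)
print(name, fixed)
```

Raising an error when more than one bin is populated mirrors the 'exit with an error code' behaviour described above.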
Gotta love the system! Like this is ever going to be a blind bit of use. Modified the code to
leave such stations unmolested, but identified in a separate file so they can be 'cleansed', it
being a little too risky to auto-cleanse such things.
<BEGIN QUOTE>
Enter the database to be fixed: wet.0311061611.dtb
The operation completed successfully.
1920 WMO Codes were 'matched'
All codes were modified with a factor of 10
Lons/lats were modified with a factor of 1
The output database is wet.0710021341.dtb
IMPORTANT: the following WMO codes were not altered:
False codes (wmo<0): 2917
Illegal codes (0<=wmo<1000): 1
(illegals written to wet.0311061611.bad)
crua6[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
I then removed the sole illegal (see above) from wet.0710021341.dtb, which becomes the 'new old'
wet/rd0 database.
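The 'leave unmolested but list separately' step amounts to a three-way split on the code value. A minimal Python
sketch, assuming only the two thresholds printed in the output above (wmo<0 and 0<=wmo<1000); the function names and the
sample codes are invented for illustration:

```python
def classify_wmo(wmo: int) -> str:
    """Classify a WMO code using the thresholds reported by the fixer."""
    if wmo < 0:
        return "false"      # placeholder code, left alone
    if wmo < 1000:
        return "illegal"    # left alone, but written to the .bad file for manual cleansing
    return "ok"             # candidate for rescaling

def split_stations(codes):
    """Group codes by class so the illegals can be written out separately."""
    groups = {"false": [], "illegal": [], "ok": []}
    for code in codes:
        groups[classify_wmo(code)].append(code)
    return groups

groups = split_stations([-999, 0, 999, 1000, 100100])  # invented sample codes
print(groups)
```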
So.. to incorporate the updates! Finally. First, the MCDW, metadata-rich ones:
<BEGIN QUOTE>
Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?
Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710021341.dtb
Please enter the Update Database name: rdy.0709111032.dtb
Reading in both databases..
Master database stations: 4987
Update database stations: 2407
Looking for WMO code matches..
* new header 0100100 7056 -840 9 JAN MAYEN NORWAY 1990 2007 -999 -999 *
2 reject(s) from update process 0710041559
Writing wet.0710041559.dtb
OUTPUT(S) WRITTEN
New master database: wet.0710041559.dtb
Update database stations: 2407
> Matched with Master stations: 1556
(automatically: 1556)
(by operator: 0)
> Added as new Master stations: 849
> Rejected: 2
Rejects file: rdy.0709111032.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact; Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
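For reference, the logic of a 'blind' merge as described in the prompt can be sketched like this. It is a Python
illustration, not the newmergedb source. The assumptions are mine: stations are dicts keyed by WMO code, 'metadata
permitting' is taken to mean that lat and lon are present, and update values simply overwrite master values for the
same year. Everything in the sample data apart from the Jan Mayen code and coordinates is invented.

```python
def blind_merge(master, update):
    """Match update stations to master stations on WMO code alone."""
    merged = {wmo: dict(station) for wmo, station in master.items()}
    tally = {"matched": 0, "added": 0, "rejected": 0}
    rejects = []
    for wmo, station in update.items():
        if wmo in merged:
            # Matched: overlay the update data on the master record.
            merged[wmo]["data"] = {**merged[wmo].get("data", {}), **station.get("data", {})}
            tally["matched"] += 1
        elif station.get("lat") is not None and station.get("lon") is not None:
            # Unmatched but has enough metadata to stand as a new station.
            merged[wmo] = dict(station)
            tally["added"] += 1
        else:
            # Unmatched with no metadata: nothing sensible can be done with it.
            rejects.append(wmo)
            tally["rejected"] += 1
    # Every update station must end up in exactly one of the three categories.
    assert sum(tally.values()) == len(update)
    return merged, tally, rejects

master = {100100: {"name": "JAN MAYEN", "lat": 7056, "lon": -840, "data": {2006: 10}}}
update = {
    100100: {"data": {2007: 12}},                           # matches on code
    100500: {"lat": 7800, "lon": 1400, "data": {2007: 3}},  # invented, has metadata
    102800: {"data": {2007: 5}},                            # invented, no metadata
}
merged, tally, rejects = blind_merge(master, update)
print(tally, rejects)
```

The internal assert is the useful bit: matched, added and rejected counts should always sum to the number of update
stations, which is a quick sanity check on any merge summary.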
(also knocked up rrstats.for at this stage, to analyse replication rates by
latitude band for a given database - needs a Matlab prog to drive really)
[a bit of debugging here as the last records weren't being written properly,
filenames adjusted above accordingly]
Then, the CLIMAT, nothing-but-the-code ones:
*WARNING: ignore this, the CLIMAT bulletins were later improved with metadata and newmergedb rerun*
<BEGIN QUOTE>
Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?
Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0709101706.dtb
Reading in both databases..
Master database stations: 5836
Update database stations: 2876
Looking for WMO code matches..
378 reject(s) from update process 0710081508
Writing wet.0710081508.dtb
OUTPUT(S) WRITTEN
New master database: wet.0710081508.dtb
Update database stations: 2876
> Matched with Master stations: 2498
(automatically: 2498)
(by operator: 0)
> Added as new Master stations: 0
> Rejected: 378
Rejects file: rdy.0709101706.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact; Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
Now of course, we can't add any of the CLIMAT bulletin stations as 'new' stations
because we don't have any metadata! so.. is it worth using the lookup table? Because
although I'm thrilled at the high match rate (87%!), it does seem worse when you
realise that you lost the rest..
* see below, CLIMAT metadata fixed! *
At this stage I knocked up rrstats.for and the visualisation companion tool, cmprr.m. A simple process
to show station counts against time for each 10-degree latitude band (with 20-degree bands at the
North and South extremities). A bit basic and needs more work - but good for a quick & dirty check.
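The banding itself is simple enough to sketch. The following Python is an illustration of the idea only, not
rrstats.for or cmprr.m; the function names are hypothetical, the choice of which band owns an edge value is my
assumption, and a station is counted for every year between its first and last year, which ignores gaps in the record.
Only the Jan Mayen entry comes from the headers above; the other two stations are invented.

```python
from collections import defaultdict

def lat_band(lat: float) -> tuple:
    """Return the (south, north) edges in degrees of the band containing lat."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if lat >= 70:
        return (70, 90)      # 20-degree band at the northern extremity
    if lat < -70:
        return (-90, -70)    # 20-degree band at the southern extremity
    south = int(lat // 10) * 10
    return (south, south + 10)

def counts_by_band(stations):
    """stations: iterable of (lat_degrees, first_year, last_year) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    for lat, first, last in stations:
        band = lat_band(lat)
        for year in range(first, last + 1):
            counts[band][year] += 1
    return counts

stations = [
    (70.9, 1921, 2006),    # Jan Mayen, from the header lines above
    (65.0, 2000, 2001),    # invented
    (-75.0, 2000, 2000),   # invented
]
counts = counts_by_band(stations)
for band in sorted(counts):
    print(band, dict(counts[band]).get(2000, 0))
```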
Wrote dllist2headers.for to convert the 'Dave Lister' WMO list to CRU header format - the main difficulty
being the accurate conversion of the two-character 'country codes' - especially since many are actually
state codes for the US! Ended up with wmo.0710151633.dat as our reference WMO set.
Incorporated the reference WMO set into climat2cru.for. Successfully reprocessed the CLIMAT bulletins
into databases with at least SOME metadata:
<BEGIN QUOTE>
Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?
Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0710151817.dtb
Reading in both databases..
Master database stations: 5836
Update database stations: 2876
Looking for WMO code matches..
71 reject(s) from update process 0710161148