The HARRY_READ_ME.txt file

Part 32

32. The next stage *heart falls* will be to synchronise tmax *against* tmin, sweeping
up duplicates in the process. How long's THIS gonna take? Well actually, it might be fairly easy,
if we use a similar approach. We can base it all around the user being given a 'cloud' of
related stations to pick pairs from, only they will be uniquely numbered so that two from the
same database can be selected. The user can in this way 'pair up' stations in groups.
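
A minimal sketch of the unique-numbering idea, in Python rather than the Fortran of the real program. The names number_cloud and action_for are hypothetical, and the rule that a same-database pair means 'merge' while a cross-database pair means 'match' is inferred from the JERUSALEM transcript later in this section, not taken from the source code:

```python
def number_cloud(tmin_ids, tmax_ids):
    """Give every station in the cloud one number, TMin first, then TMax."""
    cloud = {}
    n = 0
    for db, ids in (("tmin", tmin_ids), ("tmax", tmax_ids)):
        for station_id in ids:
            n += 1
            cloud[n] = (db, station_id)
    return cloud


def action_for(cloud, i, j):
    """Same database means merge; different databases means match."""
    return "merge" if cloud[i][0] == cloud[j][0] else "match"


# Station codes as they appear in the JERUSALEM cloud
cloud = number_cloud([-401000, 4018400], [-401000, 4018400])
first = action_for(cloud, 1, 2)   # both from TMin
second = action_for(cloud, 1, 3)  # one from each database
```

Because the numbers are unique across both databases, a single pair such as 1,2 is enough to tell the program which operation is wanted.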

Of course, this comes with the downside of complexity (and therefore bugs). And both databases
will almost certainly have to be preloaded in their entirety because of the need for the user to
be able to confirm header and data precedence info when stations within a database are merged.

Oh - and I'll have to move bloody quick. So more bugs.

Well.. it's written, and debugging. Around 1500 lines of code, or 1000 without all the comments ;-)
It does indeed read in all the data, so has to be compiled on uealogin1 (as crua6 doesn't have
enough memory!). Reusing code from auminmatch.for did speed things up a bit, though two new
subroutines had to be written to carry out checking for merges (within a database) and for
matches (between the databases). Also introduced a user decision at the start to allow the TMin
database to take precedence in terms of station metadata. Here's the current state of play:


uealogin1[/cru/cruts/version_3_0/db/dtr] ./auminmaxsync

WELCOME TO THE TMIN/TMAX SYNCHRONISER

Before we get started, an important question: Should TMin header info take precedence over TMax?

This will significantly reduce user decisions later, but is a big step as TMax settings may be silently overridden!

To let TMin header values take precedence over those of TMax, enter 'YES': YES
Please enter the tmin database name: tmn.0707021605.dtb
Please enter the tmax database name: tmx.0702091313.dtb

Reading in both databases..
TMin database stations: 14349
TMax database stations: 14315

Processing one-to-one matches..

Initial scan found:
one-to-one matches: 7875
of which confirmed: 7691
in a station cloud: 6411 (tmin)
in a station cloud: 6392 (tmax)
unmatchable: 63 (tmin)
unmatchable: 48 (tmax)
Processing match clouds..
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations: 2
1. -401000 3178 3522 783 JERUSALEM 1863 2000 -999 0
2. 4018400 3178 3522 809 JERUSALEM 1977 1995 -999 0
TMax stations: 2
3. -401000 3178 3522 783 JERUSALEM 1863 2000 -999 0
4. 4018400 3178 3522 809 JERUSALEM 1977 1995 -999 0

*** Remember: Merge first, Match second! ***
Enter ANY pair to match or merge, or 'n' to end:


So stats pretty much as expected/hoped. The one-to-one matches should, of course, be 100%.. but as
the databases aren't synchronised, and as there are hundreds of 'duplicate' entries.. only around
50% match straight away. The situation isn't as bleak as it looks, though - there is further
automatching at the beginning of each cloud, so the user can still be spared the obvious. If the
merging gets too onerous, though, I might have to automate that - with associated risks.

And of course - if you look closely - things are still a little offbeam :-/

Found another database bug by chance.. a non-space whitespace character instead of a space after 'CRANWELL':

-324320 5303 -50 62 CRANWELL UK 1961 1995 -999 -999.00

Doesn't show up in reads as it's a white space character. Argh. Fixed in tmin & tmax. Now to find
out why some matched stations STILL don't have the backref in the last header field!! ..found it,
not my problem, it's the ones that *pre-existed* in the databases, there's 84 in total I think. So
I can write a proglet to check that any with negative WMO codes have the positive version in that
last field. And I did - 'fixtnxrefs.for'. Fixed:
tmn.0702091139.dtb (84 fixed)
tmn.0707021605.dtb (651 'fixed' - includes all with negative WMOs regardless of end field)
tmx.0702091313.dtb (84 fixed)

So why, when we matched 758 bulletins in the first place, did this program only 'fix' 651, of which
84 were preexisting? Because, of course, the matches only get a negative WMO code if the original
WMO code is missing (zero). The 'missing' stations would be ones that already had a WMO code.
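
The check described above can be sketched roughly as follows. This is Python, not the actual fixtnxrefs.for, and it assumes the header fields can be split on whitespace (the real files are presumably fixed-width) with the back-reference in the last field. The function names are hypothetical; the sample headers are the JERUSALEM ones from the transcripts:

```python
def parse_header(line):
    """Return (WMO code, last field) from a whitespace-separated header."""
    fields = line.split()
    return int(fields[0]), fields[-1]


def missing_backrefs(headers):
    """Headers with a negative WMO code whose last field is not
    the positive version of that code."""
    bad = []
    for line in headers:
        wmo, ref = parse_header(line)
        if wmo < 0 and ref != str(-wmo):
            bad.append(line)
    return bad


headers = [
    "-401000 3178 3522 783 JERUSALEM 1863 2000 -999 0",
    "-401000 3178 3522 783 JERUSALEM 1863 2000 -999 401000",
    "4018400 3178 3522 809 JERUSALEM 1977 1995 -999 0",
]
flagged = missing_backrefs(headers)
```

Only the first header is flagged: the second already carries its back-reference, and the third has a positive WMO code so the rule does not apply to it.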

So, try again, and it's looking good!


uealogin1[/cru/cruts/version_3_0/db/dtr] ./auminmaxsync

WELCOME TO THE TMIN/TMAX SYNCHRONISER

Before we get started, an important question: Should TMin header info take precedence over TMax?

This will significantly reduce user decisions later, but is a big step as TMax settings may be silently overridden!

To let TMin header values take precedence over those of TMax, enter 'YES': YES
Please enter the tmin database name: tmn.0702091139.dtb
Please enter the tmax database name: tmx.0702091313.dtb

Reading in both databases..
TMin database stations: 14309
TMax database stations: 14315

Processing one-to-one matches..

Initial scan found:
one-to-one matches: 7889
of which confirmed: 7702
in a station cloud: 6365 (tmin)
in a station cloud: 6378 (tmax)
unmatchable: 55 (tmin)
unmatchable: 48 (tmax)
Processing match clouds..
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations: 2
1. -401000 3178 3522 783 JERUSALEM 1863 2000 -999 401000
2. 4018400 3178 3522 809 JERUSALEM 1977 1995 -999 0
TMax stations: 2
3. -401000 3178 3522 783 JERUSALEM 1863 2000 -999 401000
4. 4018400 3178 3522 809 JERUSALEM 1977 1995 -999 0

*** Remember: Merge first, Match second! ***
Enter ANY pair to match or merge, or 'n' to end: 1,2
Merging two stations from the TMin database:
Stn 1: -401000 3178 3522 783 JERUSALEM ISRAEL 1863 2000 -999 401000
Stn 2: -401000 3178 3522 783 JERUSALEM ISRAEL 1863 2000 -999 401000
Please resolve the following inconsistencies:
Overlap: Station A) -401000 3178 3522 783 JERUSALEM ISRAEL 1863 2000 -999 401000
Station B) 4018400 3178 3522 809 JERUSALEM ISRAEL 1977 1995 -999 -999.00

You must decide which station's data takes precedence.
The intercorrelation for the period is: 0.99
Enter A or B, or undo pair(X):



Well.. it's kinda working. I found some idiotic bugs, though it is a fearsomely complicated program with
lots of indirect pointers (though I do try and resolve them at the first opportunity). One thing that's
making debugging frustratingly difficult is something that must be a uealogin1 feature, and I haven't seen
it before: the program doesn't actually flush the output channels whenever you write! For example, as I
write this the program has dispensed with auto-matching:

Initial scan found:
one-to-one matches: 7875
of which confirmed: 7691
in a station cloud: 6411 (tmin)
in a station cloud: 6392 (tmax)
unmatchable: 63 (tmin)
unmatchable: 48 (tmax)

(yes, it's a little tighter now)

Anyway, since then I've merged two pairs (JERUSALEM) then paired the remainder. That activity has generated
match reports on channel 31 BUT THEY ARE NOT IN THE FILE YET. Here is the tail of channel 31:

crua6[/cru/cruts/version_3_0/db/dtr] tail mat.0707121500.dat
TMax: 9929470 4330 1340 342 MACERATA ITALY 1953 1975 -999 -999.00
AUTO PAIRING FROM ONE-TO-ONE SCAN:
TMin: 9929480 4030 880 585 MACOMER ITALY 1952 1978 -999 -999.00
TMax: 9929480 4030 880 585 MACOMER ITALY 1952 1978 -999 -999.00
AUTO PAIRING FROM ONE-TO-ONE SCAN:
TMin: 9929500 4010 1850 86 PALASCIA AERO ITALY 1952 1978 -999 -999.00
TMax: 9929500 4010 1850 86 PALASCIA AERO ITALY 1952 1978 -999 -999.00
AUTO PAIRING FROM ONE-TO-ONE SCAN:
TMin: 9929520 4060 1490 30 PONTECAGNANO ITALY 1951 1978 -999 -999.00
TMax: 9929520 4060 1490 30 PONTECAGNANO ITALY 1951 1978 -999 -999.00

In addition, the log file is EMPTY, yet at least 416 bytes have been written to it. How the hell can I
debug if I can't monitor what's being written to the log files?!! Of course, once I force-quit the program,
and wait a bit.. the missing info appears. Similarly if I carry on using the program, the files get more
info. It's as if there's a write buffer that runs FIFO. Must look at the 'help'.. why is it that whenever I
crack the programming, the systems themselves step in to screw it up? And computer support is away of course.

Looked at f77 -help.. nothing. Well, nothing obvious. Anyway, more debugging and..
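
The symptoms described (data appearing late, or only once the program exits) are what buffered output looks like. The usual cure is to flush after each write. Below is an illustration in Python, not Fortran; log_line is a hypothetical helper, and whether the f77 on uealogin1 offers an equivalent flush call is not established here:

```python
import os
import tempfile


def log_line(handle, text):
    """Write one line and push it to disk straight away."""
    handle.write(text + "\n")
    handle.flush()               # empty the language-level buffer
    os.fsync(handle.fileno())    # ask the OS to commit it to the file


path = os.path.join(tempfile.mkdtemp(), "log.example.dat")
with open(path, "w") as log:
    log_line(log, "AUTO PAIRING FROM ONE-TO-ONE SCAN:")
    # The writer is still open, yet a second reader already sees the line.
    with open(path) as reader:
        visible = reader.read()
```

Flushing on every write costs some speed, but for a log that is being watched with tail during debugging that is usually a fair trade.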

Seems to be working. But it's going to take ages. Here is an example of the problem:


-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations: 2
1. -315770 5638 -287 10 LEUCHARS UK 1959 1995 -999 315770
2. 317100 5640 -287 12 LEUCHARS UNITED KINGDO 1997 2006 -999 0
TMax stations: 2
3. -315770 5638 -287 10 LEUCHARS UK 1959 1995 -999 315770
4. 317100 5638 -287 12 LEUCHARS RAF UK UK 1973 2006 -999 0

*** Remember: Merge first, Match second! ***
Enter ANY pair to match or merge, or 'n' to end:


Not only do both databases have unnecessary duplicates, introduced for external mapping purposes
by the look of it, but the 'main' stations (2 and 4) have different station name & country. In fact
one of the country names is illegal! Dealing with things like this cannot be automated as they're
the results of non-automatic decisions.

Something new - a listing of 147 Australian 'bulletin' stations, most of which have mappings to
WMO codes. Decided to xref against the (mapped) TMin database, for a laugh. Then decided to take it
more seriously. Wrote a prog to IMPOSE the mappings onto tmn.0707021605.dtb, overriding existing
mappings as necessary. What a bloody mess.

Decided to be vaguely sensible and let the program, auwmoxref.for, evolve. So to begin with it just
did a scan between the mappings file (au_mapping_to_wmo.dat) and the tmin database with my mappings
in (tmn.0707021605.dtb). Results:

crua6[/cru/cruts/version_3_0/db/dtr] ./auwmoxref


AUWMOXREF: Check Australian cross-references

Enter the file of WMO mappings: au_mapping_to_wmo.dat
115 mappings read

Enter the mapped TMin database: tmn.0707021605.dtb
14349 database headers read


RESULTS:

WMO Matches: 92
(multiples) ( 0)
> Ref matches: 60
> Ref empty: 31
> Ref WRONG: 1

Ref Matches: 114
(multiples) ( 0)
> WMO matches: 60
> WMO -1*Ref: 41
> WMO WRONG: 13
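
One plausible reading of the 'Ref Matches' half of this report, sketched in Python. This is a guess at the logic, not the actual auwmoxref.for: it assumes each mapping is a (ref, WMO) pair, each database header reduces to (WMO, ref), and 'WMO -1*Ref' means the header's WMO code is simply the negated ref. All codes in the example are toy values, not real stations:

```python
from collections import Counter


def classify(mappings, headers):
    """mappings: list of (ref, wmo). headers: list of (wmo, ref),
    with ref set to None where the last header field is empty."""
    by_ref = {ref: wmo for wmo, ref in headers if ref is not None}
    counts = Counter()
    for ref, wmo in mappings:
        if ref not in by_ref:
            counts["ref not found"] += 1
        elif by_ref[ref] == wmo:
            counts["WMO matches"] += 1
        elif by_ref[ref] == -ref:
            counts["WMO -1*Ref"] += 1
        else:
            counts["WMO WRONG"] += 1
    return counts


# Toy data only
headers = [(9000100, 101), (9000900, 102), (-103, 103), (9000500, None)]
mappings = [(101, 9000100), (102, 9000200), (103, 9000300), (104, 9000400)]
counts = classify(mappings, headers)
```

The 'WMO Matches' half of the report would be the same scan done the other way round, keyed on the WMO code instead of the ref.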


So first the good news - no duplicates. Well there shouldn't have been any anyway of course, but the
way things are going I'm taking nothing for granted. See, I count something turning out as expected
as 'good news'. So anyway.. I also extracted the statistic that 26 mappings matched both Ref and WMO,
but to separate database entries. Thus the 115 mappings are allocated as follows:

60 Mapping found to be correctly implemented (over half, excellent)
41 WMO Missing, of which:
26 WMO found elsewhere (one of which has an unmapped ref attached to it)
15 WMO not in database (can add wmo codes for these)
13 WMO wrong, of which:
5 Can be merged with real WMO (effectively same station)
8 WMO not in database
1 Completely unmatched (96003 -> 949500)

For the purposes of actions to take, the 13 'WMO Wrong' refs can simply be unmapped from their incorrect
mappings and be rolled into the 41 'WMO Missing' refs.

So:

60 Mapping found to be correctly implemented (over half, excellent)
54 WMO Missing or wrong, of which:
31 WMO found elsewhere (one of which has an unmapped ref attached to it)
23 WMO not in database but pairing made (can add wmo codes for these)
8 WMO not in database and no pairing (can add new stations for these)
1 Completely unmatched (96003 -> 949500)

So, actions to take:

1. For the first 60, no action required.
2. For the 13 with incorrectly-assigned WMOs, disengage and roll into the rest below
3. For the 1 WMO with an unmapped ref attached, disengage and roll into the rest below
4. For the 31 with dislocated WMOs, print a list and ref when doing the tmin/tmax syncing
5. For the 23 with WMO-less stations, add the WMO codes..
6. For the 8 with no WMO found and no pairing found, create new stations.

For the disengagements, decided to work directly with an editor rather than craft another program! So
changes made to tmn.0707021605.dtb (after a suitable backup was made of course!).

The following assignments were disengaged (and replaced with -999.00). Where a WMO code follows in
brackets, the ref was reassigned there.

1. 9460300 -3200 11550 43 ROTTNEST ISLAND AUSTRALIA 1898 2006 -999 9193 (9460200)
2. 9464600 -3090 12810 159 FORREST AUSTRALIA 1946 2006 -999 11052 (no)
3. 9432200 -2020 13000 340 RABBIT FLAT AUSTRALIA 1969 2006 -999 15666 (no)
4. 9557400 -2640 15300 6 TEWANTIN RSL PARK AUSTRALIA 1949 2006 -999 40908 (no)
5. 9451600 -2810 14860 199 ST GEORGE AIRPORT AUSTRALIA 1938 2006 -999 43109 (9451700)
6. 9452700 -2950 14990 213 MOREE AERO AUSTRALIA 1964 2006 -999 53115 (9552700)
7. 9454100 -2980 15110 582 INVERELL (RAGLAN ST) AUSTRALIA 1907 2006 -999 56242 (no)
8. 9478700 -3140 15290 4 PORT MACQUARIE AIRPO AUSTRALIA 1907 2006 -999 60139 (no)
9. 9475800 -3210 15090 216 SCONE SCS AUSTRALIA 2000 2006 -999 61089 (9473800)
10. 9494000 -3510 15080 85 JERVIS BAY (POINT PE AUSTRALIA 1907 2006 -999 68151 (no)
11. 9491600 -3590 14840 1482 CABRAMURRA SMHEA AWS AUSTRALIA 1962 2006 -999 72161 (no)
12. 9482700 -3630 14160 133 NHILL AUSTRALIA 1897 2006 -999 78031 (9582900)
13. 9597900 -4300 14710 63 GROVE (COMPARISON) AUSTRALIA 1961 2006 -999 94069 (no)

The 'mismatched WMO code' station was disengaged from its reference and given 48027 instead:
1. 9471100 -3150 14580 218 COBAR AIRPORT AWS AUSTRALIA 1962 2006 -999 48237 -> 48027

I mailed BOM as we have 94711 = COBAR AWS but they have *94710* for AWS and 94711 for COBAR MO. The
reply was as follows:


On 18 Jul 2007, at 8:51, Matthew Bastin wrote:

Hi Ian,

I hope this table helps

Name BoM No. WMO No. Opened Closed
Cobar Comparison 48244 94711 1/11/1997 15/11/2000
Cobar MO 48027 94711 1/01/1962
Cobar Airport AWS 48237 94710 11/06/1993
Cobar PO 48030 1/1/1881 31/12/1965

The blank in the Closed column means that the site is still open
When Cobar Comparison site closed it transferred its WMO number to Cobar MO
A blank in the WMO No. column means that the site never had a WMO number.

I am not sure of the overlap between the assignment of 94711 between 48244 and 48027. I will find
out and get back to you.


Here are our current 'COBAR' headers:

0 -3150 14580 260 COBAR COMPARISON AUSTRALIA 2000 2006 -999 -999.00
0 -3150 14580 260 COBAR MO AUSTRALIA 2000 2006 -999 -999.00
0 -3148 14582 265 COBAR AUSTRALIA 1962 2004 -999 -999.00
0 -3150 14580 251 COBAR POST OFFICE AUSTRALIA 1902 1960 -999 -999.00
9471100 -3150 14580 218 COBAR AIRPORT AWS AUSTRALIA 1962 2006 -999 48027

Now looking at the dates.. something bad has happened, hasn't it. COBAR AIRPORT AWS cannot start
in 1962, it didn't open until 1993! Looking at the data - the COBAR station 1962-2004 seems to be
an exact copy of the COBAR AIRPORT AWS station 1962-2004, except that the latter has more missing
values. Now, COBAR AIRPORT AWS has 15 months of missing value codes beginning Oct 1993.. coincidence?
No. I think that that series should start there. Furthermore, the overlap between COBAR and COBAR MO
(2000-2004) is *almost* identical:

0 -3148 14582 265 COBAR AUSTRALIA 1962 2004 -999 -999.00
2000 177 209 183 135 80 51 45 52 105 122 166 186
2001 223 214 159 126 72 61 43 52 105 110 148 181
2002 195 185 168 148 88 58 49 63 101 128 186 192
2003 222 216 161 137 97 71 56 61 92 113 159 208
2004 207 226 175 141 74 69 46 69 90 136 160 186


0 -3150 14580 260 COBAR MO AUSTRALIA 2000 2006 -999 -999.00
2000 178 209 184 136 80 52 45 55 105 122 166 186 (7/12)
2001 223 214 159 126 72 61 43 52 105 110 148 181 (12/12)
2002 195 185 168 148 88 58 49 63 101 128 187 192 (11/12)
2003 222 216 161 137 97 71 56 61 92 113 159 208 (12/12)
2004 207 226 175 141 74 69 46 69 90 136 160 186 (12/12)
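
The bracketed counts above (months that are identical in both series) can be reproduced with a short comparison. This Python sketch uses the values exactly as listed; the function name matching_months is just for illustration:

```python
cobar = {
    2000: [177, 209, 183, 135, 80, 51, 45, 52, 105, 122, 166, 186],
    2001: [223, 214, 159, 126, 72, 61, 43, 52, 105, 110, 148, 181],
    2002: [195, 185, 168, 148, 88, 58, 49, 63, 101, 128, 186, 192],
    2003: [222, 216, 161, 137, 97, 71, 56, 61, 92, 113, 159, 208],
    2004: [207, 226, 175, 141, 74, 69, 46, 69, 90, 136, 160, 186],
}
cobar_mo = {
    2000: [178, 209, 184, 136, 80, 52, 45, 55, 105, 122, 166, 186],
    2001: [223, 214, 159, 126, 72, 61, 43, 52, 105, 110, 148, 181],
    2002: [195, 185, 168, 148, 88, 58, 49, 63, 101, 128, 187, 192],
    2003: [222, 216, 161, 137, 97, 71, 56, 61, 92, 113, 159, 208],
    2004: [207, 226, 175, 141, 74, 69, 46, 69, 90, 136, 160, 186],
}


def matching_months(a, b):
    """For each year present in both series, count identical monthly values."""
    return {
        year: sum(x == y for x, y in zip(a[year], b[year]))
        for year in sorted(a.keys() & b.keys())
    }


matches = matching_months(cobar, cobar_mo)
total = sum(matches.values())
```

That gives 54 identical months out of 60 over the 2000-2004 overlap, with all the differences in 2000 and 2002.
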

I therefore propose to extend COBAR MO using COBAR, and to truncate COBAR AIRPORT AWS at 1993.
All BOM codes will be appended for completeness. So the new headers (with lat/lon from BOM too) are:

0 -3149 14583 260 COBAR COMPARISON AUSTRALIA 2000 2006 -999 48244 (closed)
9471100 -3149 14583 260 COBAR MO AUSTRALIA 1962 2006 -999 48027
0 -3150 14583 251 COBAR POST OFFICE AUSTRALIA 1902 1960 -999 48030 (closed)
9471000 -3154 14580 218 COBAR AIRPORT AWS AUSTRALIA 1995 2006 -999 48237

Deleted:
0 -3148 14582 265 COBAR AUSTRALIA 1962 2004 -999 -999.00

The remaining 26 dislocated references were reassigned as for the 13 above. Legitimate mappings:

1. 3003 9420300
2. 4032 9431200
3. 5007 9430200
4. 7176 9431700
5. 9021 9461000
6. 14508 9415000
7. 14932 9413100
8. 17031 9448000
9. 22801 9480500
10. 26026 9481200
11. 27045 9417000
12. 32040 9429400
13. 40842 9457800
14. 50052 9470700
15. 55024 9474000
16. 67105 9575300
17. 68072 9475000
18. 71041 9590800
19. 86282 9486600
20. 200283 9429900
21. 200288 9499600
22. 200790 9699500
23. 200839 9499500
24. 300000 8957100
25. 300001 8956400
26. 300017 8961100

WMO codes were added to these uncoded sites as shown:

1. 9410000 -1430 12670 23 KALUMBURU AUSTRALIA 2000 2006 -999 1019
2. 9562500 -3160 11720 217 CUNDERDIN AIRFIELD AUSTRALIA 2000 2006 -999 10286
3. 9564000 -3270 11670 275 WANDERING AUSTRALIA 2000 2006 -999 10917
4. 9567000 -3380 13820 109 SNOWTOWN (RAYVILLE P AUSTRALIA 2000 2006 -999 21133
5. 9481400 -3530 13890 58 STRATHALBYN RACECOUR AUSTRALIA 2000 2006 -999 24580
6. 9548200 -2590 13940 47 BIRDSVILLE AIRPORT AUSTRALIA 2000 2006 -999 38026
7. 9552900 -2670 15020 305 MILES CONSTANCE STRE AUSTRALIA 2000 2006 -999 42112
8. 9549200 -2800 14380 132 THARGOMINDAH AIRPORT AUSTRALIA 2000 2006 -999 45025
9. 9578400 -3190 15250 8 TAREE AIRPORT AWS AUSTRALIA 2000 2006 -999 60141
10. 9571900 -3220 14860 284 DUBBO AIRPORT AWS AUSTRALIA 2000 2006 -999 65070
11. 9586900 -3560 14500 94 DENILIQUIN AIRPORT A AUSTRALIA 2000 2006 -999 74258
12. 9495400 -4070 14470 94 CAPE GRIM BAPS AUSTRALIA 2000 2006 -999 91245
13. 9596400 -4110 14680 3 LOW HEAD AUSTRALIA 2000 2006 -999 91293
14. 9595900 -4190 14670 1055 LIAWENEE AUSTRALIA 2000 2006 -999 96033

The following was corrected (ref had been mistyped as 78013):
1. 9582900 -3783 14206 200 HAMILTON RESEARCH ST AUSTRALIA 1971 1998 -999 78031

Now the results look like this:

WMO Matches: 106
> Ref matches: 106
> Ref empty: 0
> Ref WRONG: 0
Ref Matches: 106
> WMO matches: 106
> WMO -1*Ref: 0
> WMO WRONG: 0

In other words, there are (115-106=) 9 mappings unfulfilled. The ref hasn't been matched and
WMO code isn't in the database. However, that didn't mean they weren't in the database with a
missing WMO code, did it? The following were found and augmented with both WMO code and ref.

9457000 -2639 15304 6 TEWANTIN RSL PARK AUSTRALIA 2000 2004 -999 40908
9594000 -3509 15080 85 JERVIS BAY (PT PERP AWS) AUSTRALIA 2000 2006 -999 68151

The following were added as new station stubs:
9532200 -2018 13001 340 RABBIT FLAT AUSTRALIA 2007 2007 -999 15666
9554100 -2978 15111 582 INVERELL (RAGLAN ST) AUSTRALIA 2007 2007 -999 56242
9478600 -3143 15287 4 PORT MACQUARIE AIRPT AUSTRALIA 2007 2007 -999 60139
9591600 -3594 14838 1482 CABRAMURRA SMHEA AWS AUSTRALIA 2007 2007 -999 72161
9597100 -4298 14708 63 GROVE (COMPARISON) AUSTRALIA 2007 2007 -999 94069

The following was complicated by the fact that two versions of the station appear to have been
concatenated. This is the station as it already exists in the TMin database:
9464600 -3085 12811 159 FORREST AUSTRALIA 1946 2006 -999 -999.00
However, the current 'live' FORREST station (11052) started in 1993, according to bom.au
records. And wouldn't you know it, the data for this station has missing data between 12/92
and 12/99 inclusive. So I reckon it's the old FORREST AERO station (WMO 9464600, .au ID 11004),
with the new Australian bulletin updates tacked on (hence starting in 2000). Especially as the
old station started in 1946 (http://www.bom.gov.au/climate/averages/tables/cw_011004.shtml).
The trouble is that the bom.au mappings all agree that FORREST is now WMO=9564600. So.. do I
split off the 2000-present data to a new station with the new number, or accept that whoever
joined them (Dave?) looked into it and decided it would be OK? The BOM website says they're
800m apart. Decided to be brave and split the data back into two stations, with both codes
attached (in case we ever get replacement data for the closed station, the site says it went to
1995 after all). So there are now two FORREST stations:

9464600 -3085 12811 159 FORREST AERO AUSTRALIA 1946 1992 -999 11004
9564600 -3085 12811 159 FORREST AUSTRALIA 2000 2006 -999 11052

Hope that's right..

The following mapping was added, though the station does not currently feature in the bulletins.
9495900 -4228 14628 -999 BUTLERS GORGE AUSTRALIA 2007 2007 -999 96003
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2007-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

Also ran a risky search&replace to left-justify the 'AUSTRALIA' in its field, provided the
field wasn't touched by an extended station name. Seems to have been 100% successful.

All 115 refs now matched in the TMin database. Confidence in the fidelity of the Australian
stations in the database drastically reduced. Likelihood of invalid merging of Australian
stations high. Let's go..

Well OK, made some final 'improvements' to the syncing program. Now, after it forms a cloud, it
should automatically merge stations provided the criteria are met and no others are possibles.
It also records, in a separate 'action' file (act.*), every relevant action performed during
the run, so that if interrupted I should be able to hack in something to enable a 'resume'. It's
been done a bit hastily so no guarantees that enough information's been saved!

Debugging is still a big issue, unfortunately. It's a complicated program to sort out and the
possibilities for indexing errors are many. In fact, for the first time ever, it's just locked up!
That's a first (it was due to getmos not defaulting to months 1 & 12 if the data was all missing).

Another problem solved - spent ages wondering how the start & end years for a particular station
(WARATAH) were being corrupted. Turns out they weren't - I'd written 'getmos' to trim empty years,
but forgot to check the return flag! Duh.
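
Both getmos problems come down to the same pattern: a trimming routine needs a safe default for all-missing data, and the caller has to look at the flag it returns. A hypothetical Python sketch of that pattern follows (this is not the real getmos; the name trim_empty_years, the year-based interface and the -9999 missing code used here are assumptions for illustration):

```python
MISSING = -9999


def trim_empty_years(start_year, rows):
    """rows holds one list of 12 monthly values per year from start_year.
    Returns (first_year, last_year, ok). When every value is missing,
    ok is False and the original span comes back unchanged."""
    has_data = [any(v != MISSING for v in row) for row in rows]
    end_year = start_year + len(rows) - 1
    if not any(has_data):
        return start_year, end_year, False
    first = has_data.index(True)
    last = len(has_data) - 1 - has_data[::-1].index(True)
    return start_year + first, start_year + last, True


rows = [[MISSING] * 12, [100] * 12, [MISSING] * 12]
trimmed = trim_empty_years(1961, rows)
empty = trim_empty_years(1961, [[MISSING] * 12] * 3)

# The caller only adopts the new span when the flag says it is valid
first_year, last_year, ok = empty
span = (first_year, last_year) if ok else (1961, 1963)
```

Returning the untrimmed span alongside a False flag means that a caller who forgets the check at least gets sane years rather than garbage.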

So.. perhaps a debugged run through? I'm quickly realising that the Australian stations are in
such a state that I'm having to constantly refer to the station descriptions on the BOM website,
which are individual PDFs:

http://www.bom.gov.au/climate/cdo/metadata/pdf/metadata088110.pdf

It takes time.. time I don't have! Though I'm pleased to see that the second FSM is helpfully
chipping in to pair things up when possible.

Getting seriously fed up with the state of the Australian data. So many new stations have been
introduced, so many false references.. so many changes that aren't documented. Every time a
cloud forms I'm presented with a bewildering selection of similar-sounding sites, some with
references, some with WMO codes, and some with both. And if I look up the station metadata with
one of the local references, chances are the WMO code will be wrong (another station will have
it) and the lat/lon will be wrong too. I've been at it for well over an hour, and I've reached
the 294th station in the tmin database. Out of over 14,000. Now even accepting that it will get
easier (as clouds can only be formed of what's ahead of you), it is still very daunting. I go
on leave for 10 days after tomorrow, and if I leave it running it isn't likely to be there when
I return! As to whether my 'action dump' will work (to save repetition).. who knows?

Yay! Two-and-a-half hours into the exercise and I'm in Argentina!

Pfft.. and back to Australia almost immediately :-( .. and then Chile. Getting there.

Unfortunately, after around 160 minutes of uninterrupted decision making, my screen has started
to black out for half a second at a time. More video cable problems - but why now?!! The count is
up to 1007 though.

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as
Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO
and one with, usually overlapping and with the same station name and very similar coordinates. I
know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh!
There truly is no end in sight. Look at this:


-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations: 4
1. 0 153 12492 80 MENADO/DR. SA INDONESIA 1960 1975 -999 0
2. 0 153 12492 80 MENADO/ SAM RATULANG INDONESIA 1986 2004 -999 0
4. 9701400 153 12492 80 MENADO/DR. SAM RATUL INDONESIA 1995 2006 -999 0
5. 9997418 153 12492 81 SAMRATULANGI MENADO INDONESIA 1973 1989 -999 0
TMax stations: 4
6. 0 153 12492 80 MAPANGET/MANADO INDONESIA 1960 1975 -999 0
7. 0 153 12492 80 MENADO/ SAM RATULANG ID ID 1957 2004 -999 0
9. 9701400 153 12492 80 MENADO/DR. SAM RATUL INDONESIA 1995 2006 -999 0
10. 9997418 153 12492 81 SAMRATULANGI MENADO INDONESIA 1972 1989 -999 0

*** Remember: Merge first, then Match ***
Enter ANY pair to match or merge, 'a' to auto-match (no merges), or 'x' to end:

I honestly have no idea what to do here. And there are countless others of equal bafflingness.

I'll have to go home soon, leaving it running and hoping none of the systems die overnight :-(((

.. it survived, thank $deity. And a long run of duplicate stations, each requiring multiple
decisions concerning spatial info, exact names, and data precedence for overlaps. If for any reason
this has to be re-run, it can certainly be speeded up! Some large clouds, too - this one started
with 59 members from each database:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations: 7
11. 7101965 4362 -7940 78 TORONTO ISLAND 1905 1959 -999 0
14. 7163427 4363 -7940 77 TORONTO ISLAND A CANADA 1957 1994 -999 0
23. 7101987 4380 -7955 194 TORONTO MET RES STN 1965 1988 -999 0
24. 7163434 4380 -7955 194 TORONTO MET RES STN CANADA 1965 1988 -999 0
36. 0 4388 -7944 233 RICHMOND HILL 1959 2003 -999 0
39. 7163408 4388 -7945 233 RICHMOND HILL CANADA 1959 1990 -999 0
40. 7163409 4387 -7943 218 RICHMOND HILL WPCP 1960 1981 -999 0
TMax stations: 8
70. 7101965 4362 -7940 78 TORONTO ISLAND 1905 1959 -999 0
71. 7126500 4363 -7940 77 TORONTO ISLAND A 1957 1994 -999 0
73. 7163427 4363 -7940 77 TORONTO ISLAND A CANADA 1957 1990 -999 0
82. 7101987 4380 -7955 194 TORONTO MET RES STN 1965 1988 -999 0
83. 7163434 4380 -7955 194 TORONTO MET RES STN CANADA 1965 1988 -999 0
95. 0 4388 -7944 233 RICHMOND HILL 1959 2003 -999 0
98. 7163408 4388 -7945 233 RICHMOND HILL CANADA 1959 1990 -999 0
99. 7163409 4387 -7943 218 RICHMOND HILL WPCP 1960 1981 -999 0

There were even larger clouds later.

One thing that's unsettling is that many of the assigned WMO codes for Canadian stations do
not return any hits with a web search. Usually the country's met office, or at least the
Weather Underground, show up - but for these stations, nothing at all. Makes me wonder if
these are long-discontinued, or were even invented somewhere other than Canada! Examples:

7162040 brockville
7163231 brockville
7163229 brockville
7187742 forestburg
7100165 forestburg

Here's a heartwarming example of a cloud which self-paired completely (debug lines included):


DBG: cloud formed with ( 6, 6) members
DBG: automerging done, leaving ( 6, 6)
DBG: pot.auto i,j: 1 1
DBG: i,ncs2m,cs2m(1-5): 1 1 1 8578 8582 8596 0
DBG: paired: 1 1 108 MILE HOUSE ABEL

Attempting to pair stations:
From TMin: 0 5170 -12140 994 108 MILE HOUSE ABEL 1987 2002 -999 -999.00
From TMax: 0 5170 -12140 994 108 MILE HOUSE ABEL 1987 2002 -999 -999.00
DBG: AUTOPAIRED: 1 1
DBG: pot.auto i,j: 2 2
DBG: i,ncs2m,cs2m(1-5): 2 1 2 8578 8582 8596 0
DBG: paired: 2 2 100 MILE HOUSE

Attempting to pair stations:
From TMin: 7194273 5165 -12130 1059 100 MILE HOUSE CANADA 1970 1999 -999 -999.00
From TMax: 7194273 5165 -12130 1059 100 MILE HOUSE CANADA 1970 1999 -999 -999.00
DBG: AUTOPAIRED: 2 2
DBG: pot.auto i,j: 3 3
DBG: i,ncs2m,cs2m(1-5): 3 1 3 8578 8582 8596 0
DBG: paired: 3 3 HORSE LAKE

Attempting to pair stations:
From TMin: 7103611 5160 -12120 994 HORSE LAKE 1983 1994 -999 -999.00
From TMax: 7103611 5160 -12120 994 HORSE LAKE 1983 1994 -999 -999.00
DBG: AUTOPAIRED: 3 3
DBG: pot.auto i,j: 4 4
DBG: i,ncs2m,cs2m(1-5): 4 1 4 8578 8582 8596 0
DBG: paired: 4 4 LONE BUTTE 2

Attempting to pair stations:
From TMin: 7103629 5155 -12120 1145 LONE BUTTE 2 1981 1991 -999 -999.00
From TMax: 7103629 5155 -12120 1145 LONE BUTTE 2 1981 1991 -999 -999.00
DBG: AUTOPAIRED: 4 4
DBG: pot.auto i,j: 5 5
DBG: i,ncs2m,cs2m(1-5): 5 1 5 8578 8582 8596 0
DBG: paired: 5 5 100 MILE HOUSE 6NE

Attempting to pair stations:
From TMin: 7103637 5168 -12122 928 100 MILE HOUSE 6NE 1987 2002 -999 -999.00
From TMax: 7103637 5168 -12122 928 100 MILE HOUSE 6NE 1987 2002 -999 -999.00
DBG: AUTOPAIRED: 5 5
DBG: pot.auto i,j: 6 6
DBG: i,ncs2m,cs2m(1-5): 6 1 6 8578 8582 8596 0
DBG: paired: 6 6 WATCH LAKE NORTH

Attempting to pair stations:
From TMin: 7103660 5147 -12112 1069 WATCH LAKE NORTH 1987 1996 -999 -999.00
From TMax: 7103660 5147 -12112 1069 WATCH LAKE NORTH 1987 1996 -999 -999.00
DBG: AUTOPAIRED: 6 6


Now arguably, the MILE HOUSE ABEL stations should have rolled into one of the other MILE HOUSE ones with
a WMO code.. but the lat/lon/alt aren't close enough. Which is as intended.

*

*

Well, it *kind of* worked. Though the resultant files aren't exactly what I'd expected:

-rw------- 1 f098 cru 12715138 Jul 25 15:25 act.0707241721.dat
-rw------- 1 f098 cru 435839 Jul 25 15:25 log.0707241721.dat
-rw------- 1 f098 cru 4126850 Jul 25 15:25 mat.0707241721.dat
-rw------- 1 f098 cru 6221390 Jul 25 15:25 tmn.0707021605.dtb.lost
-rw------- 1 f098 cru 2962918 Jul 25 15:25 tmn.0707241721.dat
-rw------- 1 f098 cru 0 Jul 25 15:25 tmx.0702091313.dtb.lost
-rw------- 1 f098 cru 2962918 Jul 25 15:25 tmx.0707241721.dat


act.0707241721.dat: hopefully-complete record of all activities

log.0707241721.dat: hopefully-useful log of odd happenings (and mergeinfo() trails)

mat.0707241721.dat: hopefully-complete list of all merges and pairings

tmn.0707021605.dtb.lost: too-small collection of unpaired stations

tmn.0707241721.dat: too-small output database

tmx.0702091313.dtb.lost: MUCH too-small collection of unpaired stations!!!

tmx.0707241721.dat: too-small (but hey, the same size as the twin) output database

ANALYSIS

Well, LOL, the reason the output databases are so small is that every station looks like this:

9999810 -748 10932 114 SEMPOR INDONESIA 1971 2000 -999 -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1971 229 225 225 229 229-9999 223 221 222 225 224-9999

Yes - just one line of data. The write loops went from start year to start year. Ho hum :-/

Not as easy to fix as you might think, seeing as the data may well be the result of a merge and
so can't just be pasted in from the source database.
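
The loop error is easy to show in miniature. This Python sketch is not the real write routine: data_lines is a hypothetical name, the 4-character year plus 5-character value layout is assumed from the sample above, and the 1972-1973 rows are placeholders, not real SEMPOR data:

```python
MISSING = -9999


def data_lines(start_year, end_year, data, buggy=False):
    """Format one line per year. With buggy=True the loop runs from
    start year to start year, as the faulty write loops did."""
    last = start_year if buggy else end_year
    lines = []
    for year in range(start_year, last + 1):
        lines.append(f"{year:4d}" + "".join(f"{v:5d}" for v in data[year]))
    return lines


data = {
    1971: [229, 225, 225, 229, 229, MISSING, 223, 221, 222, 225, 224, MISSING],
    1972: [MISSING] * 12,  # placeholder
    1973: [MISSING] * 12,  # placeholder
}
broken = data_lines(1971, 1973, data, buggy=True)
fixed = data_lines(1971, 1973, data)
```

The buggy form yields a single data line per station, which matches the one-line stations in the output databases.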

As for the 'unbalanced' 'lost' files: well for a start, the same error as above (just one line of data),
then on top of that, both sets written to the same file. What time did I write that bit, 3am?!! Ecch.

