27. Well, enough excuses - time to remember how to do the anomalising and
gridding things! Firstly, I ran 'addnormline' just to ensure all normals
are up to date. The result was 8 new sets of normals, so well worth doing.
The database is now:
tmp.0704292158.dtb
Ran 'anomdtb' - got caught out by the requirement for a companion '.dts'
file again, so I ran 'falsedts.for' and carried on.. it would still be
nice to be sure that it's not something meaningful **sigh**.
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal: 23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
tmp.txt
> Select the first,last years AD to save:
1901,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb
In fact, the first two columns never get outside of +/- 30. Oh bugger.
What the HELL is going on?!
Decided to pursue that worrying (and impossible) 'duplicates' figure.
The function 'sort' was used to sort the database so that any duplicate
lines would be together - then 'uniq' was used to pull out duplicates.
There were quite a few dupes, and one or two triples too, like these:
These are from the following stations:
720344 408 1158 1539 ELKO-FAA-AP---------USA--------- 1870 1996 301870 -999.00
725837 408 1158 1549 NV ELKO FAA AP 1930 1990 101930 -999.00
725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
The last two are consecutive stations.
Looking at them.. it seems that 725910 has 725837's data!
As can be seen, 1981 sees a complete change in range, especially for
Autumn/Winter. In fact, from 1981 to 1990, 725910 is a copy of
725837! It then reverts to the original range for the rest of the run.
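For reference, the sort/uniq duplicate hunt described above can be
sketched like this, on a three-line stand-in for the real database
(tmp.0704292158.dtb): sorting makes identical data lines adjacent, and
'uniq -d' then prints each duplicated line once.

```shell
# Toy stand-in for the database; the real file is far larger.
printf '1981 10 20 30\n1979 5 6 7\n1981 10 20 30\n' > demo.dtb

# Sort so identical lines become adjacent, then print each dupe once.
sort demo.dtb | uniq -d
# -> 1981 10 20 30
```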
So.. did the merging program do this? Unfortunately, yes. Check dates:
crua6[/cru/cruts/version_3_0/db/testmergedb] grep -n 'RED BLUFF' tmp.0*.*
tmp.0612081519.dat:28595: 725910 401 1223 103 RED BLUFF USA 1991 2006 101991 -999.00
tmp.0702091122.dtb:171674: 725910 401 1223 103 RED BLUFF USA 1878 1980 101878 -999.00
tmp.0704251819.dtb:200331: 725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
tmp.0704271015.dtb:254272: 725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
tmp.0704292158.dtb:254272: 725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
crua6[/cru/cruts/version_3_0/db/testmergedb]
The first file is the 1991-2006 update file. The second is the original
temperature database - note that the station ends in 1980.
It has *inherited* data from the previous station, where it had -9999
before! I thought I'd fixed that?!!!
/goes off muttering to fix mergedb.for for the five hundredth time
Miraculously, despite being dog-tired at nearly midnight on a Sunday, I
did find the problem. I was clearing the data array but not close enough
to the action - when stations were being passed through (ie no data to
add to them) they were not being cleaned off the array afterwards. Meh.
Wrote a specific routine to clear halves of the data array, and back to
square one. Re-ran the ACT file to merge the x-1990 and 1991-2006 files.
Created an output file exactly the same size as the last time (phew!)
but with..
SIMPLYADDNEW - add stations to a database
This program assumes the two databases have
NO COMMON STATIONS and will fail (stop) if
any are found.
Please enter the main database: tmp.0704292355.dtb
Please enter the new database: tmp.0704251654.dat
Please enter a 3-character parameter code: tmp
Output database is: tmp.0704300053.dtb
crua6[/cru/cruts/version_3_0/db/testmergedb]
So now we have the combined database again, a bit quicker than
last time: tmp.0704300053.dtb. Pity we slid into May: I was hoping
to only be FIVE MONTHS late.
What's worse - there are STILL duplicate non-missing lines, 210 of
them. The first example is this:
These two stations obviously have a lot in common - though not
everything, as their normals (shown) differ. In fact, on examination
the US database record is a poor copy of the main database one, it
has more missing data and so forth. By 1870 they have diverged, so
in this case it's probably OK.. but what about the others? I just do
not have the time to follow up everything. We'll have to take the 210
repeated years as 'one of those things'.
..actually, I decided in the end to follow up all 210 of them. The
likelihood is that the number is far greater, since the filtering
that gave the 210 figure excluded any lines with two or more
consecutive missing values (to avoid hundreds of just-missing-value
lines). Also I spotted some instances where data lines would be
identical but for one or more missing values in one of the stations.
After checking, I found that the majority of the duplications were
between the original database and the US database, with just a couple
of 'linked' stations within the original database, and half a dozen
in the 1991-2006 update file. One surprise was that stations I'm sure
I rejected ended up marked as 'addnew' in the .act file - quite
unsettling!
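The filtering mentioned above - excluding lines with two or more
consecutive missing values before looking for duplicates - could be
sketched as below. This is a toy example: the -9999 missing code and
the year-plus-monthly-values line layout are assumptions, not the real
database format.

```shell
# Toy data: year followed by monthly values, -9999 = missing (assumed).
cat > sample.dtb <<'EOF'
1901 10 -9999 -9999 40
1902 10 20 30 40
1902 10 20 30 40
EOF

# Drop any line with >=2 consecutive missing values, then find dupes.
awk '{
  run = 0; keep = 1
  for (i = 2; i <= NF; i++) {
    if ($i == -9999) { if (++run >= 2) { keep = 0; break } }
    else run = 0
  }
  if (keep) print
}' sample.dtb | sort | uniq -d
# -> 1902 10 20 30 40
```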
Rather foolishly, perhaps, I decided to have a go at interactively
incorporating the US data rather than using 'simplyaddnew'. However,
progress was so slow (because of the high number of 'near matches')
that this approach was abandoned.
Tried 'anomdtb' with the fixed final file (tmp.0704300053.dtb)...
no better! The crucial bits:
> NORMALS MEAN percent STDEV percent
> .dtb 3323823 81.3
made it to here
> .cts 91963 2.2 3415786 83.5
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 675037 16.5 16.5
> out-of-range 744 0.0 0.0
> duplicated 4100117 100.2 120.1
> accepted -685075 -16.7
> Dumping years 1901-2006 to .txt files...
> Failed to create file. Try again.
> Enter the file, with suffix: .ann
tmp.ann
> Failed to create file. Try again.
> Enter the file, with suffix: .ann
h.ann
crua6[/cru/cruts/version_3_0/primaries/temp]
So the 'duplicated' figure is slightly lower.. but what's this
error with the '.ann' file?! Never seen before. Oh GOD if I
could start this project again and actually argue the case for
junking the inherited program suite!!
OK.. the .ann file error was simply that the program refuses to
overwrite any existing one. Meh. It's happy to overwrite the log file
of course - nice bit of logic there.
And the duplicates? Well, I inserted a debug line where the
decision is made. Here's an example:
712600 vs. 727340: 4.7 8.4 4.7 8.4 -> 0.0km
Here the two WMO codes look OK (though others are -999 which
seems unlikely) but the two lat/lon pairs? Ooops. Here are the
actual headers:
So, uhhhh.. what in tarnation is going on? Just how off-beam
are these datasets?!!
Not sure why the lats & lons are a factor of 10 too low - may
be intentional though it wasn't happening before.
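A quick sanity check on the scaling: if the header stores lat and lon
in tenths of a degree (an assumption, based on headers like
'725910 401 1223' looking like 40.1N, 122.3), dividing by 10 gives
plausible values, while dividing by 100 would give the ~4-ish numbers
seen in the debug line.

```shell
# Assumed header layout: WMO, lat*10, lon*10, alt, name...
echo '725910  401 1223  103 RED BLUFF' |
awk '{ printf "%s lat=%.1f lon=%.1f\n", $1, $2/10, $3/10 }'
# -> 725910 lat=40.1 lon=122.3
```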
Ran with the original database:
> NORMALS MEAN percent STDEV percent
> .dtb 2113609 81.7
made it to here
> .cts 0 0.0 2113608 81.7
> PROCESS DECISION percent %of-chk
> no lat/lon 0 0.0 0.0
> no normal 474422 18.3 18.3
> out-of-range 68179 2.6 3.2
> duplicated 923258 35.7 45.1
> accepted 1122172 43.4
> Dumping years 1901-1990 to .txt files...
The lats & lons look the same.. but a lot fewer duplicates!
WHY? Well, it could just be those pesky US stations.. so
why not compare the two bespoke log files (as excerpted above)?
Immediately, another baffler: the log file from the run of
the 'final' database has lots of 'DEBUG DETAIL' information,
but the log file from the run of the original database does not!
So cropping those away with a judicious 'tail'.. I ran comm:
crua6[/cru/cruts/version_3_0/primaries/temp] comm -23 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat |wc -l
200
crua6[/cru/cruts/version_3_0/primaries/temp] comm -13 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat | wc -l
2572
crua6[/cru/cruts/version_3_0/primaries/temp] comm -12 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat | wc -l
1809
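(As an aside, the three comm flags used above work like this on a toy
pair of files - note comm expects sorted input, which the log files
here evidently were after the 'tail' cropping: -23 suppresses columns
2 and 3, leaving lines unique to file 1; -13 leaves lines unique to
file 2; -12 leaves the common lines.)

```shell
# Toy stand-ins for the two anomdtb log files (already sorted).
printf 'a\nb\nc\n' > old.log
printf 'b\nc\nd\ne\n' > new.log

comm -23 old.log new.log | wc -l   # lines only in old.log: 1
comm -13 old.log new.log | wc -l   # lines only in new.log: 2
comm -12 old.log new.log | wc -l   # lines common to both:  2
```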
So 200 duplication events are unique to the older database,
and 2572 are unique to the new database - with 1809 common
to both. A quick look at the 2572 'new' ones showed a majority
of those with the first WMO as -999: this is the key. The
databases do not have any records with WMO=-999 as far as I know,
so something is going on..
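One quick way to pin down that suspicion would be to count how many of
the 'new' duplication events have -999 as the first WMO code. A sketch,
using made-up debug lines in the format shown earlier (the real input
would be the comm -13 output):

```shell
# Made-up debug lines in the 'WMO vs. WMO: ... -> km' format.
cat > newdupes.log <<'EOF'
  -999 vs. 727340: 4.7 8.4 4.7 8.4 -> 0.0km
712600 vs. 727340: 4.7 8.4 4.7 8.4 -> 0.0km
  -999 vs. 123456: 1.0 2.0 1.0 2.0 -> 0.0km
EOF

# Count lines whose first WMO code is -999.
grep -c '^ *-999 ' newdupes.log
# -> 2
```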