22. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software
suites - let's have a go at producing CRU TS 3.0, since failing to do that will be the
definitive failure of the entire project.
Firstly, we need to identify the updated data files. I acquired the following:
iran_asean_GHCN_WWR-CD_save50_CLIMAT_MCDW_updat_merged renamed to pre.0611301502.dat
newbigfile0606.dat renamed to tmp.0611301507.dat
glseries_tmn_final_merged renamed to tmn.0611301516.dat
glseries_tmx_final_merged renamed to tmx.0611301516.dat
anders9106m.dat renamed to tmp9106.0612011708.dat
...and established a directory hierarchy under /cru/cruts/version_3_0.
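The new names above look like variable.YYMMDDHHMM.dat (e.g. 0611301502 reading as
2006-11-30 15:02). That layout is inferred from the names themselves, not documented
anywhere here, so treat this as a rough Python sketch of the apparent convention;
stamped_name is a hypothetical helper, not one of the existing programs.

```python
from datetime import datetime


def stamped_name(variable: str, when: datetime, ext: str = "dat") -> str:
    """Build a name like 'pre.0611301502.dat'.

    Assumed layout (inferred, not documented): <variable>.<YYMMDDHHMM>.<ext>
    """
    return f"{variable}.{when.strftime('%y%m%d%H%M')}.{ext}"


if __name__ == "__main__":
    # Timestamp taken from the precip rename listed above.
    print(stamped_name("pre", datetime(2006, 11, 30, 15, 2)))
```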
Next step, convert the various db formats to the CRU TS one. Made a visual
comparison which indicated that it would work. Unfortunately it will mean
losing the 'extra' fields that have been tacked onto the headers willy-nilly
as they are undocumented. Furthermore the two extra fields in the CRU TS
format are undocumented, as far as I can see! So I wrote headergetter.for
to produce stats on the CRU TS headers. It looks for violations of the
mandatory blank spaces, and for variations in the two extra fields. Sample
output for temperature and precip:
Header report for tmp.0311051552.dtb
Produced by headgetter.for
Total Records Read: 12155
EXTRA FIELD 1 (72:77)
  type detected         counted
  Missing Value Code      12155
  Possible F.P. Value         0
  Possible Exp. Value         0
  Integer Value Found         0
  Real Value Found            0
  Unidentifiable              0
EXTRA FIELD 2 (79:86)
  type detected         counted
  Missing Value Code        709
  Possible F.P. Value       697
  Possible Exp. Value         0
  Integer Value Found     10749
  Real Value Found            0
  Unidentifiable              0
ENDS
Header report for pre.0312031600.dtb
Produced by headgetter.for
Total Records Read: 12732
EXTRA FIELD 1 (72:77)
  type detected         counted
  Missing Value Code      12732
  Possible F.P. Value         0
  Possible Exp. Value         0
  Integer Value Found         0
  Real Value Found            0
  Unidentifiable              0
EXTRA FIELD 2 (79:86)
  type detected         counted
  Missing Value Code       3635
  Possible F.P. Value       437
  Possible Exp. Value         0
  Integer Value Found      8660
  Real Value Found            0
  Unidentifiable              0
ENDS
As can be seen, there are no unidentifiable headers - hurrah! - but quite
a few violations of the boundary between the two extra fields, particularly
in the precip database. On examination, the culprits are all African
stations. The two tmp exceptions:
So the first extra field is apparently unused! It would be a handy place for
the 6-character data-code and valid-start-year from the temperature db.
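For reference, the sort of check headergetter.for does on the two extra fields can be
sketched roughly as below. This is Python, not the Fortran actually used, and the
classification rules are my assumptions: the missing value codes (-999 and -999.00)
are taken from the converted headers further down, the category names are simplified,
and the program's own distinction between "Possible F.P." and "Real" values is not
reproduced because it is not defined anywhere here. Only the column ranges (72:77 and
79:86) come from the report itself.

```python
import re
from collections import Counter

# Fortran-style 1-based inclusive column ranges from the report, as Python slices.
FIELD1 = slice(71, 77)   # columns 72:77
FIELD2 = slice(78, 86)   # columns 79:86
GAP_INDEX = 77           # column 78, assumed to be a mandatory blank

INT_RE = re.compile(r"[+-]?\d+")
REAL_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")
EXP_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)[eEdD][+-]?\d+")


def classify(text, missing=("-999", "-999.00")):
    """Classify one extra-field string (assumed rules, see lead-in)."""
    token = text.strip()
    if token == "" or token in missing:
        return "missing"
    if INT_RE.fullmatch(token):
        return "integer"
    if REAL_RE.fullmatch(token):
        return "real"
    if EXP_RE.fullmatch(token):
        return "exponent"
    return "unidentifiable"


def header_report(headers):
    """Count field types and blank-column violations over header lines."""
    counts = {"field1": Counter(), "field2": Counter(), "boundary_violations": 0}
    for line in headers:
        line = line.rstrip("\n").ljust(86)   # pad short lines to full width
        counts["field1"][classify(line[FIELD1])] += 1
        counts["field2"][classify(line[FIELD2])] += 1
        if line[GAP_INDEX] != " ":
            counts["boundary_violations"] += 1
    return counts
```

Checking the blank column separately from the field contents is what lets a value
that spills across the boundary show up as a violation rather than as a misread number.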
On to a more detailed look at the cru precip format; not sure whether there
are two extra fields or one, and what the sizes are. A quick hack through
the headers is not pleasing. There appears to be only one field, but it can
have up to nine (9) digits in it, and at least three missing value codes:
This is irritating as it means precip has only 9 fields and I can't do a
generic mapping from any cru format to cru ts.
As a glutton for punishment I then looked at the tmin/tmax db format. Looks
like two extra fields (i6,i7) with missing value codes (mvcs) of 999999 and 8888888 respectively.
However *sigh* inspection reveals the following two possibilities:
851300 3775 -2568 17PONTA DELGADA PORTUGAL 18652004 9999998888888
851500 3697 -2517 100SANTA MARIA A ACORES 19542006 -77777 8888888
Isn't that marvellous? These can't even be read with a consistent header format!
So, the approach will be to read exactly ONE extra field. For cru tmp that
will be the i2+i4 anders/best-start codes as one. For cru pre it will be
the amazing multipurpose, multilength field. For cru tmnx it will be the
first field, which is at least stable at i6.
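A rough Python sketch of the "read exactly one extra field" idea for the tmin/tmax
headers, tried on the two example lines above. It is not the conversion code itself:
read_first_extra is a hypothetical helper, and it works on the lines as pasted here
(locating the 8-digit start/end year token and taking the next six characters) rather
than on true fixed columns, which a real read would use.

```python
import re

# An 8-digit start/end year token (e.g. 18652004), then whatever follows it.
YEARS_RE = re.compile(r"\s(\d{4})(\d{4})\s+(\S.*)$")


def read_first_extra(header: str, width: int = 6):
    """Return (start_year, end_year, first_extra_field) from one header line.

    Only the first `width` characters after the years are read, so it does
    not matter whether the second field is fused onto the first or not.
    """
    m = YEARS_RE.search(header.rstrip())
    if m is None:
        raise ValueError("no start/end year token found in header")
    tail = m.group(3)
    return int(m.group(1)), int(m.group(2)), int(tail[:width])


if __name__ == "__main__":
    fused = "851300 3775 -2568 17PONTA DELGADA PORTUGAL 18652004 9999998888888"
    spaced = "851500 3697 -2517 100SANTA MARIA A ACORES 19542006 -77777 8888888"
    print(read_first_extra(fused))
    print(read_first_extra(spaced))
```

Both layouts give a usable first field this way; the second field is simply ignored.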
Conversions/corrections performed:
Temperature
Converted tmp.0611301507.dat to tmp.0612081033.dat
Found one corrupted station name:
BEFORE
911900 209 1564 20 HI*KAHULUI WSO (PUU NENE) 1954 1990 101954 -999.00
AFTER
911900 209 1564 20 KAHULUI ARPT/MAUI HAWAII 1954 1990 101954 -999.00
Precipitation
Converted pre.0611301502.dat to pre.0612081045.dat
Found one corrupted station name:
BEFORE
4125600 2358 5828 15SEEB AP./=MUSCAT*0.9OMAN 18932006 301965
AFTER
4125600 2358 5828 15 SEEB INTL/MUSCAT OMAN 1893 2006 -999 -999.00
(DL later reported that the name was intended to signify that the data had been
corrected by a factor of 0.9 when data from another station was incorporated
to extend the series - this was Mike Hulme's work)
Wrote db2dtb.for, which converts any of the CRU db formats to the CRU TS format.
Started work on mergedb.for, which should merge a primary database with an incoming
database of the same (CRU TS) format. Quite complicated. No operator interventions,
just a log file of failed attempts - but hooks left in for op sections in case this
turns out to be the main programmatic deliverable to BADC!
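The overall shape described for mergedb.for (no operator intervention, a log of
failed attempts, and a hook left in for operator decisions) might look something like
this in Python. Everything specific here is an assumption for illustration: the
in-memory layout (station code -> year -> twelve monthly values), the rule that the
primary wins on conflict, and the names merge_databases and resolve. The real matching
logic is a good deal more complicated than this.

```python
import io


def merge_databases(primary, incoming, log, resolve=None):
    """Merge `incoming` into `primary` in place and return `primary`.

    primary, incoming: dict of station code -> dict of year -> list of 12 values
                       (hypothetical layout).
    log:     file-like object; one line is written per failed merge attempt.
    resolve: optional hook called as resolve(code, year, old, new) to decide a
             conflict; if None, the primary value is kept and the clash is logged.
    """
    for code, years in incoming.items():
        if code not in primary:
            primary[code] = dict(years)        # new station: take it whole
            continue
        for year, values in years.items():
            existing = primary[code].get(year)
            if existing is None:
                primary[code][year] = values   # new year for a known station
            elif existing != values:
                if resolve is not None:
                    primary[code][year] = resolve(code, year, existing, values)
                else:
                    log.write(f"{code} {year}: conflict, kept primary\n")
    return primary


if __name__ == "__main__":
    primary = {"911900": {1954: [1] * 12}}
    incoming = {"911900": {1954: [2] * 12, 1955: [3] * 12},
                "4125600": {1893: [0] * 12}}
    log = io.StringIO()
    merge_databases(primary, incoming, log)
    print(log.getvalue(), end="")
```

Passing resolve=None gives the unattended behaviour; supplying a function is where an
operator section could be plugged in later.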