The HARRY_READ_ME.txt file

Part 22

22. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software
suites - let's have a go at producing CRU TS 3.0, since failing to do that will be the
definitive failure of the entire project.

Firstly, we need to identify the updated data files. I acquired the following:

iran_asean_GHCN_WWR-CD_save50_CLIMAT_MCDW_updat_merged renamed to pre.0611301502.dat
newbigfile0606.dat renamed to tmp.0611301507.dat
glseries_tmn_final_merged renamed to tmn.0611301516.dat
glseries_tmx_final_merged renamed to tmx.0611301516.dat
anders9106m.dat renamed to tmp9106.0612011708.dat

..and established a directory hierarchy under /cru/cruts/version_3_0

Next step, convert the various db formats to the CRU TS one. Made a visual
comparison which indicated that it would work. Unfortunately it will mean
losing the 'extra' fields that have been tacked onto the headers willy-nilly
as they are undocumented. Furthermore the two extra fields in the CRU TS
format are undocumented, as far as I can see! So I wrote headergetter.for
to produce stats on the CRU TS headers. It looks for violations of the
mandatory blank spaces, and for variations in the two extra fields. Sample
output for temperature and precip:

Header report for tmp.0311051552.dtb
Produced by headgetter.for
Total Records Read:      12155

BLANKS (expected at 8,14,21,26,47,61,66,71,78)
 position     missed
        8          0
       14          0
       21          0
       26          0
       47          0
       61          0
       66          0
       71          0
       78          2

EXTRA FIELD 1 (72:77)
 type detected           counted
 Missing Value Code        12155
 Possible F.P. Value           0
 Possible Exp. Value           0
 Integer Value Found           0
 Real Value Found              0
 Unidentifiable                0

EXTRA FIELD 2 (79:86)
 type detected           counted
 Missing Value Code          709
 Possible F.P. Value         697
 Possible Exp. Value           0
 Integer Value Found       10749
 Real Value Found              0
 Unidentifiable                0

ENDS

Header report for pre.0312031600.dtb
Produced by headgetter.for
Total Records Read:      12732

BLANKS (expected at 8,14,21,26,47,61,66,71,78)
 position     missed
        8          0
       14          0
       21          0
       26          0
       47          0
       61          0
       66          0
       71          0
       78        154

EXTRA FIELD 1 (72:77)
 type detected           counted
 Missing Value Code        12732
 Possible F.P. Value           0
 Possible Exp. Value           0
 Integer Value Found           0
 Real Value Found              0
 Unidentifiable                0

EXTRA FIELD 2 (79:86)
 type detected           counted
 Missing Value Code         3635
 Possible F.P. Value         437
 Possible Exp. Value           0
 Integer Value Found        8660
 Real Value Found              0
 Unidentifiable                0

ENDS

As can be seen, there are no unidentifiable headers - hurrah! - but quite
a few violations of the boundary between the two extra fields, particularly
in the precip database. On examination, the culprits are all African
stations. The two tmp exceptions:

641080 -330 1735 324 BANDUNDU DEM REP CONGO 1961 1990 -99908
642200 -436 1525 445 KINSHASA/BINZA DEM REP CONGO 1960 1990 -99920

And samples of the pre exceptions:

-656002 698 -958 150 SUAKOKO LIBERIA 1951 1970 -999123008050
-655327 727 -723 350 KOUIBLY IVORY COAST 1977 1990 -999109001290
-655001 1320 -235 332 GOURCY BURKINA FASO 1956 1980 -999120001240
-618504 788 -1118 -999 KENEMA/FARM SIERRA LEONE 1951 1972 -999139003500
-612067 1407 -307 253 KORO MALI 1958 1989 -999127002650

So the first extra field is apparently unused! It would be a handy place for
the 6-character data-code and valid-start-year from the temperature db.
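
For illustration only, the kind of header check described above could be
sketched like this. It is in Python rather than Fortran and is not the source
of headergetter.for, which isn't reproduced in these notes. The blank positions
and field columns come from the reports; the function names, the -999 missing
value code and the simplified type categories are assumptions.

```python
from collections import Counter

# 1-based column positions, taken from the header reports.
BLANK_POSITIONS = (8, 14, 21, 26, 47, 61, 66, 71, 78)
FIELD1 = (72, 77)  # inclusive
FIELD2 = (79, 86)  # inclusive


def missed_blanks(header):
    """Return the 1-based positions where a mandatory blank is absent."""
    padded = header.rstrip("\n").ljust(86)
    return [p for p in BLANK_POSITIONS if padded[p - 1] != " "]


def classify(text, missing="-999"):
    """Crude type guess for an extra field (assumed rules, fewer categories
    than the real report; an empty field is counted as missing)."""
    value = text.strip()
    if value == "" or value == missing:
        return "missing"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "real"
    except ValueError:
        return "unidentifiable"


def report(headers):
    """Tally missed blanks and field types over an iterable of header lines."""
    missed, field1, field2 = Counter(), Counter(), Counter()
    for header in headers:
        padded = header.rstrip("\n").ljust(86)
        missed.update(missed_blanks(padded))
        field1[classify(padded[FIELD1[0] - 1:FIELD1[1]])] += 1
        field2[classify(padded[FIELD2[0] - 1:FIELD2[1]])] += 1
    return missed, field1, field2
```

A header where the first field's -999 runs straight into the second field
(as in the African stations above) would show up as a miss at position 78.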

On to a more detailed look at the cru precip format; not sure whether there
are two extra fields or one, and what the sizes are. A quick hack through
the headers is not pleasing. There appears to be only one field, but it can
have up to nine (9) digits in it, and at least three missing value codes:

6785300-1863 2700 1080HWANGE/N.P.A. ZIMBABWE 19621996 40
8100100 680 -5820 2GEORGETOWN GUYANA 18462006 -99
6274000 1420 2460 1160KUTUM SUDAN 19291990 194
6109200-9999-99999 -999UNKNOWN NIGER 19891989 -999
6542000 945 -2 197YENDI GHANA 19071997 8010
6544200 672 -160 293KUMASI GHANA 19062006 17009
6122306 1670 -299 267KABARA MALI 19231989 270022
6193128 32 672 -999SAO TOME SAO TOME 19391973 8888888
6266000 1850 3180 249KARIMA SUDAN 19172006 18315801
6109905 1208 -367 315OUARKOYE BURKINA FASO 19601980 120002470

*unimpressed*

This is irritating as it means precip has only 9 fields and I can't do a
generic mapping from any cru format to cru ts.
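
A quick way to reproduce the "up to nine digits" observation is to tally the
width of the trailing field over the sample headers. The sketch below (Python,
not the original hack) assumes the field is simply the last
whitespace-delimited token, which holds for the samples as shown here but
would need fixed column positions on the real file. The names are made up for
the example.

```python
from collections import Counter

# Sample precip headers copied from the notes above (whitespace as shown there).
SAMPLES = [
    "6785300-1863 2700 1080HWANGE/N.P.A. ZIMBABWE 19621996 40",
    "8100100 680 -5820 2GEORGETOWN GUYANA 18462006 -99",
    "6274000 1420 2460 1160KUTUM SUDAN 19291990 194",
    "6109200-9999-99999 -999UNKNOWN NIGER 19891989 -999",
    "6542000 945 -2 197YENDI GHANA 19071997 8010",
    "6544200 672 -160 293KUMASI GHANA 19062006 17009",
    "6122306 1670 -299 267KABARA MALI 19231989 270022",
    "6193128 32 672 -999SAO TOME SAO TOME 19391973 8888888",
    "6266000 1850 3180 249KARIMA SUDAN 19172006 18315801",
    "6109905 1208 -367 315OUARKOYE BURKINA FASO 19601980 120002470",
]


def trailing_field(header):
    """Last whitespace-delimited token (assumption: valid for these samples)."""
    return header.split()[-1]


def digit_count(field):
    """Number of digits in the field, ignoring a leading minus sign."""
    return len(field.lstrip("-"))


# Map digit count -> number of sample headers with a field of that width.
widths = Counter(digit_count(trailing_field(h)) for h in SAMPLES)

for width in sorted(widths):
    print(width, widths[width])
```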

As a glutton for punishment I then looked at the tmin/tmax db format. Looks
like two extra fields (i6,i7) with mvcs of 999999 and 8888888 respectively.
However *sigh* inspection reveals the following two possibilities:

851300 3775 -2568 17PONTA DELGADA PORTUGAL 18652004 9999998888888
851500 3697 -2517 100SANTA MARIA A ACORES 19542006 -77777 8888888

Isn't that marvellous? These can't even be read with a consistent header format!

So, the approach will be to read exactly ONE extra field. For cru tmp that
will be the i2+i4 anders/best-start codes as one. For cru pre it will be
the amazing multipurpose, multilength field. For cru tmnx it will be the
first field, which is at least stable at i6.
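
As a sketch of the one-extra-field approach (Python, with invented function
names, not code from db2dtb.for): each reader below takes the tail of a header
starting at the first extra field, which is an assumption about how the line
has already been sliced, and returns a single value.

```python
def extra_tmp(tail):
    """cru tmp: read the i2+i4 codes together as one six-character field."""
    return int(tail[:6])


def split_tmp(field):
    """Optionally split that field back into (i2 code, i4 start year)."""
    text = field.strip()
    return int(text[:2]), int(text[2:6])


def extra_pre(tail):
    """cru pre: read the single variable-length field (up to nine digits)."""
    return int(tail.strip())


def extra_tmnx(tail):
    """cru tmn/tmx: keep only the first field, read as i6; ignore the rest."""
    return int(tail[:6])


# Both tmin/tmax layouts shown above give a usable first field this way.
print(extra_tmnx("9999998888888"))
print(extra_tmnx("-77777 8888888"))
```

Reading only the first six characters sidesteps the inconsistent boundary
before the second tmin/tmax field, at the cost of discarding that field.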


Conversions/corrections performed:

Temperature

Converted tmp.0611301507.dat to tmp.0612081033.dat

Found one corrupted station name:
BEFORE
911900 209 1564 20 HI*KAHULUI WSO (PUU NENE) 1954 1990 101954 -999.00
AFTER
911900 209 1564 20 KAHULUI ARPT/MAUI HAWAII 1954 1990 101954 -999.00

Precipitation

Converted pre.0611301502.dat to pre.0612081045.dat

Found one corrupted station name:
BEFORE
4125600 2358 5828 15SEEB AP./=MUSCAT*0.9OMAN 18932006 301965
AFTER
4125600 2358 5828 15 SEEB INTL/MUSCAT OMAN 1893 2006 -999 -999.00

(DL later reported that the name was intended to signify that the data had been
corrected by a factor of 0.9 when data from another station was incorporated
to extend the series - this was Mike Hulme's work)

Wrote db2dtb.for, which converts any of the CRU db formats to the CRU TS format.

Started work on mergedb.for, which should merge a primary database with an incoming
database of the same (CRU TS) format. Quite complicated. No operator interventions,
just a log file of failed attempts - but hooks left in for op sections in case this
turns out to be the main programmatic deliverable to BADC!


