The HARRY_READ_ME.txt file

Part 35t

Well for this 'half' of the process it's going to be 90% planning and strategy - because that's
how the first half ended up.

Let's revisit the process list from earlier - just the database-onwards bits, with interactivity removed:

* Produce Primary Parameters (TMP, TMN, TMX, DTR, PRE)
  anomdtb (per parameter)
  quick_interp_tdm2 (per parameter)
  glo2abs (per parameter)
  makegrids (per parameter)
* Prepare Binary Grids (TMP, DTR, PRE) for Synthetics
  quick_interp_tdm2 (per parameter)
* Produce Secondary Parameter (FRS, uses TMP,DTR)
  frs_gts_tdm
  quick_interp_tdm2
  glo2abs
  makegrids
* Produce Secondary Parameter (VAP, uses TMP,DTR)
  vap_gts_anom
  anomdtb
  quick_interp_tdm2
  glo2abs
  makegrids
* Produce Secondary Parameter (WET/RD0, uses PRE)
  rd0_gts_anom
  anomdtb
  quick_interp_tdm2
  glo2abs
  makegrids
* Produce Secondary Parameter (CLD, uses DTR)
  anomdtb (95-02 norm period)
  movenorms
  dtr2cld
  quick_interp_tdm2
  glo2abs
  makegrids

Having drawn out the process flowchart, I wondered if quick_interp_tdm2.pro would be kind
enough to output both .glo and binary gridded files, simultaneously? This would simplify
and speed things up a bit. So, with absolutely no alarm bells ringing at all, I decided
to make a sample run for DTR, just for 2006, to compare simultaneous outputs with the
original ones. You idiot.

IDL> quick_interp_tdm2,2006,2006,'testdtrglo/dtr.',750,gs=0.5,pts_prefix='dtrtxt/dtr.',dumpglo='dumpglo',dumpbin='dumpbin'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
2006
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
no stations found in: dtrtxt/dtr.2006.09.txt
no stations found in: dtrtxt/dtr.2006.10.txt
no stations found in: dtrtxt/dtr.2006.11.txt
no stations found in: dtrtxt/dtr.2006.12.txt
% Compiled module: MEAN.
% Compiled module: MOMENT.
% Compiled module: STDDEV.
grid 2006 non-zero 0.1125 2.1122 2.9219 cells= 202010
% Compiled module: WRBIN.
IDL> exit
crua6[/cru/cruts/version_3_0/primaries/dtr] ls -l testdtrglo/
total 43048
-rw------- 1 f098 cru 6220800 Feb 23 11:06 dtr.2006
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.01.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.02.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.03.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.04.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.05.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.06.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.07.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.08.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.09.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.10.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.11.glo
-rw------- 1 f098 cru 3142986 Feb 23 11:06 dtr.2006.12.glo
crua6[/cru/cruts/version_3_0/primaries/dtr]

So there, as hoped-for, binary and text output files. BUT. Comparisons with earlier
versions from the same database.. are depressingly awful:

crua6[/cru/cruts/version_3_0/primaries/dtr] diff dtr.2006.01.glo testdtrglo/dtr.2006.01.glo |wc -l
33484
crua6[/cru/cruts/version_3_0/primaries/dtr]

Sample comparison of lines 700-710 from old and new glo files:

crua6[/cru/cruts/version_3_0/primaries/dtr] head -710 dtr.2006.01.glo |tail -11
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 3.9705E-04
1.1257E-02 2.2117E-02 3.1641E-02 8.9739E-03 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
crua6[/cru/cruts/version_3_0/primaries/dtr] head -710 testdtrglo/dtr.2006.01.glo | tail -11
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.7088E-03 8.5614E-04
3.4384E-06 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
crua6[/cru/cruts/version_3_0/primaries/dtr]

They're NOTHING LIKE EACH OTHER. I really do hate this whole project. Ran the gridder again, just
for text output.. and..

crua6[/cru/cruts/version_3_0/primaries/dtr] head -710 testdtrglo2/dtr.2006.01.glo | tail -11
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 3.1268E-03
6.5528E-03 9.9787E-03 1.3405E-02 1.6831E-02 9.7796E-03 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
crua6[/cru/cruts/version_3_0/primaries/dtr]

Different again! Can this just be the random seed used in the gridding algorithm? If so, why aren't
we seeing a consistent pattern of 0.0 vs non-0.0 values? Another reason - if one were needed - why
we should dump this gridding approach altogether. But, er, not yet! There's no time to finish and
test the fortran gridder, which will doubtless sink to some depth and never be seen again, so we'll
carry on with this mediocre approach.
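
One way to put a number on the '0.0 vs non-0.0' question is to compare where the zeros fall, cell by
cell, instead of diffing text. This is a minimal Python sketch, not part of the original toolchain.
It assumes .glo data rows are plain whitespace-separated floats, as in the samples above. The names
parse_values and zero_pattern_mismatches are made up for illustration.

```python
def parse_values(lines):
    """Flatten whitespace-separated float rows into one list of cell values."""
    return [float(tok) for line in lines for tok in line.split()]


def zero_pattern_mismatches(old_vals, new_vals):
    """Count cells that are zero in one grid but non-zero in the other."""
    if len(old_vals) != len(new_vals):
        raise ValueError("grids differ in size")
    return sum((a == 0.0) != (b == 0.0) for a, b in zip(old_vals, new_vals))


ZERO_ROW = " ".join(["0.0000E+00"] * 8)

# Rows 7 and 8 of the 11-row samples above: original run vs simultaneous-output run.
old_rows = [
    "0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 3.9705E-04",
    "1.1257E-02 2.2117E-02 3.1641E-02 8.9739E-03 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00",
]
new_rows = [ZERO_ROW, ZERO_ROW]

mismatches = zero_pattern_mismatches(parse_values(old_rows), parse_values(new_rows))
print(mismatches, "cells disagree on zero vs non-zero")
```

If only the interpolated values changed between runs, this count would stay at zero while the text
diff still lit up. A non-zero count means the runs disagree about which cells have data at all.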

Spent a whole day knocking up a new anomaly program (anomauto) - as I felt anomdtb was vastly
overweight and supremely complicated to compile. Unfortunately, I got stuck trying to work out data
and latlon factors for different parameters (argh! why?), and what percentage anomalies really were,
and in the end GAVE UP, deciding I'd have to modify anomdtb after all. Actually - that looked even
worse, so I went back to anomauto and finished it off. And.. it works. Actually, a bit too well. For example,
when deriving anomalies from the CLD database, this was the original (a few weeks ago!):

uealogin1[/cru/cruts/version_3_0/update_top] wc -l cld.2000.11.txt
606 cld.2000.11.txt

..and this is the new one, from the same source database of course:

uealogin1[/cru/cruts/version_3_0/update_top] wc -l interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
1282 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt

..so, um - more than twice as many got through! Erk. Screening not tough enough! Results also not
exactly identical (> indicates potential match):

OLD:
uealogin1[/cru/cruts/version_3_0/update_top] head -10 cld.2000.11.txt
> 68.27 22.30 327.0 -21.20000 208000
> 65.83 24.15 6.0 -41.00000 219600
> 63.18 14.50 370.0 -45.30000 222600
> 59.37 13.47 55.0 -43.90000 241800
> 57.78 11.88 53.0 -35.50000 251200
> 57.67 18.35 47.0 -42.10000 259000
69.75 27.03 101.0 -23.80000 280500
67.37 26.65 179.0 -24.30000 283600
64.93 25.37 15.0 -33.40000 287500
62.40 25.68 145.0 -0.80000 293500

NEW:
uealogin1[/cru/cruts/version_3_0/update_top/interim_data/anoms/anoms.0902201545/cld] head -10 cld.2000.11.txt
67.27 14.37 13.0 4.85715 115200
> 68.27 22.30 327.0 8.56250 208000
> 65.83 24.15 6.0 16.59999 219600
> 63.18 14.50 370.0 3.26250 222600
> 59.37 13.47 55.0 15.33749 241800
> 57.78 11.88 53.0 8.93749 251200
> 57.67 18.35 47.0 6.58749 259000
60.13 -1.18 84.0 -7.02500 300500
58.22 -6.32 13.0 -1.22501 302600
57.20 -2.22 65.0 -7.80000 309100

OK, let's look at the means being used. Here's an example:

lat lon alt anom wmo mean
68.27 22.30 327.0 8.56250 208000 90.14

and the actual Nov 2000 value for this station (KARESUANDO, SWEDEN) is 987:

2000 887 800 900-9999-9999 812 762 825 625 825 987-9999

OK. So we read in 987. Then we multiply by the factor, which should be 0.1, giving us 98.7.

Then we subtract the mean, giving us 98.7-90.14 = 8.56, which is what we're getting. So no
mismatches between data, time, and metadata. Good. And the 95/02 mean is right, too (90.1375).
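
The same check as a few lines of Python. This is a sketch of the arithmetic only, not anomauto
itself. It assumes a plain difference anomaly (percentage anomalies would need different handling),
and the function name and MISSING constant are illustrative.

```python
MISSING = -9999


def anomaly(raw, factor, mean):
    """Scale a raw database value and subtract the normal; None if missing."""
    if raw == MISSING:
        return None
    return raw * factor - mean


# KARESUANDO, Nov 2000: raw 987, factor 0.1, 95/02 mean 90.1375
karesuando_nov_2000 = anomaly(987, 0.1, 90.1375)
print(f"{karesuando_nov_2000:.5f}")
```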

So, er. AH! solved it. Looking at the wrong 'old' cloud text files. tadaa:

OLD BUT CORRECT:
crua6[/cru/cruts/version_3_0/update_top] head -10 ../secondaries/cld/cldupdatetxt/cldupdate.2000.11.txt
68.27 22.30 327.0 8.50000 208000
65.83 24.15 6.0 16.60000 219600
63.18 14.50 370.0 3.20000 222600
59.37 13.47 55.0 15.30000 241800
57.78 11.88 53.0 8.90000 251200
57.67 18.35 47.0 6.50000 259000
69.75 27.03 101.0 8.90000 280500
67.37 26.65 179.0 9.40000 283600
64.93 25.37 15.0 11.20000 287500
62.40 25.68 145.0 5.90000 293500

Hurrah. Now I need to know why I'm producing too many. It's not as bad, though:

OLD BUT CORRECT:
crua6[/cru/cruts/version_3_0/update_top] wc -l ../secondaries/cld/cldupdatetxt/cldupdate.2000.11.txt
760 ../secondaries/cld/cldupdatetxt/cldupdate.2000.11.txt

NEW:
uealogin1[/cru/cruts/version_3_0/update_top] wc -l interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
1282 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt

Let's look at the first example, a station we let through that anomdtb kicked back:

0115200 6727 1437 13 BODO VI (CIV/MIL) NORWAY 1995 2008 -999 0
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1995-9999-9999-9999-9999-9999-9999-9999-9999-9999 875-9999-9999
1996-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1997-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1998-9999-9999-9999-9999 575 675 762 762 675 837 775-9999
1999 1000 812 762 750 550 750 862 775 637 825 1000-9999
2000 1000 912 800 750 812 850 737 825 700 737 862-9999
2001 875 750 475 650 775 775 825 825 750 900 1000-9999
2002 800 862 750 737 612 612-9999 562 800 462 762-9999
2003 850 825 862 550 712-9999 525 775 762 750 825-9999
2004 937 875 762 525 637 725 787 675 837 750 1000-9999
2005 1000 812 762 700 737 775 687 800 850 850-9999-9999
2006-9999 850 500 612-9999-9999 800 575 812 750 962-9999
2007 1000 712 750 837 762 687 675 812 850 975 950-9999
2008 1000 887 687-9999 750 775 675 612 725 887-9999-9999


Now, our limit for a valid normal is 75%, which for 1995-2002 should mean 6.

BODO VI has five valid values in November. So our limit is either wrong, or not being applied.

..yup:

uealogin1[/cru/cruts/version_3_0/update_top] ./anomauto
minn calculated as 7

Ho hum. Recalculated it to 6 (whilst checking that 1961-1990 still gave 23). Re-ran.
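
For reference, the 75% rule as a short Python sketch. It assumes the limit is the fraction of years
rounded up, which reproduces both 6 and 23. How anomauto actually computes minn is not shown here,
and the function names are illustrative.

```python
import math

MISSING = -9999


def min_valid_years(first_year, last_year, fraction=0.75):
    """Smallest number of valid years needed for a normal (rounded up)."""
    nyears = last_year - first_year + 1
    return math.ceil(fraction * nyears)


def has_valid_normal(values, first_year, last_year):
    """True if enough non-missing values exist over the normal period."""
    valid = sum(1 for v in values if v != MISSING)
    return valid >= min_valid_years(first_year, last_year)


# BODO VI November values for 1995-2002, read off the station record above.
bodo_november = [MISSING, MISSING, MISSING, 775, 1000, 862, 1000, 762]

print(min_valid_years(1995, 2002), min_valid_years(1961, 1990))
print(has_valid_normal(bodo_november, 1995, 2002))
```

With a limit of 6, BODO VI's five valid Novembers should fail the test.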

To my horror - if not surprise - that let EVEN MORE IN! Well of course it did you silly sausage.
This still doesn't explain how BODO VI gets in with 5 values:

uealogin1[/cru/cruts/version_3_0/update_top] wc -l interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
1404 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt

Aha. I wonder if I'm initialising the onestn() array in the wrong place? Because data is
only added if not -9999, so it has to be prefilled with -9999 *every time*.. dammit. If
I fix that, I get:

uealogin1[/cru/cruts/version_3_0/update_top] wc -l interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
746 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt

14 stations FEWER than the previous exercise (746 against 760). That'll do, surely?
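
The initialisation bug is easy to reproduce. This is a minimal Python sketch with hypothetical
station data. It is not the Fortran from anomauto, and only the onestn name comes from the text
above. If the buffer is filled once outside the station loop, missing years silently keep the
previous station's values.

```python
MISSING = -9999
NYEARS = 8  # e.g. 1995-2002


def count_valid_buggy(stations):
    """Buffer created once: earlier stations' values leak into later ones."""
    onestn = [MISSING] * NYEARS
    counts = []
    for records in stations:
        for idx, value in records:
            if value != MISSING:  # data only added if not missing
                onestn[idx] = value
        counts.append(sum(1 for v in onestn if v != MISSING))
    return counts


def count_valid_fixed(stations):
    """Buffer prefilled with MISSING for every station."""
    counts = []
    for records in stations:
        onestn = [MISSING] * NYEARS
        for idx, value in records:
            if value != MISSING:
                onestn[idx] = value
        counts.append(sum(1 for v in onestn if v != MISSING))
    return counts


# Hypothetical input: (year index, value) pairs per station.
stations = [
    [(i, 800 + i) for i in range(NYEARS)],                  # complete record
    [(3, 775), (4, 1000), (5, 862), (6, 1000), (7, 762)],   # only 5 valid years
]

print(count_valid_buggy(stations))
print(count_valid_fixed(stations))
```

In the buggy version the second station appears to have a full set of years, so it sails past
any minimum-count test.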

OLD RELIABLE:
crua6[/cru/cruts/version_3_0/update_top] head -10 ../secondaries/cld/cldupdatetxt/cldupdate.2000.11.txt
68.27 22.30 327.0 8.50000 208000
65.83 24.15 6.0 16.60000 219600
63.18 14.50 370.0 3.20000 222600
59.37 13.47 55.0 15.30000 241800
57.78 11.88 53.0 8.90000 251200
57.67 18.35 47.0 6.50000 259000
69.75 27.03 101.0 8.90000 280500
67.37 26.65 179.0 9.40000 283600
64.93 25.37 15.0 11.20000 287500
62.40 25.68 145.0 5.90000 293500

NEW LATEST:
uealogin1[/cru/cruts/version_3_0/update_top] head interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
68.27 22.30 327.0 8.56250 208000 90.14
65.83 24.15 6.0 16.59999 219600 83.40
63.18 14.50 370.0 3.26250 222600 85.44
59.37 13.47 55.0 15.33749 241800 84.66
57.78 11.88 53.0 8.95714 251200 86.04
57.67 18.35 47.0 6.58749 259000 85.91
69.75 27.03 101.0 8.95714 280500 91.04
67.37 26.65 179.0 9.47143 283600 90.53
64.93 25.37 15.0 11.27142 287500 88.73
62.40 25.68 145.0 5.90000 293500 94.10

It's not going to be easy to find 14 missing stations, is it? Since the anomalies aren't exactly the same.

Should I be worried about 14 lost series? Less than 2%. Actually, I noticed something interesting.. look
at the anomalies. The anomdtb ones aren't *rounded* to 1dp, they're *truncated*! So, er - wrong!
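
The difference is easy to see on the values listed above. A small Python sketch follows. How anomdtb
actually arrives at its 1dp figures is not shown here, so this only illustrates truncation against
rounding, and truncate_1dp is an illustrative name.

```python
import math


def truncate_1dp(x):
    """Drop everything after the first decimal place (towards zero)."""
    return math.trunc(x * 10) / 10


for anom in (8.5625, 3.2625, 6.58749):
    print(anom, "truncated:", truncate_1dp(anom), "rounded:", round(anom, 1))
```

Truncation always moves the value towards zero, so positive anomalies come out low and negative
ones come out high.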

So let's say, anomalies are done. Hurrah. Onwards, plenty more to do!

Got the gridding working, I think. IDL of course. I modified quick_interp_tdm2.pro to accept
start and end months, otherwise it just produces whole years with files of zeros for months with
no anomaly file. And errors. And since this is likely to be a six-month update..
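
The idea, sketched in Python rather than IDL (the actual change to quick_interp_tdm2.pro is not
shown here): only visit the months inside the update window, so no files are produced for months
without anomalies. The file name pattern follows the dtr.2006.09.txt style seen earlier, and the
function names are illustrative.

```python
def months_in_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def anomaly_files(prefix, start_year, start_month, end_year, end_month):
    """Names of the monthly anomaly files needed for the window."""
    return [
        f"{prefix}.{y}.{m:02d}.txt"
        for y, m in months_in_range(start_year, start_month, end_year, end_month)
    ]


print(anomaly_files("dtr", 2006, 7, 2006, 12))
```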

Re-planned the program layout. Not a major exercise, just putting different loops in to speed up and
simplify operations. It now runs as follows (note this is simplified!!):

1. User chooses update databases or update datasets. Dates, parameters, etc.

2. Update Databases
   2.1 Convert any MCDW bulletins to CRU format; merge into existing databases
   2.2 Convert any CLIMAT bulletins to CRU format; merge into databases from 2.1
   2.3 Convert any BOM bulletins to CRU format; merge into databases from 2.2

3. Update datasets
   3.1 Convert databases to anomalies
   3.2 Grid primary parameters
   3.3 Generate synthetic secondary parameters
   3.4 Grid secondary parameters
   3.5 Convert gridded anomalies to actuals
   3.6 Produce final datasets

1876 lines including subroutines and notes. Ten Fortran and four IDL programs (plus indirect ones). All
Fortran programs are mine, now. Top-level listing:

drwx------ 10 f098 cru 4096 Feb 19 20:55 db
drwx------ 3 f098 cru 4096 Feb 28 17:01 reference
drwx------ 3 f098 cru 4096 Mar 1 15:41 runs
drwx------ 4 f098 cru 4096 Feb 23 12:15 gridded_finals
drwx------ 4 f098 cru 4096 Feb 27 17:56 results
drwx------ 5 f098 cru 4096 Mar 1 15:40 logs
drwx------ 6 f098 cru 4096 Dec 18 11:00 updates
drwx------ 8 f098 cru 4096 Feb 28 16:15 interim_data
-rw------- 1 f098 cru 11 Feb 27 17:48 newdata.latest.date
-rwxr-xr-x 1 f098 cru 132425 Mar 1 14:41 update
-rwxr-xr-x 1 f098 cru 16465 Mar 1 14:41 dtr2cldauto
-rwxr-xr-x 1 f098 cru 17990 Mar 1 14:55 tmnx2dtrauto
-rwxr-xr-x 1 f098 cru 19427 Mar 1 15:43 glo2absauto
-rwxr-xr-x 1 f098 cru 20929 Mar 1 14:42 movenormsuato
-rwxr-xr-x 1 f098 cru 23350 Mar 1 15:42 anomauto
-rwxr-xr-x 1 f098 cru 29076 Mar 1 14:50 climat2cruauto
-rwxr-xr-x 1 f098 cru 29481 Mar 1 14:50 bom2cruauto
-rwxr-xr-x 1 f098 cru 29867 Mar 1 14:49 mcdw2cruauto
-rwxr-xr-x 1 f098 cru 323870 Mar 1 15:52 makegridsauto
-rwxr-xr-x 1 f098 cru 89515 Mar 1 16:10 newmergedbauto

So, to station counts. These will have to mirror section 3 above. Coverage of secondary parameters is
particularly difficult - what is the best approach? To include synthetic coverage, when it's only at
2.5-degree?

No. I'm going to back my previous decision - all station count files reflect actual obs for that
parameter only. So for secondaries, you get actual obs of that parameter (ie naff all for FRS). You
get the info about synthetics that enables you to use the relevant primary counts if you want to. Of
course, I'm going to have to provide a combined TMP and DTR station count to satisfy VAP & FRS users.
The problem is that the synthetics are incorporated at 2.5-degrees, NO IDEA why, so saying they affect
particular 0.5-degree cells is harder than it should be. So we'll just gloss over that entirely ;0)

ARGH. Just went back to check on synthetic production. Apparently - I have no memory of this at all -
we're not doing observed rain days! It's all synthetic from 1990 onwards. So I'm going to need
conditionals in the update program to handle that. And separate gridding up to 1989. And what TF
happens to station counts?

OH FUCK THIS. It's Sunday evening, I've worked all weekend, and just when I thought it was done I'm
hitting yet another problem that's based on the hopeless state of our databases. There is no uniform
data integrity, it's just a catalogue of issues that continues to grow as they're found.

rd0_gts_anom_05 will produce half-degree .glo files from gridded pre anoms. So if we call that, we
can use it, and stncounts for PRE will be authentic (as it's the sole input). Final decision: coded
update.for to produce WET from obs+syn until 12/1989, syn only thereafter. WET station counts only
produced until 1989, PRE must be used (with caveats) after that point.
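
The decision boils down to a date test. A Python sketch of the rule as stated follows. It is not the
code in update.for, and the names are illustrative.

```python
LAST_OBS_YEAR = 1989  # WET uses observations up to and including 12/1989


def wet_inputs(year, month):
    """Which inputs feed the WET grid for a given month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1-12")
    if year <= LAST_OBS_YEAR:
        return ("obs", "syn")
    return ("syn",)


def wet_station_counts_available(year):
    """WET station counts exist only while observations are used."""
    return year <= LAST_OBS_YEAR


print(wet_inputs(1989, 12), wet_inputs(1990, 1))
```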

Wrote tmpdtrstnsauto.for to produce tmp.and.dtr station counts (ie you only get a count when both
parameters have a count, and even then it's the min()). The resulting counts are the effective FRS
counts, and the synthetic VAP counts.
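
The combination rule, sketched in Python (tmpdtrstnsauto.for itself is not shown here). It assumes
the counts arrive as equal-length per-cell lists where zero means no stations, and the function name
is illustrative.

```python
def tmp_and_dtr_counts(tmp_counts, dtr_counts):
    """Per-cell combined count: min of the two, zero unless both are present."""
    if len(tmp_counts) != len(dtr_counts):
        raise ValueError("count grids differ in size")
    return [
        min(t, d) if t > 0 and d > 0 else 0
        for t, d in zip(tmp_counts, dtr_counts)
    ]


print(tmp_and_dtr_counts([3, 0, 5, 2], [1, 4, 0, 2]))
```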

Onto PET. Tracked down the PET program from Dimitrios, way back in 2007! It uses TMP, TMN, TMX, VAP,
CLD and WND (the latter as 61-90 normals from IPCC). Converted to f77 'automatic' (makepetauto.for).

