FOIA HARRY_READ

23. Interrupted work on mergedb.for in order to trial a precip gridding for 3.0. This
required another new proglet, addnormline.for, which adds a normals line below each
header. It fills in the normals values if the condisions are met (75% of values, or
23 for the 30 year period).

Initial results promising.. ran it for precip, it added normals lines OK, a total of
15942 with 6003 missing value lines. No errors, and no ops interventions because the
file didn't have normals lines before!

'Final' precip file: pre.0612151458.dtb

Tried running anomdtb.f90.. failed because it couldn't find the .dts file! No matter
that it doesn't need it - argh!

Examined existing .dts files.. not sure what they're for. Headers are identical to
the .dtb file, all missing values are retained, all other values are replaced with
one of several code numbers, no idea what they mean.

Wrote 'falsedts.for' to produce dummy .dts files with all zeros in place of real
data values. Produced pre.0612151458.dts.

Added normals line, producing: pre.0612181221.dtb
Re-produced matching pre.0612181221.dts file.

Tried running anomdtb.f90 again. This time it crashed at record #1096. Wrote a proglet
'findstn.for' to find the n-th station in a dtb file, pulled out 1096:

0 486 10080 1036 BUKIT LARUT MALAYSIA 1951 1988 -999 -999.00
6190 2094 2015 2874 3800 4619 3032 5604 3718 4626 5820 5035 3049
1951 3330 2530 2790 5660 4420 4030 1700 2640 8000 5950 6250 2020

(snipped normal years)

1979 110 1920 1150 5490 3140 308067100 2500 4860 4280 4960 1600

Uh-oh! That's 6.7m of rain in July 1979? Looks like a factor-of-10 problem. Confirmed
with DL and changed to 6710.

Next run, crashed at #4391, CHERRAPUNJI, the wettest place in the world. So here, the
high values are realistic. However I did notice that the missing value code was -10
instead of -9999! So modified db2dtb.for to fix that and re-produced the precip database
as pre.0612181214.dat. This then had to have normals recalculated for it (after fixing
#1096).

Finally got it through anomdtb.for AND quick_interp_tdm2 - without crashing! IDL was even
on the ball with the missing months at the end of 2006:

IDL> quick_interp_tdm2,1901,2006,'preglo/pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='preanoms/pre.'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
1902
1903
(etc)
2005
2006
no stations found in: preanoms/pre.2006.09.txt
no stations found in: preanoms/pre.2006.10.txt
no stations found in: preanoms/pre.2006.11.txt
no stations found in: preanoms/pre.2006.12.txt

All good. Wrote mergegrids.for to create the more-familiar decadal and full-series
files from the monthly *.glo.abs ones.

Then.. like an idiot.. I had to test the data! Duh.

Firstly, wrote mmeangrid.for and cmpmgrids.m to get a visual comparison of old and
new precip grids (old being CRU TS 2.10). This showed variations in 'expected' areas
where changes had been made, it the Southern tip of Greenland.

Next, Phil requested some statistical plots of percentage change in annual totals,
and long-term trends. Wrote 'anntots.for' to convert monthly gridded files into
yearly totals files. Then tried to write precipchecker.m to do the rest in Matlab..
it wasn't having it, OUT OF MEMORY! Bah. So wrote 'prestats.for' to calculate the
final stats, for printing with an emasculated precipchecker.m. BUT.. it wouldn't
work, and on investigating I found 200-odd stations with zero precipitation for
the entire 1901-2006 period! Modified anntots.for to dump a single grid with those
cells that remained at zero marked, then plotted.

Zero cells in North Africa and the Western coast of South America. None in the
CRU TS 2.10 precip grids :-(

Next step, produce a list of cell centres of the offending cells. wrote a quick
proglet, 'idzerocells.for'. Then 'getcellstations.for', which, given a CRUTS DB
file and a list of lat/lon values, extracts all stations lying inside the cells
listed.

Uh-oh. Looked in the new pre db and found 15 stations for 257 zero cells! They are:

6061170 2810 670 381 FT FLATTERS ALGERIA 1925 1965 -999 -999.00
6064000 2650 840 559 FORT POLIGNAC ALGERIA 1925 2006 -999 -999.00
6262000 2080 3260 470 STATION NO. 6 SUDAN 1950 1988 -999 -999.00
8450100 -810 -7900 26 TRUJILLO PERU 1961 2006 -999 -999.00
8453100 -920 -7850 10 CHIMBOTE PERU 1961 2006 -999 -999.00
8462800 -1200 -7710 13 LIMA-CALLAO/INTL.AP. PERU 1961 2006 -999 -999.00
8463100 -1210 -7700 137 LIMATAMBO/C.DE MARTE PERU 1927 1980 -999 -999.00
8469100 -1380 -7630 6 PISCO PERU 1942 2006 -999 -999.00
8540600 -1850 -7030 29 ARICA/CHACALLUTA CHILE 1903 2006 -999 -999.00
8541700 -2020 -7020 6 IQUIQUE/CAVANCHA CHILE 1886 1986 -999 -999.00
8541800 -2053 -7018 52 IQUIQUE DIEGO ARACEN CHILE 1989 2006 -999 -999.00
8700494 -707 -7957 150 CAYALTI PERU 1934 1959 -999 -999.00
8700562 -1203 -7703 137 LIMA PERU 1929 1963 -999 -999.00
8700581 -1207 -7717 13 LA PUNTA (NA PERU 1939 1963 -999 -999.00
9932040 2810 670 381 FT FLATTER ALGERIA 1925 1965 -999 -999.00

Looked for the same zero cell stations in the old pre db (pre.0312031600.dtb) and only
found 10:

-854031 -2021 -7015 5 IQUIQUE/CAVANCHA CHILE 1899 1986 -999 0.00
-843002 -1210 -7700 135 LIMATAMBO PERU 1927 1980 -999
-603550 2810 670 381 FT FLATTER ALGERIA 1925 1965 -999 -999.00
606400 2650 841 558 ILLIZI/ILLIRANE ALGERIA 1925 2002 -999 -999
626200 2075 3255 468 STATION NO. 6 SUDAN 1950 1988 -999 -999.00
845010 -810 -7903 30 TRUJILLO/MARTINEZ PERU 1961 2002 -999 -999
845310 -916 -7851 11 CHIMBOTE/TENIENTE PERU 1961 2001 -999
846280 -1200 -7711 13 LIMA/JORGE CHAVEZ PERU 1961 2002 -999 -999
846910 -1375 -7628 7 PISCO (CIV/MIL) PERU 1942 2002 -999 -999
854180 -2053 -7018 52 IQUIQUE/DIEGO ARAC CHILE 1989 2002 -999 -999.00

So why does the old db result in no 'zero' cells, and the new db give us over 250? I
wondered if normals might be the answer, but none of the 10 stations from the old db
have in-db normals, wheras three of the new db have:

8453100 -920 -7850 10 CHIMBOTE PERU 1961 2006 -999 -999.00
6190 19 59 36 18 5 0 3 0 0 1 10 5
8469100 -1380 -7630 6 PISCO PERU 1942 2006 -999 -999.00
6190 3 0 3 0 0 1 1 3 1 4 0 0
8540600 -1850 -7030 29 ARICA/CHACALLUTA CHILE 1903 2006 -999 -999.00
6190 1 3 0 0 0 2 2 2 2 0 0 0

So these alone ought to guarantee three of the cells being nonzero - they should have
the bloody normals in! So the next check has to be the climatology, that which provides
the cell-by-cell normals..

A check of the gridded climatology revealed that all 257 'zero cells' have their
climatologies set to zero, too. This was partially checked in the GRIM-format climatology
just in case!

Next, a focus: on CHIMBOTE (see header line above). This has real data (not just zeros).
It is in cell (162,203), or (-9.25,-78.75) [lat, lon in both cases]. So we extract the
full timeseries for that cell from the published 2.10 (1901-2002) GRIM file:

Grid-ref= 203, 162
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 0 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 7 0 3
2 0 0 0 2 0 0 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 0 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 0 0 0 5 0 3
0 0 0 0 2 0 0 0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0 2
0 0 0 0 2 5 6 0 0 0 0 3
2 3 0 0 0 0 17 0 0 4 0 3
2 0 0 0 3 0 2 0 0 2 0 3
0 0 0 0 0 0 14 0 0 9 0 0
0 0 0 0 0 0 0 0 0 2 0 2
0 0 0 0 0 0 12 0 0 0 0 5
0 0 0 0 0 0 0 0 0 3 0 2
0 0 0 0 0 0 10 0 0 0 0 2
0 0 0 0 3 0 11 0 0 2 0 3
0 0 0 0 2 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0 0 4 0 0
3 0 0 0 0 0 15 0 0 0 0 2
0 0 0 0 0 0 0 0 0 4 3 2
5 0 0 0 0 0 0 0 0 12 0 3
0 0 2 2 4 2 0 0 2 3 0 3
0 0 0 0 3 0 0 2 0 2 2 3
0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 7 0 3 0 0 0 0 0 0
0 0 2 3 0 0 0 4 0 0 12 0
0 0 9 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 6 2 0 0 0 6 0 0 0 0 0
0 0 0 0 0 0 0 2 2 0 0 0
0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 0 2 2 0 0 0 0 0 0
0 2 0 0 0 0 0 2 0 0 0 0
0 0 0 7 0 0 0 2 0 0 0 3
2 0 7 0 0 2 0 0 2 0 0 0
0 0 0 7 0 0 2 2 2 0 0 0
8 0 2 0 0 0 0 2 0 0 0 0
0 7 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 2 0 0
0 0 0 0 2 0 0 0 0 0 0 0
2 0 0 0 0 2 0 0 0 0 10 0
3 0 0 0 0 0 9 0 0 0 0 3
0 0 0 0 0 0 0 0 0 3 0 5
4 0 0 2 10 2 0 0 0 0 0 4
0 0 0 0 0 0 0 0 2 5 0 0
0 0 0 0 0 0 9 0 0 0 0 0
0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 0 0 0 5 0 0
3 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 3 0 0 0
0 0 0 0 8 2 0 0 0 0 0 3
0 0 2 0 2 0 0 0 0 0 2 3
0 0 0 0 2 0 2 0 0 5 0 2
0 0 0 0 2 0 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 5 0 0
0 0 0 0 2 0 0 2 0 0 0 0
2 0 2 0 0 0 0 0 0 5 0 0
0 0 0 0 11 0 2 0 0 4 0 3
2 3 2 0 13 0 0 0 0 0 0 0
2 6 0 3 0 0 0 0 2 3 0 7
2 0 0 0 2 0 0 0 0 0 0 3
0 0 0 0 0 0 2 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 3

..yet in the 3.00 version, it's all zeros!

Only one thing for it.. examine the attempt at regenerating 2.10.
Unfortunately - well, interestingly then - this gave the same
zero cells as the 3.00 generation! So it's something to do with
the process, not the database (or the climatology, assuming that
has remained constant, which I gather it has).

Update: aha! Phil pointed out that for precip the climatology
is used as a MULTIPLIER. So if the clim hasn't changed, the
cells should always have been zero regardless of actual data.

As I should have remembered:

crua6[/cru/cruts/version_3_0/primaries/precip] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.pre
Enter a name for the gridded climatology file: clim.6190.lan.pre.grid
Enter the path and stem of the .glo files: preglo/pregrid.
Enter the starting year: 1901
Enter the ending year: 2006
Enter the path (if any) for the output files: pregrid/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Right, erm.. off I jolly well go!
pregrid.01.1901.glo
pregrid.02.1901.glo
(etc)

Decided to read Mitchell & Jones 2005 again. Noticed that the
limit for SD when anomalising should be 4 for precip, not 3! So
re-ran with that:

crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.pre
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
pre.0612181221.dtb
pre.0612181221.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal: 23
> Specify the no. of stdevs at which to reject data:
4
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
pre4sd.txt
> Select the first,last years AD to save:
1901,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dts

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dts

> NORMALS MEAN percent STDEV percent
> .dtb 7315040 73.8
made it to here
> .cts 299359 3.0 7613600 76.8
> PROCESS DECISION percent %of-chk
> no lat/lon 17527 0.2 0.2
> no normal 2355659 23.8 23.8
> out-of-range 13253 0.1 0.2
> duplicated 586206 5.9 7.8
> accepted 6934807 70.0
> Dumping years 1901-2006 to .txt files...

This is not as good a percentage as for 2.10:

> NORMALS MEAN percent STDEV percent
> .dtb 0 0.0
> .cts 3375441 84.1 3375441 84.1
> PROCESS DECISION percent %of-chk
> no lat/lon 3088 0.1 0.1
> no normal 638538 15.9 15.9
> out-of-range 70225 1.7 2.1
> duplicated 135457 3.4 4.1
> accepted 3167636 78.9
> Dumping years 1901-2002 to .txt files...

But the actual number of accepted values is more than TWICE 2.10!

Of course, the same 257 gridcells are zeros, because the multiplicative
normals are still zero.

For reference, these are the results for the 3 SD limit of 3.00:

> NORMALS MEAN percent STDEV percent
> .dtb 7315040 73.8
made it to here
> .cts 284160 2.9 7598401 76.7
> PROCESS DECISION percent %of-chk
> no lat/lon 17527 0.2 0.2
> no normal 2370858 23.9 24.0
> out-of-range 32379 0.3 0.4
> duplicated 583193 5.9 7.8
> accepted 6903495 69.7
> Dumping years 1901-2006 to .txt files...

So we've only gained 0.3% of values, a real figure of 31312 values.

Conclusion: stick with a 3 Standard Deviation limit, like the
Read_Me says.

The HARRY_READ_ME.txt file

Part 23