Well, to take a slightly different tack, I thought I'd look at the gridding end of
things. Specifically, how to run IDL in batch mode. I think I've got it: you create
a batch file with the command(s) in, then setenv IDL_STARTUP [name of batch file].
When you type 'idl' it runs the batch file; unfortunately it doesn't quit afterwards,
though adding an 'exit' line to the batch file does the trick! Of course, there is no
easy way to check it's working properly, since the random element (used when relaxing
to the climatology) ensures that each run gives different results:
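The same setup can be scripted. Here's a minimal sketch in Python of the mechanism described above (the print command is just a stand-in for the real gridding commands, and it assumes an 'idl' executable on the PATH):

```python
import os
import subprocess
import tempfile

# Write a throwaway IDL batch file; the trailing 'exit' is what makes
# the session quit instead of dropping to the interactive prompt.
with tempfile.NamedTemporaryFile("w", suffix=".pro", delete=False) as batch:
    batch.write("print, 'gridding commands go here'\n")  # stand-in command
    batch.write("exit\n")                                # quit after running

# Point IDL_STARTUP at the batch file, exactly as setenv does in csh.
env = dict(os.environ, IDL_STARTUP=batch.name)
# subprocess.run(["idl"], env=env, check=True)  # uncomment on a machine with IDL
```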
Still, the mechanism is so similar to that used to run other Fortran progs that we
can carry on, I guess. Naturally I would prefer to use the gridder I wrote, partly
because it does a much better, *documentable* job, but mainly because I don't want
all that effort wasted!
Also looked at NetCDF production, as it's still looming. ncgen looks quite good: it
can work from a 'CDL' file (the format is the same as the output from ncdump). It can
even produce Fortran code to reproduce the file!!
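For the record, a CDL file is just the text form of the dataset; a minimal example would look something like this (the names and sizes are invented, not ours):

```
netcdf cld_sample {              // hypothetical minimal example
dimensions:
        time = UNLIMITED ;
        lat = 360 ;
        lon = 720 ;
variables:
        float cld(time, lat, lon) ;
                cld:units = "percentage" ;
}
```

Then (if I've read the man page right) 'ncgen -o cld_sample.nc cld_sample.cdl' builds the binary file, and 'ncgen -f cld_sample.cdl' prints the Fortran that would recreate it.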
Ah well. Back to the 'incoming data' process. The fact that the mcdw2cruauto and
climat2cruauto programs worked fine for CLD is a big bonus; they read their runs
and date files and they wrote their results. Though the results didn't include the
names of the output databases, I've had second thoughts about that. I want the
update program to be in charge, so it should know what files have been produced
(assuming the result is 'OK'). If the conversion program sends back a list, then
the update program will have to parse it to find out which parameter is which,
and that's silly when it should know anyway!! The situation is different for
merging. I don't have a full strategy for file naming yet. Let's look at a typical
process for an unnamed (not tmn or tmx) primary parameter, i.e. the simple case:
merge mcdw into current
merge climat into current+mcdw
reformat into .dat and .nc
final output files
So, naming. Well the governing principle of the update process is that all files
have the same 10-digit datestamp. So the run can be uniquely identified, as can
all its files (data, log, etc). I am NOT changing that! A main problem is that
we will have to depart from the rigid database naming schema ('tla.datestr.dtb')
because we will have lots of databases in a single run. In the above example,
four databases will all have the same datestamp. Here's a possible name system:
mcdw db                  mcdw.tla.datestr.dtb
current+mcdw db          int1.tla.datestr.dtb
climat db                clmt.tla.datestr.dtb
current+mcdw+climat db   int2.tla.datestr.dtb
The final db would then be copied or renamed to:
For secondary parameters it's even worse! I'm not super-keen on the use of 'int1'
('interim 1') and so on.. they give no useful information. But a more complicated
schema isn't going to be understood by anyone else anyway! And we should have the
Database Master List to refer to at all times.. okay. All interim databases will
be labeled 'int1', 'int2', and so forth. The update program will have to keep
track of numbering. And, of course - it will have to tell the merging program
what to call the output database! Bah.
It gets WORSE. The update program has to know which 'Master' database to pass to
the merge program. For MCDW, it's going to be the 'current' database for that
parameter. But for CLIMAT and BOM, it depends on whether MCDW or CLIMAT
(respectively) merges have gone before. And only for those parameters that are
precursored! More complexity. Well, I suppose I can take one of two approaches:
1. Test at each stage for each parameter (i.e. for BOM, test whether CLIMAT tmx/tmn
have just been done). This could be done by testing for the filenames or by
2. Maintain a list in memory of 'latest' databases for each parameter. A bit less
elegant, but easier to understand and use.
Well, as we already HAVE (2), we'll go with that one ;0).
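To pin down what (2) amounts to, here's a sketch: an in-memory map of the 'latest' database per parameter, updated after each merge. The paths follow the int1/int2 naming scheme above, but the function name and structure are mine, not the actual update program:

```python
# The 'latest' database for each parameter, seeded from the current dbs.
latest = {
    "tmp": "db/tmp/tmp.0809111204.dtb",
}

def record_merge(par, interim_n, dtstr, source="MCDW"):
    """After a successful merge, the interim db becomes 'latest' for that parameter."""
    latest[par] = f"updates/{source}/db/db.{dtstr}/int{interim_n}.{par}.{dtstr}.dtb"
    return latest[par]

record_merge("tmp", 1, "0902161655")
# latest["tmp"] -> updates/MCDW/db/db.0902161655/int1.tmp.0902161655.dtb
```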
Okay. Because it is so complicated (well, for my brain anyway), I'm going to write
out the filenames that update is using and expecting, so I can check that the
conversion and merging programs tie in.
dtstr = 0902161655
par = TMP
source = MCDW
prev db = db/tmp/tmp.0809111204.dtb
runs/runs.0902161655/conv.mcdw.0902161655.dat Run information
updates/MCDW/db/db.0902161655 Dir for output dbs
results/results.0902161655/conv.mcdw.0902161655.res Expected results file
updates/MCDW/db/db.0902161655/mcdw.tmp.0902161655.dtb Expected output db
logs/logs.0902161655/conv.mcdw.0902161655.log Expected log file
db/tmp/tmp.0809111204.dtb Current/latest db
updates/MCDW/db/db.0902161655/mcdw.tmp.0902161655.dtb New db to be merged in
updates/MCDW/db/db.0902161655/int1.tmp.0902161655.dtb Interim output db
runfile.latest.dat Contains name of current run file
runs/runs.0902161655/merg.mcdw.0902161655.dat Run information (read from above)
results/results.0902161655/merg.mcdw.0902161655.res Expected results file
updates/MCDW/db/db.0902161655/int1.tmp.0902161655.dtb Expected output db
logs/logs.0902161655/merg.mcdw.0902161655.log Expected log file
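All those names follow the same pattern, so for cross-checking purposes here is how the conversion-stage ones compose from dtstr/par/source (a sketch only; the update program's real variable names will differ):

```python
# Compose the expected filenames for one conversion stage from the three
# identifiers at the top of the listing.
dtstr, par, source = "0902161655", "tmp", "mcdw"

runfile = f"runs/runs.{dtstr}/conv.{source}.{dtstr}.dat"        # run information
resfile = f"results/results.{dtstr}/conv.{source}.{dtstr}.res"  # expected results
logfile = f"logs/logs.{dtstr}/conv.{source}.{dtstr}.log"        # expected log
outdb   = f"updates/{source.upper()}/db/db.{dtstr}/{source}.{par}.{dtstr}.dtb"
```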
These all seem to match up with the respective programs! Not sure that all
the necessary directories are being created yet, though.. they are now. Some
modifications to the above have been made (and retrospectively updated).
So, with half of the update program written, I got it all compiled, reset all
the incoming data to 'unprocessed', and.. got it working!
Of course, I immediately realised that I'd missed out the DTR conversion at the end.
And that.. didn't go any better than the rest of it, despite a quick conversion of
Well, keen-eyed viewers will remember that all the tmin/tmax/dtr/back-to-tmin-and-tmax
stuff revolves around the tmin and tmax databases being kept in absolute step. That is,
same stations, same coordinates and names, same data spans. Otherwise the job of
synching, and of converting to DTR, becomes horrendous. But look at what happens to the
line counts of the databases as they're mangled through the system:
Sometimes life is just too hard. It's after midnight - again. And I'm doing all this
over VNC in 256 colours, which hurts. Anyway, the above line counts. I don't know
which is the more worrying - the fact that adding the CLIMAT updates lost us 1251
lines from tmax but gained us 1448 for tmin, or that the BOM additions added sod all.
And yes - I've checked, the int2 and int3 databases are IDENTICAL. Aaaarrgghhhhh.
I guess.. I am going to need one of those programs I wrote to sync the tmin and tmax
databases, aren't I?
Actually, it's worse than that. The CLIMAT merges for TMN and TMX look very similar:
<QUOTE CLIMAT TMN MERGE INTO LATEST DB>
New master database: updates/CLIMAT/db/db.0902192248/int2.tmn.0902192248.dtb
Update database stations: 2922
> Matched with Master stations: 2227
(by operator: 0)
> Added as new Master stations: 566
> Rejected: 129
Rejects file: updates/CLIMAT/db/db.0902192248/climat.tmn.0902192248.dtb.rejected
<QUOTE CLIMAT TMX MERGE INTO LATEST DB>
New master database: updates/CLIMAT/db/db.0902192248/int2.tmx.0902192248.dtb
Update database stations: 2921
> Matched with Master stations: 2226
(by operator: 0)
> Added as new Master stations: 566
> Rejected: 129
Rejects file: updates/CLIMAT/db/db.0902192248/climat.tmx.0902192248.dtb.rejected
I don't see how we end up with such drastic differences in line counts!!
Well the first thing to do was to fix climat2cruauto so that it treated tmin and tmax as
inseparable. Thus the CLIMAT databases for these two should be identical (um, apart from
the data values).
OK, this is getting SILLY. Now the BOM and CLIMAT conversions are in sync, and the original
databases are in sync, yet the processing creates massive divergence!!
You see? The HANNOVER 1930 date, and the BERLIN-TEMPELHOF 1991 date, are wrong!! Christ.
That's not even consistent, one's supposedly in the tmin file, the other, the tmax one.
So, an apparently-random pollution of the start dates. And.. FOUND IT! As usual, the program is
doing exactly what I asked it to do. When I wrote it I simply didn't consider the possibility
of tmin and tmax needing to sync. So one of the first things it does, when reading in the
existing database, is to truncate station data series where whole years are missing values. And
for HANNOVER, tmax has 1927-1929 missing, but tmin has (some) data in those years. A-ha!
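A minimal sketch of that truncation behaviour on a HANNOVER-like station (this is not the real getmos; the missing-value code, the series layout, and the keep_span escape hatch are all my own illustration):

```python
MISSING = -9999  # assumed missing-value code

def year_span(series, keep_span=False):
    """Indices of the first/last year containing any real data; wholly-missing
    edge years are truncated away unless keep_span preserves the original span."""
    if keep_span:
        return 0, len(series) - 1
    kept = [i for i, yr in enumerate(series) if any(v != MISSING for v in yr)]
    return kept[0], kept[-1]

# tmax wholly missing 1927-1929, tmin with scraps of data in those years
tmax = [[MISSING] * 12, [MISSING] * 12, [MISSING] * 12, [5] * 12]       # 1927..1930
tmin = [[1] + [MISSING] * 11, [MISSING] * 12, [MISSING] * 12, [0] * 12]

year_span(tmax)  # (3, 3) -- start year silently shifted to 1930
year_span(tmin)  # (0, 3) -- start year stays 1927: the two databases diverge
```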
What to do.. I guess the logical thing to do is to not truncate for tmin and tmax! So I added a
flag to newmergedbauto, that it passes to the 'getmos' subroutine, that stops it from replacing
start and end years, and.. it worked!! Hurrah! Or, well.. it ran without giving any errors or
crashing horribly. Yes, that's it. And here are all the 142 files (and directories) it created:
So, this leaves the new databases in the db/xxx/ directories, and db/latest.versions.dat telling
us which ones they are. Which should be all the next suite of programs needs to create the final
output files. Eeeeeeeek.