Caching
Caching and/or mirroring of data happens on two levels: a proxy cache and an internal data cache.
Proxy Cache
All HTTP requests are made through a proxy cache (Squid). Thus there
are potentially local copies of all of the THREDDS XML files and of the
responses to DODS .das, .dds, and .dods requests, depending on:
- relative frequency of access
- server labelling (i.e. headers) of responses
- squid configuration/version
Squid version 2.5STABLE1 (Sep 2002) supports Vary: headers, which allows
caching of password-restricted metadata, for example. Squid can also
be configured to cache normally uncacheable data, e.g. to keep all DODS
requests for an hour or so, despite the lack of Last-Modified headers.
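The two behaviours just described can be illustrated with a small sketch. It is not Squid's implementation: the class `TinyCache`, the function `cache_key`, and the constant `DEFAULT_TTL` are hypothetical names, and the one-hour lifetime is only an assumption taken from "an hour or so" above. The sketch keys each entry on the URL plus the request headers named in the response's Vary: header, and keeps entries for a fixed time when no Last-Modified information is available. Handling of `Vary: *` is left out.

```python
import time

DEFAULT_TTL = 3600  # assumption: "an hour or so", used when no validators exist


def cache_key(url, request_headers, vary):
    """Build a key from the URL plus the request headers named by Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    req = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((n, req.get(n, "")) for n in names))


class TinyCache:
    """Hypothetical in-memory cache; illustrates the idea only."""

    def __init__(self, clock=time.time):
        self._entries = {}  # key -> (expires_at, body)
        self._vary = {}     # url -> Vary value last seen for that URL
        self._clock = clock

    def store(self, url, request_headers, response_headers, body,
              ttl=DEFAULT_TTL):
        vary = response_headers.get("Vary", "")
        self._vary[url] = vary
        key = cache_key(url, request_headers, vary)
        self._entries[key] = (self._clock() + ttl, body)

    def lookup(self, url, request_headers):
        if url not in self._vary:
            return None
        key = cache_key(url, request_headers, self._vary[url])
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # never stored for these headers, or expired
        return entry[1]
```

With `Vary: Authorization`, a response stored for one set of credentials is not handed to a request carrying different credentials, which is what makes caching password-restricted metadata safe.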
Data Cache
For the Taiwan mirror site, we also have an internal data cache. In
addition to the readthredds command that gets the data collection from
the IRI, we added a caching instruction which creates a local tree of
data files. Files are added to this tree as needed, i.e. when a request
comes to the server involving data that has not been downloaded yet.
Each dataset has a fixed record size,
and the software keeps track of which records have been downloaded.
When a request comes in, the data needed is checked against the data
that is already downloaded, and only the additional data is requested
from the server.
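The bookkeeping described above might look roughly like the following sketch. The names `RecordCache` and `coalesce` are hypothetical, and addressing a request by byte offset and length is an assumption made for illustration; the actual software may address records differently (for example, by index along a dimension).

```python
class RecordCache:
    """Tracks which fixed-size records of one dataset are held locally."""

    def __init__(self, record_size):
        self.record_size = record_size
        self.have = set()  # indices of records already downloaded

    def records_for(self, offset, length):
        """Record indices touched by the byte range [offset, offset+length)."""
        if length <= 0:
            return range(0)
        first = offset // self.record_size
        last = (offset + length - 1) // self.record_size
        return range(first, last + 1)

    def missing(self, offset, length):
        """Records needed for the request that are not yet downloaded."""
        return [r for r in self.records_for(offset, length)
                if r not in self.have]

    def mark_downloaded(self, records):
        self.have.update(records)


def coalesce(records):
    """Merge ascending record indices into inclusive (first, last) runs,
    so that each run can be fetched from the server in one request."""
    runs = []
    for r in records:
        if runs and r == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], r)
        else:
            runs.append((r, r))
    return runs
```

For example, with a record size of 100, a first request for bytes 0 to 249 needs records 0, 1 and 2. Once those are marked as downloaded, a second request for bytes 150 to 449 only has to fetch records 3 and 4.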
Three parts are missing, which keeps this scheme from being used in less
carefully controlled circumstances:
- Modification checking for data that are already
downloaded. This requires server/client improvements (to make
modification time a function of dataset subset so that a cache can
check whether a subset of an extended dataset is still valid).
- Data needs to be identified by source (currently the cache for a
dataset is identified by its position in the tree).
- Disk space maintenance, i.e. tracking use/creation of datasets and
determining which space can be released.
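As a rough illustration of the second and third items, the sketch below shows one possible approach; it does not describe anything the existing software does. The function names `cache_dir_for` and `pick_evictions` are hypothetical. Hashing the source URL is just one way to identify data by source, and least-recently-used eviction is an assumed policy chosen for simplicity.

```python
import hashlib


def cache_dir_for(source_url):
    """Name a dataset's cache directory after its source URL, not its
    position in the tree (assumption: the URL identifies the source)."""
    return hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:16]


def pick_evictions(datasets, bytes_to_free):
    """Choose datasets to delete, least recently used first.

    datasets: iterable of (name, size_bytes, last_used_timestamp).
    Returns the names to remove so that at least bytes_to_free is
    released, or every name if the total is still not enough.
    """
    freed = 0
    victims = []
    for name, size, _last_used in sorted(datasets, key=lambda d: d[2]):
        if freed >= bytes_to_free:
            break
        victims.append(name)
        freed += size
    return victims
```

Tracking a last-used timestamp per dataset is the minimum bookkeeping this policy needs. The first item, modification checking, is not sketched here because it depends on server support that does not yet exist.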