Caching

Caching and/or mirroring of data happens on two levels: the proxy cache and the data cache.

Proxy Cache

All HTTP requests are made through a proxy cache (Squid). Thus there are potentially local copies of all of the THREDDS XML files and of the responses to DODS .das, .dds, and .dods requests, depending on the Squid configuration. Squid version 2.5STABLE1 (Sep 2002) supports Vary: headers, which allows caching of password-restricted metadata, for example. Squid can also be configured to cache normally uncacheable data, e.g. all DODS requests for an hour or so, despite the lack of Last-Modified headers.
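As an illustration of the client side of this arrangement, the sketch below routes HTTP requests through a proxy using Python's standard library. It is not the Data Library's own code. The proxy address is hypothetical (3128 is Squid's default port), and the names build_proxy_opener and fetch are made up for this example.

```python
import urllib.request

# Hypothetical proxy address; 3128 is Squid's default listening port.
SQUID_PROXY = "http://localhost:3128"


def build_proxy_opener(proxy_url=SQUID_PROXY):
    """Return an opener that sends every http request through the proxy."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy)


def fetch(opener, url):
    """Fetch a URL through the proxy; a repeated request may be served from the cache."""
    with opener.open(url) as response:
        return response.read()
```

Whether a repeated request is actually answered from the cache is decided by the proxy configuration, not by the client.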

Data Cache

For the Taiwan mirror site, we also have an internal data cache. In addition to the readthredds command that gets the data collection from the IRI, we added a caching instruction that creates a local tree of data files. Data is added to these files as needed, i.e. when a request comes to the server involving data that has not been downloaded yet. Each dataset has a fixed record size, and the software keeps track of which records have been downloaded. When a request comes in, the data needed is checked against the data that is already downloaded, and only the additional data is requested from the remote server.
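The record bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class name RecordCache, its methods, and the byte-offset addressing of requests are all assumptions made for the example.

```python
class RecordCache:
    """Tracks which fixed-size records of one dataset are held locally."""

    def __init__(self, record_size):
        self.record_size = record_size  # assumed: bytes per record, fixed per dataset
        self.downloaded = set()         # indices of records already on disk

    def records_for(self, offset, length):
        """Indices of the records that cover bytes [offset, offset + length)."""
        if length <= 0:
            return range(0)
        first = offset // self.record_size
        last = (offset + length - 1) // self.record_size
        return range(first, last + 1)

    def missing(self, offset, length):
        """Records needed for the request that have not been downloaded yet."""
        return [r for r in self.records_for(offset, length)
                if r not in self.downloaded]

    def mark_downloaded(self, records):
        """Remember that these records are now stored locally."""
        self.downloaded.update(records)


cache = RecordCache(record_size=1024)
cache.mark_downloaded(cache.missing(0, 2048))  # first request fetches records 0 and 1
print(cache.missing(1500, 2000))               # → [2, 3]
```

The second request overlaps record 1, which is already local, so only records 2 and 3 would have to be requested from the remote server.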

Three parts are missing, which keeps the data cache from being used in less carefully controlled circumstances.

  1. Modification checking for data that are already downloaded. This requires server/client improvements (to make modification time a function of dataset subset so that a cache can check whether a subset of an extended dataset is still valid).
  2. Data needs to be identified by source (currently the cache for a dataset is identified by its position in the tree).
  3. Disk space maintenance, i.e. tracking use/creation of datasets and determining which space can be released.
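One possible shape for items 2 and 3 is sketched below, purely as an illustration of what is missing. Nothing here exists in the current software: the names cache_key and SpaceTracker, the hashing of source URL plus tree position, and the least-recently-used release policy are all assumptions.

```python
import hashlib
import time


def cache_key(source_url, dataset_path):
    """Key that identifies a dataset by its source as well as its tree position."""
    text = f"{source_url}\n{dataset_path}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SpaceTracker:
    """Records size and last use per cached dataset and picks entries to release."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.entries = {}  # key -> (size_bytes, last_used)

    def touch(self, key, size_bytes, now=None):
        """Note that a dataset was created or used, along with its current size."""
        self.entries[key] = (size_bytes, time.time() if now is None else now)

    def to_release(self):
        """Least recently used keys whose removal brings usage within budget."""
        used = sum(size for size, _ in self.entries.values())
        victims = []
        oldest_first = sorted(self.entries.items(), key=lambda kv: kv[1][1])
        for key, (size, _) in oldest_first:
            if used <= self.budget_bytes:
                break
            victims.append(key)
            used -= size
        return victims
```

Item 1 is not covered by this sketch, since it depends on the server reporting modification times per dataset subset.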