For quite a while now I have been mulling over the following issue without ever seeming to arrive at the final, best-and-greatest solution: how best to share a data structure over many Apache2 requests, in a multi-platform context. My doubts have to do with what exactly can be shared, and how, between Apache processes, threads and/or requests.

Suppose the following scenario:

On an Apache2/mp2/perl 5.8.x server host, there are 100,000 files, split over 1000 disk directories, at 100 files per directory. The files in each directory are called file1, file2, ..., file100. For a number of reasons, the real locations of the files are encoded as follows:

- a simple text file on disk contains a list of the 1000 directories, each one identified by some "obscure" key, like this:
  key000=directorypath0
  key001=directorypath1
  key002=directorypath2
  ...
  key999=directorypath999

- the URL provided to web users to access each object is something like:
http://myserver.com/getobj/key500-file7

- the mod_perl module which handles accesses to /getobj/ decodes "key500" into "directorypath500" and retrieves the corresponding object to send it back to the browser (a small sketch of this decoding follows this list).

- the text file on disk (containing the list of directories) occasionally changes: whenever another, independent process adds a new file and finds that the current directory already holds 100 objects, it adds a new key and directory path and files the new object there. It also updates the text file to reflect the newly added directory. These changes are, however, far less frequent than the "read" accesses by the browsers.

- This whole Apache2 webserver setup can be running on any platform, so the particular MPM in use cannot really be predicted in advance (it could be threaded, preforking, whatever).

- For portability and ease-of-installation reasons, I would like to avoid the usage of an external DBMS.
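
Outside of Apache, the mapping itself can be pinned down with a few lines of Perl. The sketch below is only meant to make the data format and the lookup concrete; the file location and the sub names are placeholders I invented, not existing code:

  #!/usr/bin/perl
  use strict;
  use warnings;

  my $map_file = '/etc/myapp/dirmap.txt';    # invented location of the key file

  # Parse the key000=directorypath0 lines into a hash.
  sub load_map {
      my ($file) = @_;
      my %map;
      open my $fh, '<', $file or die "Cannot open $file: $!";
      while (my $line = <$fh>) {
          chomp $line;
          next unless $line =~ /\S/;              # skip blank lines
          my ($key, $path) = split /=/, $line, 2;
          $map{$key} = $path if defined $path;
      }
      close $fh;
      return \%map;
  }

  # Turn an obscure id like "key500-file7" into "directorypath500/file7".
  sub decode_path {
      my ($map, $obscure_id) = @_;
      my ($key, $file) = split /-/, $obscure_id, 2;
      return unless defined $file and exists $map->{$key};
      return "$map->{$key}/$file";
  }

  my $map  = load_map($map_file);
  my $path = decode_path($map, 'key500-file7');
  print defined $path ? "$path\n" : "unknown id\n";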

What I would like to achieve is that successive requests to /getobj/ do not require the mp2 handler module to re-read and re-parse the directory text file at each request. To decode the obscure IDs into real paths, I would thus like to be able to use something like a simple hashtable in memory. But any "instance" of the module should still be able to notice when the authoritative directory file has changed, and reload it when needed before serving a new object. Of course this should not happen more often than necessary.

My basic idea would be to create a perl package encapsulating the decoding of the obscure paths and the updates to the in-memory hashtable, and use this package in the /getobj/ handler.

A PerlChildInitHandler would initially read in the text file and build the internal table, prior to any read handler access. (Alternatively, this could be done the first time a response handler needs to access the structure.) The module would contain a method "decode_path(obscure_id)", made available to the response handler, which would take care of checking that the table is still up to date and, if not, re-read and re-parse the file into the internal hashtable. I imagine that each child process could (and probably would) have its own copy of the table, but that each request handler, while processing one request, could have access to that same child-level hashtable.
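
To make this concrete, here is roughly what I picture such a package looking like. It is only a sketch under my own assumptions (the package name, file location and method names are invented, and error handling is minimal), not a finished implementation; the staleness test is simply a comparison of the file's modification time with the one remembered from the last parse:

  package MyApp::DirMap;    # invented name, for illustration only

  use strict;
  use warnings;

  use Apache2::Const -compile => qw(OK);

  my $MAP_FILE = '/etc/myapp/dirmap.txt';    # invented location of the text file
  my (%map, $map_mtime);                     # per-child (or per-interpreter) cache

  # Meant to be registered as a PerlChildInitHandler: build the table once,
  # before the child serves its first request.
  sub child_init {
      _reload_if_stale();
      return Apache2::Const::OK;
  }

  # Called by the /getobj/ response handler for every request.
  sub decode_path {
      my ($class, $obscure_id) = @_;
      _reload_if_stale();
      my ($key, $file) = split /-/, $obscure_id, 2;
      return unless defined $file and exists $map{$key};
      return "$map{$key}/$file";
  }

  # Re-read and re-parse the text file only when its mtime has changed.
  sub _reload_if_stale {
      my $mtime = (stat $MAP_FILE)[9];
      return unless defined $mtime;                            # file unreadable: keep old table
      return if defined $map_mtime and $mtime == $map_mtime;   # still up to date

      open my $fh, '<', $MAP_FILE or return;
      my %new;
      while (my $line = <$fh>) {
          chomp $line;
          my ($key, $path) = split /=/, $line, 2;
          $new{$key} = $path if defined $path;
      }
      close $fh;

      %map       = %new;
      $map_mtime = $mtime;
  }

  1;

The wiring would then be something like "PerlChildInitHandler MyApp::DirMap::child_init" in the server configuration, with the response handler calling MyApp::DirMap->decode_path($obscure_id) for each request.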

My doubts focus (mainly) on the following issues:

- whether or not I *can* declare and initialise some object, e.g. in the PerlChildInitHandler, and later access that same object in the request handlers;
- whether, if I later call a method of this object from a request handler that updates the object's content, the updated object would still be "shared" by all subsequent instances of request handlers;
- supposing that this architecture is running within a threaded environment, are there special guidelines to follow regarding the possibility that 2 threads in the same child would access the object at the same time and try to update the internal table?
- if I follow such guidelines, does the same code also work if it happens to run in a non-threaded environment?
- if there is a mandatory difference between threaded and non-threaded mp2 perl code, can I check at run time under which environment I'm running, and condition which code is executed accordingly?
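
As far as the run-time check goes, mod_perl 2 exposes the MPM through Apache2::MPM, and perl's own thread support is visible in %Config, so the decision could be made once, when the module is loaded. A minimal sketch of just the detection part (how the table is then protected is a separate choice):

  use strict;
  use warnings;

  use Config ();          # %Config::Config says whether perl was built with ithreads
  use Apache2::MPM ();    # lets the code ask the running server about its MPM

  # Thread-safety only becomes a concern when both conditions hold:
  # perl built with ithreads AND a threaded MPM (worker, winnt, ...).
  my $threaded = $Config::Config{useithreads} && Apache2::MPM->is_threaded();

  if ($threaded) {
      # updates to the table would need to be serialised here, e.g. with
      # threads::shared and lock(), if one table really is shared between threads
  }
  else {
      # prefork-style MPM: each request handler runs in its own process,
      # so a plain per-process hash needs no locking at all
  }

The same code runs unchanged under a non-threaded MPM, where $threaded simply ends up false.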

Thanks for your patience reading this, and thanks in advance for any comments, answers or suggestions.
