For quite a while now, I have been mulling over the following issue,
without ever arriving at a final, truly satisfying solution : how to
best share a data structure over many Apache2 requests, in a
multi-platform context.
My doubts have to do with what exactly can be shared, and how, between
Apache processes, threads and/or requests.
Suppose the following scenario:
On an Apache2/mp2/perl 5.8.x server host, there are 100,000 files, split
over 1000 disk directories, at 100 files per directory. Each file in a
directory is called file1, file2, .... file100.
For a number of reasons, the real locations of the files are encoded as
follows:
- a simple text file on disk contains a list of the 1000 directories,
each one identified by some "obscure" key, like this:
key000=directorypath0
key001=directorypath1
key002=directorypath2
...
key999=directorypath999
and the URL provided to web users to access each object is something like:
http://myserver.com/getobj/key500-file7
- the mod_perl module which handles accesses to /getobj/ decodes
"key500" into "directorypath500" and retrieves the corresponding object
to send it back to the browser.
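Concretely, the decoding step I have in mind would look something like
this (the function names and the file format details are just my own
placeholders, not existing code):

```perl
use strict;
use warnings;

# Parse the mapping file into a hash: "key500" => "directorypath500".
sub load_map {
    my ($file) = @_;
    my %map;
    open my $fh, '<', $file or die "cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^(\S+?)=(.+)$/;
        $map{$1} = $2;
    }
    close $fh;
    return \%map;
}

# Split an obscure id like "key500-file7" into its directory key and
# file name, then resolve the key through the table.
sub decode_path {
    my ($map, $id) = @_;
    my ($key, $file) = split /-/, $id, 2;
    my $dir = $map->{$key} or return undef;
    return "$dir/$file";
}
```

So a request for /getobj/key500-file7 would resolve to
directorypath500/file7 and the handler would serve that file.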
- the text file on disk (containing the list of directories)
occasionally changes: whenever another independent process adds a new
file and finds that the current directory already holds 100 objects,
it adds a new key and directory path, and files the new object there.
It also updates the text file to reflect the newly added directory.
These changes are however much, much less frequent than the "read"
accesses by the browsers.
- This whole Apache2 webserver setup can be running on any platform, so
that the particular MPM used cannot really be predicted in advance (it
could be threaded, preforking, whatever).
- For portability and ease-of-installation reasons, I would like to
avoid the usage of an external DBMS.
What I would like to achieve is that successive requests to /getobj/ do
not require the mp2 handler module to re-read and re-parse the directory
text file at each request. To decode the obscure ids into real paths,
I would thus like to be able to use something like a simple hashtable in
memory.
But any "instance" of the module should still be able to notice when the
basic authoritative directory file has changed, and reload it when
needed before serving a new object. Of course this should not happen
more often than necessary.
My basic idea would be to create a perl package encapsulating the
decoding of the obscure paths and the updates to the in-memory
hashtable, and use this package in the /getobj/ handler.
A PerlChildInitHandler would initially read in the text file and build
the internal table, prior to any response handler access. (Alternatively,
this could be done the first time a response handler needs to access the
structure).
The module would contain a method "decode_path(obscure_id)", made
available to the response handler, which would take care of checking
that the table is still up-to-date, and if not, re-read and re-parse
the file into the internal hashtable.
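Roughly, I picture the package like this (a sketch only: the package
name, the file location and the mtime-based staleness test are all my
own assumptions, and exactly what I'm asking about below):

```perl
package ObscurePath;
use strict;
use warnings;

our $MapFile = '/etc/myapp/directories.txt';  # assumed location

my %table;             # per-process copy of the key => directory map
my $loaded_mtime = 0;  # mtime of the map file when last loaded

# (Re)load the table only if the file on disk is newer than our copy.
sub refresh {
    my $mtime = (stat $MapFile)[9];
    die "cannot stat $MapFile: $!" unless defined $mtime;
    return if $mtime <= $loaded_mtime;
    open my $fh, '<', $MapFile or die "cannot open $MapFile: $!";
    %table = ();
    while (my $line = <$fh>) {
        chomp $line;
        $table{$1} = $2 if $line =~ /^(\S+?)=(.+)$/;
    }
    close $fh;
    $loaded_mtime = $mtime;
}

# Called from the response handler: check freshness, then decode.
sub decode_path {
    my ($class, $id) = @_;
    refresh();
    my ($key, $file) = split /-/, $id, 2;
    return defined $table{$key} ? "$table{$key}/$file" : undef;
}

1;
```

The response handler would then just call
ObscurePath->decode_path($id) and never touch the text file itself.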
I imagine that each child process could (and probably would) have its
own copy of the table, but that each request handler, while processing
one request, could have access to that same child-level hashtable.
My doubts focus (mainly) on the following issues:
- whether or not I *can* declare and initialise some object, e.g. in the
PerlChildInitHandler, and later access that same object in the request
handlers.
- also, if I later call from a request handler a method of this object
that updates its contents, whether the updated object would still be
"shared" by all subsequent instances of request handlers.
- supposing that this architecture is running within a threaded
environment, are there special guidelines to follow regarding the
possibility that 2 threads in the same child would access the object at
the same time and try to update the internal table?
- and if I follow such guidelines, does the same code also work if it
happens to run in a non-threaded environment?
- if there is a mandatory difference between threaded and non-threaded
mp2 perl code, can I check at run time under which environment I'm
running, and choose which code is executed accordingly?
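For that last point, I believe the perl build itself can at least be
queried at run time, along these lines (the helper name is mine, and
whether this is the recommended mp2 approach is exactly what I'm unsure
about):

```perl
use strict;
use warnings;
use Config;

# True if this perl binary was built with ithreads; only such a perl
# can end up running under a threaded MPM like "worker".
sub perl_is_threaded {
    return $Config{useithreads} ? 1 : 0;
}

# Inside a running mod_perl 2 server, one could presumably also ask the
# server itself (e.g. via Apache2::MPM->is_threaded), which is not shown
# here since it only works inside Apache.
```

The idea would be to take the locking code path only when
perl_is_threaded() is true, and skip it entirely under a preforking
setup.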
Thanks for your patience reading this, and thanks in advance for any
comments, answers or suggestions.