For quite a while now, I have been mulling over the following issue,
without ever arriving at a final, truly satisfying solution : how to
best share a data structure over many Apache2 requests, in a
multi-platform context.
My doubts have to do with what exactly can be shared, and how, between
Apache processes, threads and/or requests.
Suppose the following scenario:
On an Apache2/mp2/perl 5.8.x server host, there are 100,000 files, split
over 1000 disk directories, at 100 files per directory. Each file in a
directory is called file1, file2, .... file100.
For a number of reasons, the real locations of the files are encoded as
follows:
- a simple text file on disk contains a list of the 1000 directories,
each one identified by some "obscure" key, like this:
key000=directorypath0
key001=directorypath1
key002=directorypath2
...
key999=directorypath999
and the URL provided to web users to access each object is something like:
http://myserver.com/getobj/key500-file7
- the mod_perl module which handles accesses to /getobj/ decodes
"key500" into "directorypath500" and retrieves the corresponding object
to send it back to the browser.
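Concretely, the decoding step I have in mind would look something like
this (the function names and the file format details are just my own
placeholders, not existing code):

```perl
use strict;
use warnings;

# Parse the mapping file into a hash: "key500" => "directorypath500".
sub load_map {
    my ($file) = @_;
    my %map;
    open my $fh, '<', $file or die "cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^(\S+?)=(.+)$/;
        $map{$1} = $2;
    }
    close $fh;
    return \%map;
}

# Split an obscure id like "key500-file7" into its directory key and
# file name, then resolve the key through the table.
sub decode_path {
    my ($map, $id) = @_;
    my ($key, $file) = split /-/, $id, 2;
    my $dir = $map->{$key} or return undef;
    return "$dir/$file";
}
```

So a request for /getobj/key500-file7 would resolve to
directorypath500/file7 and the handler would serve that file.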
- the text file on disk (containing the list of directories)
occasionally changes: whenever another independent process adds a new
file and finds that the current directory already holds 100 objects,
it adds a new key and directory path, and files the new object there.
It also updates the text file to reflect the newly added directory.
These changes are however much, much less frequent than the "read"
accesses by the browsers.
- This whole Apache2 webserver setup can be running on any platform, so
that the particular MPM used cannot really be predicted in advance (it
could be threaded, preforking, whatever).
- For portability and ease-of-installation reasons, I would like to
avoid the usage of an external DBMS.
What I would like to achieve is that successive requests to /getobj/ do
not require the mp2 handler module to re-read and re-parse the directory
text file at each request. To decode the obscure ids into real paths,
I would thus like to be able to use something like a simple hashtable in
memory.
But any "instance" of the module should still be able to notice when the
basic authoritative directory file has changed, and reload it when
needed before serving a new object. Of course this should not happen
more often than necessary.
My basic idea would be to create a perl package encapsulating the
decoding of the obscure paths and the updates to the in-memory
hashtable, and use this package in the /getobj/ handler.
A PerlChildInitHandler would initially read in the text file and build
the internal table, prior to any response handler access. (Alternatively,
this could be done the first time a response handler needs to access the
structure).
The module would contain a method "decode_path(obscure_id)", made
available to the response handler, which would take care of checking
that the table is still up-to-date, and if not, re-read and re-parse
the file into the internal hashtable.
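Roughly, I picture the package like this (a sketch only: the package
name, the file location and the mtime-based staleness test are all my
own assumptions, and exactly what I'm asking about below):

```perl
package ObscurePath;
use strict;
use warnings;

our $MapFile = '/etc/myapp/directories.txt';  # assumed location

my %table;             # per-process copy of the key => directory map
my $loaded_mtime = 0;  # mtime of the map file when last loaded

# (Re)load the table only if the file on disk is newer than our copy.
sub refresh {
    my $mtime = (stat $MapFile)[9];
    die "cannot stat $MapFile: $!" unless defined $mtime;
    return if $mtime <= $loaded_mtime;
    open my $fh, '<', $MapFile or die "cannot open $MapFile: $!";
    %table = ();
    while (my $line = <$fh>) {
        chomp $line;
        $table{$1} = $2 if $line =~ /^(\S+?)=(.+)$/;
    }
    close $fh;
    $loaded_mtime = $mtime;
}

# Called from the response handler: check freshness, then decode.
sub decode_path {
    my ($class, $id) = @_;
    refresh();
    my ($key, $file) = split /-/, $id, 2;
    return defined $table{$key} ? "$table{$key}/$file" : undef;
}

1;
```

The response handler would then just call
ObscurePath->decode_path($id) and never touch the text file itself.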
I imagine that each child process could (and probably would) have its
own copy of the table, but that each request handler, while processing
one request, could have access to that same child-level hashtable.
My doubts focus (mainly) on the following issues:
- whether or not I *can* declare and initialise some object, e.g. in the
PerlChildInitHandler, and later access that same object in the request
handlers.
- also, if I later call from a request handler a method of this object
that updates its contents, whether the updated object would still be
"shared" by all subsequent instances of request handlers.
- supposing that this architecture is running within a threaded
environment, are there special guidelines to follow regarding the
possibility that 2 threads in the same child would access the object at
the same time and try to update the internal table?
- and if I follow such guidelines, does the same code also work if it
happens to run in a non-threaded environment?
- if there is a mandatory difference between threaded and non-threaded
mp2 perl code, can I check at run time under which environment I'm
running, and choose which code is executed accordingly?
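For that last point, I believe the perl build itself can at least be
queried at run time, along these lines (the helper name is mine, and
whether this is the recommended mp2 approach is exactly what I'm unsure
about):

```perl
use strict;
use warnings;
use Config;

# True if this perl binary was built with ithreads; only such a perl
# can end up running under a threaded MPM like "worker".
sub perl_is_threaded {
    return $Config{useithreads} ? 1 : 0;
}

# Inside a running mod_perl 2 server, one could presumably also ask the
# server itself (e.g. via Apache2::MPM->is_threaded), which is not shown
# here since it only works inside Apache.
```

The idea would be to take the locking code path only when
perl_is_threaded() is true, and skip it entirely under a preforking
setup.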
Thanks for your patience reading this, and thanks in advance for any
comments, answers or suggestions.