Thanks a lot Perrin - I really like the current method (if it were to stay on one machine and not grow). Caching per child has not really been a problem once I got past the emotional hangup over what seemed to be a duplicative waste of memory. I am totally amazed at how fast and efficient using mod_perl in this way has been. The hash-building queries issued by the children are very simple selects, but the data they provide (and cache) is used in so many ways throughout the session that not having it would require extra joins in multiple places, and queries in other places that are currently not needed at all (e.g. collaborative-environment ACLs). To be clear, the hashes are not only for quick de-normalizing; they serve a vital caching function.
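For context, each child's cache gets built with something roughly like this (an illustrative sketch only - the table, column, and subroutine names here are made up, not our actual schema):

    use DBI;

    # one lookup hash per data type, alive for the life of the child
    my %acl_by_group;

    sub build_caches {
        my $dbh = DBI->connect('dbi:mysql:ourdb', 'user', 'pass',
                               { RaiseError => 1 });
        my $rows = $dbh->selectall_arrayref(
            'SELECT group_id, resource_id, perm FROM acl',
            { Slice => {} },
        );
        $acl_by_group{ $_->{group_id} }{ $_->{resource_id} } = $_->{perm}
            for @$rows;
        $dbh->disconnect;
    }

    # called once per child (e.g. from a PerlChildInitHandler), so later
    # requests can check $acl_by_group{$gid}{$rid} instead of doing a join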
The problem is that I am now moving the database off localhost and configuring a second web node.
> what it is that you don't like about your current method.
I'm afraid that:

1. The hashes get really big (more than a few MB each).
2. The entire hash gets re-cached just because one key was updated (wasteful).
3. There's latency in pulling cache data from a remote DB.
4. All of this is repeated for every child.

For now, what seems like the 'holy grail' (*) is to cache a last_modified value for each data type (available to the whole cluster, say through memcached) in a way that tells the children which parts of the cache (which keys of each hash) they need to update or delete, so that a child rarely, if ever, has to query for more than just those keys and can patch its own hashes directly to stay current.

(*) I'm not too clear about this, but it seems like the real 'holy grail' would be to do this within Apache in a scoreboard-like way.
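Roughly, the kind of thing I have in mind (a rough, untested sketch - Cache::Memcached plus two made-up helpers, load_rows_for_keys() and load_all_rows(), standing in for the per-key and full selects):

    use Cache::Memcached;

    my $memd = Cache::Memcached->new({ servers => ['memcached-host:11211'] });

    my %cache;         # per-child data, e.g. $cache{acl}{$id} = { ... }
    my %seen_version;  # last version of each type this child has applied

    # Writers would do something like:
    #   $memd->incr("ver:acl");   # (key initialized once with set())
    #   $memd->set("dirty:acl:$new_version", \@changed_ids);

    sub refresh_type {
        my ($type) = @_;
        my $ver  = $memd->get("ver:$type") || 0;
        my $seen = $seen_version{$type}    || 0;
        return if $ver == $seen;                   # cache is current

        # collect the keys dirtied since the version we last applied
        my @dirty;
        for my $v ($seen + 1 .. $ver) {
            my $ids = $memd->get("dirty:$type:$v") or next;
            push @dirty, @$ids;
        }

        if (@dirty) {
            # re-select only the changed rows and patch our own hash
            my $rows = load_rows_for_keys($type, \@dirty);
            $cache{$type}{$_} = $rows->{$_} for keys %$rows;
        }
        else {
            # lost track (dirty lists expired) - reload the whole type
            $cache{$type} = load_all_rows($type);
        }
        $seen_version{$type} = $ver;
    }

    # call refresh_type('acl') etc. at the top of each request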
-w

On 5/19/07, Perrin Harkins <[EMAIL PROTECTED]> wrote:

On 5/19/07, Will Fould <[EMAIL PROTECTED]> wrote:
> Here's the situation: We have a fully normalized relational database
> (mysql) now being accessed by a web application and to save a lot of
> complex joins each time we grab rows from the database, I currently load
> and cache a few simple hashes (1-10MB) in each apache process with the
> corresponding lookup data

Are you certain this is saving you all that much, compared to just doing
the joins? With proper indexes, joins are fast. It could be a win to do
them yourself, but it depends greatly on how much of the data you end up
displaying before the lookup tables change and have to be re-fetched.

> Is anyone doing something similar? I'm wondering if implementing a
> BerkeleyDB or another slave store on each web node with a tied hash (or
> something similar) is feasible and if not, what a better solution might be.

Well, first of all, I wouldn't feed a tied hash to my neighbor's dog.
It's slower than method calls, and more confusing.

There are lots of things you could do here, but it's not clear to me what
it is that you don't like about your current method. Is it that when the
database changes you have to do heavy queries from every child process?
That also kills any sharing of the data.

Do you have more than one server, or expect to soon?

- Perrin