Thanks a lot Perrin -

I really like the current method (if it were to stay on one machine and not
grow). Caching per child has not really been a problem once I got past the
emotional hangup over what seemed like a duplicative waste of memory. I am
amazed at how fast and efficient using mod_perl this way has been. The
hash-building queries issued by the children are very simple selects, but the
data they provide (and cache) is used in so many ways throughout a session
that not having it would require extra joins in multiple places and entirely
new queries in others (e.g. collaborative-environment ACLs). To be clear, the
hashes are not just a quick de-normalization shortcut; they serve a vital
caching function.
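
To give a sense of the pattern, here is a stripped-down sketch of what one of
these per-child caches looks like (the table, the query, and names like
acl_for and %ACL_CACHE are illustrative, not our actual code):

package MyApp::Lookup;
use strict;
use warnings;

# Package-lexical hash: it lives for the life of the apache child, so each
# child pays the load query once and then reuses the data on every request.
my %ACL_CACHE;

sub acl_for {
    my ($class, $dbh, $group_id) = @_;

    # Populate the whole lookup hash on first use in this child.
    unless (%ACL_CACHE) {
        my $rows = $dbh->selectall_arrayref(
            'SELECT group_id, resource, permission FROM group_acl',
            { Slice => {} },
        );
        push @{ $ACL_CACHE{ $_->{group_id} } }, $_ for @$rows;
    }

    return $ACL_CACHE{$group_id} || [];
}

1;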

The problem is that I am now moving the database off localhost and
configuring a second web node.

> it's not clear to me what it is that you don't like about your current
> method.

I'm afraid that:
  1. the hashes get really big (more than a few MB each).
  2. the entire hash gets re-cached just because one key is updated (wasteful).
  3. latency for pulling cache data from the remote DB.
  4. doing all of this in every child.

For now, what seems like the 'holy grail' (*) is to cache a last_modified
value for each type (available to the whole cluster, say through memcached)
that tells the children exactly which parts of the cache (which keys of each
hash) to update or delete. A child would then rarely, if ever, need to
re-fetch everything; it would query for just the changed keys and patch its
own hashes to stay current (rough sketch below).

(*) I'm not too clear about this, but it seems like the real 'holy grail'
would be to do this within Apache in a scoreboard-like way.
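
Very roughly, the kind of thing I have in mind (this assumes Cache::Memcached,
a writer that bumps an 'acl_last_modified' stamp on every change, and a
modified_at column on the table; all of the names are made up):

use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['memcached-host:11211'] });

# Per-child state: the cached hash plus the last_modified stamp it reflects.
my %ACL;
my $acl_stamp = 0;

sub refresh_acl_cache {
    my ($dbh) = @_;

    # Writers do $memd->set('acl_last_modified', time()) whenever they touch
    # an ACL row, so this one get() tells a child whether anything changed.
    my $current = $memd->get('acl_last_modified') || 0;
    return if $current <= $acl_stamp;

    # Pull only the rows that changed and patch the child's hash in place.
    my $rows = $dbh->selectall_arrayref(
        'SELECT id, group_id, resource, permission, deleted
           FROM group_acl WHERE modified_at > ?',
        { Slice => {} }, $acl_stamp,
    );
    for my $r (@$rows) {
        if   ($r->{deleted}) { delete $ACL{ $r->{id} } }
        else                 { $ACL{ $r->{id} } = $r }
    }
    $acl_stamp = $current;
}

A child would call refresh_acl_cache($dbh) at the top of each request (or
every N requests) and otherwise never re-fetch the whole hash.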

-w


On 5/19/07, Perrin Harkins <[EMAIL PROTECTED]> wrote:

On 5/19/07, Will Fould <[EMAIL PROTECTED]> wrote:
> Here's the situation:  We have a fully normalized relational database
> (mysql) now being accessed by a web application and to save a lot of
> complex joins each time we grab rows from the database, I currently load
> and cache a few simple hashes (1-10MB) in each apache processes with the
> corresponding lookup data

Are you certain this is saving you all that much, compared to just
doing the joins?  With proper indexes, joins are fast.  It could be a
win to do them yourself, but it depends greatly on how much of the
data you end up displaying before the lookup tables change and have to
be re-fetched.

> Is anyone doing something similar? I'm wondering if implementing a
> BerkleyDB or another slave store on each web node with a tied hash (or
> something similar) is feasible and if not, what a better solution might be.

Well, first of all, I wouldn't feed a tied hash to my neighbor's dog.
It's slower than method calls, and more confusing.

There are lots of things you could do here, but it's not clear to me
what it is that you don't like about your current method.  Is it that
when the database changes you have to do heavy queries from every
child process?  That also kills any sharing of the data.  Do you have
more than one server, or expect to soon?

- Perrin
