I coded a caching system using BerkeleyDB::Hash as the backend. It was
working fine until the database file grew fairly large (850M). At some
point performance degraded and the web server processes accessing the
database started hanging. Someone suggested locking issues as the cause
of the hangs, but the database hung even when a single script accessed
it with no other processes running.
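For reference, the setup is essentially a tied hash along these lines (the
filename and flags here are simplified placeholders, not the actual code):

    use strict;
    use warnings;
    use BerkeleyDB;

    my %cache;
    tie %cache, 'BerkeleyDB::Hash',
        -Filename => '/var/cache/app/cache.db',   # placeholder path
        -Flags    => DB_CREATE
        or die "Cannot open cache.db: $! $BerkeleyDB::Error\n";

    $cache{'some_key'} = 'some_value';   # write through the tied hash
    my $v = $cache{'some_key'};          # read back

    untie %cache;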
I am sure someone has done something similar before and I would be very
interested to hear any success/failure stories. I'm starting to wonder
whether I would be better off just using an RDBMS table (2 columns -
key, value) as the cache backend to avoid these kinds of issues.
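The RDBMS version I have in mind would be something like this (MySQL
syntax; the DSN, credentials and table name are just placeholders):

    use strict;
    use warnings;
    use DBI;

    # Assumes: CREATE TABLE cache (ckey VARCHAR(255) PRIMARY KEY, cvalue BLOB)
    my $dbh = DBI->connect('dbi:mysql:database=app', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 1 });

    # REPLACE is MySQL-specific; other databases need a different upsert
    $dbh->do('REPLACE INTO cache (ckey, cvalue) VALUES (?, ?)',
             undef, 'some_key', 'some_value');

    my ($value) = $dbh->selectrow_array(
        'SELECT cvalue FROM cache WHERE ckey = ?', undef, 'some_key');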
There are quite a few options for caching in mod_perl. I describe a couple at
the start of my caching module...
---
http://search.cpan.org/~robm/Cache-FastMmap-1.09/FastMmap.pm
DESCRIPTION
In multi-process environments (eg mod_perl, forking daemons, etc), it's
common to want to cache information, but have that cache shared between
processes. Many solutions already exist, and may suit your situation better:
MLDBM::Sync - acts as a database, data is not automatically expired, slow
IPC::MM - hash implementation is broken, data is not automatically expired,
slow
Cache::FileCache - lots of features, slow
Cache::SharedMemoryCache - lots of features, VERY slow. Uses IPC::ShareLite
which freeze/thaws ALL data at each read/write
DBI - use your favourite RDBMS. can perform well, need a DB server running.
very global. socket connection latency
Cache::Mmap - similar to this module, in pure perl. slows down with larger
pages
BerkeleyDB - very fast (data ends up mostly in shared memory cache) but acts
as a database overall, so data is not automatically expired
---
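Basic usage of Cache::FastMmap looks something like this (the parameters
shown are illustrative, not tuned values; see the POD for the full list):

    use strict;
    use warnings;
    use Cache::FastMmap;

    my $cache = Cache::FastMmap->new(
        share_file  => '/tmp/app-cache',  # mmap'ed file shared across processes
        cache_size  => '64m',             # old entries are evicted as this fills
        expire_time => 600,               # expire entries after 10 minutes
    );

    $cache->set('some_key', 'some_value');
    my $value = $cache->get('some_key');

Because the data lives in a shared mmap'ed file, there's no socket round
trip and no separate server process to keep running.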
The main things I'd say are:
1. What version of bdb are you using? I've found 4.0-4.2 to be fairly
unstable. The late 3's (eg >= 3.3) and the more recent 4's (eg >= 4.3) seem
better.
2. Try running db_verify on your database file (eg db_verify cache.db) to
see if it picks up any problems/corruption.
3. Consider switching to something else, even if it only supports a smaller
cache. Do you really need 850M of cached data?
4. One thing not mentioned above: look at memcached. It seems to be well
designed for LARGE, global caches. There's a sketch below.
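A minimal memcached sketch using the Cache::Memcached client (the server
address and expiry time are placeholders; a memcached server must already
be running):

    use strict;
    use warnings;
    use Cache::Memcached;

    my $memd = Cache::Memcached->new({
        servers => ['127.0.0.1:11211'],   # placeholder server address
    });

    $memd->set('some_key', 'some_value', 600);  # expire after 10 minutes
    my $value = $memd->get('some_key');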
Rob