email builder wrote:
In-memory storage:
All data stored in each data node is kept in memory on the node's
host computer. For each data node in the cluster, you must have
available an amount of RAM equal to the size of the database times
the number of replicas ...
This refers to the first line: "In-memory storage". Of course you can't
do that with 160GB DBs - by that formula, even with only two replicas you
would need something like 320GB of RAM. You can still cluster, though -
have a look at DRBD:
http://www.drbd.org/
I guess the relevant point for this thread is that I don't think this is
the silver bullet it is being made out to be. Even if you use a
high-availability clustering technology that can mirror writes and reads,
you are STILL dealing with a database that is simply massive. Processing a
database of this size will still be disk-bound unless you have an
unheard-of amount of memory, and I don't see any reason to think that
clustering will make that problem go away.
So I still wonder if anyone has any musings on my earlier questions?
A few SpamAssassin hacks could help:
1. Run multiple MySQL servers: split your users into A-J, K-S, T-Z (or
smaller ranges) and distribute them over the different servers, with some
HA / failover mechanism in front (possibly DRBD). A rough sketch of the
user-to-shard routing follows the list.
2. Have two levels of Bayes: one large global database and a smaller
per-user one, if that's possible. Of course SA would need to be changed to
consult both. That way you could run two large servers for the global
Bayes db and two more for the per-user Bayes dbs; a toy example of the
lookup order is after the list as well.
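For item 1, a minimal sketch (Python) of what the user-to-shard routing
could look like. The host names and the exact A-J / K-S / T-Z split are
made up for illustration; the chosen host would end up in whatever DSN
(e.g. bayes_sql_dsn) you hand to SA for that user.

def shard_for(username):
    """Return the MySQL host that should hold this user's bayes data."""
    first = username[:1].lower()
    if "a" <= first <= "j":
        return "bayes-db1.example.com"
    if "k" <= first <= "s":
        return "bayes-db2.example.com"
    return "bayes-db3.example.com"     # t-z, digits, anything else

# e.g. shard_for("mary") -> "bayes-db2.example.com"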
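And for item 2, a toy illustration of the lookup order I mean: per-user
data wins when it exists, otherwise fall back to the global db. The
dict-based "databases" and the 0.5 neutral value are purely illustrative;
real SA Bayes storage and scoring look nothing like this.

def token_prob(token, user_db, global_db):
    # Prefer the user's own (small) bayes db, fall back to the global one.
    p = user_db.get(token)
    if p is None:
        p = global_db.get(token, 0.5)   # 0.5 = neutral, token never seen
    return p

global_db = {"viagra": 0.99, "meeting": 0.10}
user_db   = {"meeting": 0.02}           # this user gets lots of meeting mail
print(token_prob("viagra", user_db, global_db))   # 0.99, from the global db
print(token_prob("meeting", user_db, global_db))  # 0.02, from the user's db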
Also see if this SQL failover patch can help you in any way.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2197
Finally, to speed up the database, have a look at memcached; the people at
Wikimedia / LiveJournal seem to be happy using it.
http://www.danga.com/memcached/
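To give an idea of how it is typically used: memcached sits in front of
MySQL as a read cache - try the cache first, and on a miss query the
database and store the result. The sketch below uses the python-memcached
client; fetch_from_mysql() is a made-up stand-in for the real bayes query.

import memcache

mc = memcache.Client(["127.0.0.1:11211"])    # assumes a local memcached

def fetch_from_mysql(token):
    # Placeholder for the real SELECT against the bayes token table.
    return 0.5

def cached_token_prob(token):
    key = "bayes:" + token
    prob = mc.get(key)
    if prob is None:                  # cache miss -> hit MySQL, then cache
        prob = fetch_from_mysql(token)
        mc.set(key, prob, time=300)   # keep it for 5 minutes
    return prob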
Hope that helps,
- dhawal