email builder wrote:
In-memory storage:
All data stored in each data node is kept in memory on the node's
host computer. For each data node in the cluster, you must have
available an amount of RAM equal to the size of the database times
the number of replicas ...
This refers to the first line: "In-memory storage". Of course you can't
do that with 160GB DBs - by that formula, even with only two replicas you
would need something like 320GB of RAM. You can still cluster, though -
have a look at DRBD:
http://www.drbd.org/
I guess the relevant point for this thread is that I don't think this is
the silver bullet it is being made out to be. Even if you use a
high-availability clustering technology that can mirror writes and reads,
you are STILL dealing with a database that is simply massive. Processing a
database of this size will still be disk-bound unless you have an
unheard-of amount of memory, and I don't see any reason to think that
clustering will make that problem go away.
So I still wonder if anyone has any musings on my earlier questions?
A few SpamAssassin hacks could help:
1. Run multiple MySQL servers: split your users into A-J, K-S, T-Z (or
smaller ranges) and distribute them over the different servers, with some
HA / failover mechanism in front (possibly DRBD). A rough sketch of the
user-to-shard routing follows the list.
2. Have two levels of Bayes: one large global database and a smaller
per-user one, if that's possible. Of course SA would need to be changed to
consult both. That way you could run two large servers for the global
Bayes db and two more for the per-user Bayes dbs; a toy example of the
lookup order is after the list as well.
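For item 1, a minimal sketch (Python) of what the user-to-shard routing
could look like. The host names and the exact A-J / K-S / T-Z split are
made up for illustration; the chosen host would end up in whatever DSN
(e.g. bayes_sql_dsn) you hand to SA for that user.

def shard_for(username):
    """Return the MySQL host that should hold this user's bayes data."""
    first = username[:1].lower()
    if "a" <= first <= "j":
        return "bayes-db1.example.com"
    if "k" <= first <= "s":
        return "bayes-db2.example.com"
    return "bayes-db3.example.com"     # t-z, digits, anything else

# e.g. shard_for("mary") -> "bayes-db2.example.com"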
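And for item 2, a toy illustration of the lookup order I mean: per-user
data wins when it exists, otherwise fall back to the global db. The
dict-based "databases" and the 0.5 neutral value are purely illustrative;
real SA Bayes storage and scoring look nothing like this.

def token_prob(token, user_db, global_db):
    # Prefer the user's own (small) bayes db, fall back to the global one.
    p = user_db.get(token)
    if p is None:
        p = global_db.get(token, 0.5)   # 0.5 = neutral, token never seen
    return p

global_db = {"viagra": 0.99, "meeting": 0.10}
user_db   = {"meeting": 0.02}           # this user gets lots of meeting mail
print(token_prob("viagra", user_db, global_db))   # 0.99, from the global db
print(token_prob("meeting", user_db, global_db))  # 0.02, from the user's db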
Also see if this SQL failover patch can help you in any way.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2197
Finally, to speed up the database, have a look at memcached; the people at
Wikimedia / LiveJournal seem to be happy using it.
http://www.danga.com/memcached/
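To give an idea of how it is typically used: memcached sits in front of
MySQL as a read cache - try the cache first, and on a miss query the
database and store the result. The sketch below uses the python-memcached
client; fetch_from_mysql() is a made-up stand-in for the real bayes query.

import memcache

mc = memcache.Client(["127.0.0.1:11211"])    # assumes a local memcached

def fetch_from_mysql(token):
    # Placeholder for the real SELECT against the bayes token table.
    return 0.5

def cached_token_prob(token):
    key = "bayes:" + token
    prob = mc.get(key)
    if prob is None:                  # cache miss -> hit MySQL, then cache
        prob = fetch_from_mysql(token)
        mc.set(key, prob, time=300)   # keep it for 5 minutes
    return prob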
Hope that helps,
- dhawal