Hi All,
I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes,
and all three of them are sharing the same bayes DB using a MySQL
cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes
with a quadcore and 4 GB of memory. Everything is working fine, even the
AWL in SQL, except for Bayes. The bayes database currently houses a bit
less than 500k tokens and the database size is not very big either, as
the datanodes have less than 1 GB of storage in use. I've followed the
instructions from the Spamassassin wiki, and I also used the supplied
bayes_mysql.sql file to create my tables. In case anyone is interested,
you can find the cluster.ini and the my.cnf used on the SQL nodes here:
http://www.wcborstel.com/web/mysql/my.cnf
http://www.wcborstel.com/web/mysql/cluster.ini
I've been doing quite a bit of research and so on. First I thought it
were the settings of my cluster, as I knew there was a lot to be tuned.
Things like query cache sizes, thread cache, table cache, specific NDB
settings et cetera. Unfortunately that didn't have seemed to help. I
came to the conclusion that the bayes table was simply too heavily used.
I have scantimes of 30-200+ seconds with bayes enabled, while I have
scantimes under 8 seconds when disabling bayes.
Now the problem at the first glance seems to be, from my perspective
(please correct me if I'm wrong), the actual queries being done. For
every mail being scanned by spamassassin, it seems to be doing the
"SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM
bayes_token" query every time. This effectively requesting the entire
bayes_token table, which can take up to 10-20 seconds. Now one would
think that this is a nice canidate to cache. I would agree,
unfortunately the MySQL query cache is not very efficient here, seeing
as the atime of a token is being updated continuously. In other words,
the cache is pretty much invalid most of the time. My Qcache hits is
also very low (I noticed 8k inserts with about 250 cache hits). It seems
that the query cache is either not suitable for this or I am doing
something majorly wrong :)
Here is how I came to my findings. Note I removed some SELECT RPAD rows
to avoid spammyness (they show essentially the same as the other rows
anyway):
mysql> show processlist\G
*************************** 1. row ***************************
Id: 1
User: system user
Host:
db:
Command: Daemon
Time: 0
State: Waiting for event from ndbcluster
Info: NULL
FROM bayes_token
*************************** 3. row ***************************
Id: 1464
User: bayes
Host: ::ffff:1.2.3.4:57082
db: spamd
Command: Query
Time: 13
State: Sending data
Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
FROM bayes_token
*************************** 5. row ***************************
Id: 1479
User: bayes
Host: ::ffff:1.2.3.4:57133
db: spamd
Command: Query
Time: 24
State: Searching rows for update
Info: UPDATE bayes_token SET atime = '1250259027' WHERE id = '3' AND
token IN ('e?5?U','?;?6','?e?F?','?
*************************** 8. row ***************************
Id: 1485
User: bayes
Host: ::ffff:1.2.3.4:57148
db: spamd
Command: Query
Time: 18
State: Sending data
Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
FROM bayes_token
*************************** 9. row ***************************
Id: 1487
User: bayes
Host: ::ffff:1.2.3.4:57155
db: spamd
Command: Query
Time: 18
State: Sending data
Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
FROM bayes_token
12 rows in set (0.00 sec)
As you can see, row #9 has been executing for 18 seconds already. I was
first playing around with trying to create some additional indexes, but
I've seen a couple of SELECT queries where the indexes where actually
used and that was pretty quick. Now I am by far not a MySQL guru, so
again, if anyone has any info in regards to creating additional indexes
I would love to hear them. Currently I don't have any indexes other than
those provided by the bayes_mysql.sql file.
Currently I'm running my mail servers without bayes where they are
performing fine. Does anyone have any recommendations or experiences
with this? Or perhaps is there more information needed? Also will adding
more memory to my datanodes solve anything?
Thanks a lot for any feedback.
Best regards,
Jorn Argelo
__________ Information from ESET NOD32 Antivirus, version of virus signature
database 4336 (20090814) __________
The message was checked by ESET NOD32 Antivirus.
http://www.eset.com