24.9.2009 21:55, Kris Deugau wrote:
Steven W. Orr wrote:
The question is this: I thought that InnoDB was going
to consume *more* resources because the purpose of it was to support
transactions. Am I wrong? If I convert to a higher rev of MySQL and
get InnoDB
will I get *better* performance?
Likely not; MySQL gets a great deal of its speed from... not doing
transactions.
Not having done any detailed performance tuning or testing, I can't
say for sure - but from what I recall from my reading on this subject
that's one of the points formally documented by the MySQL devs
themselves.
MyISAM locks the whole table when it needs a lock, while InnoDB has
row-level locking, so operations on tables like awl and bayes_token
will perform much better if multiple spamd processes are connected to
the database.
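If you want to try the conversion, here is a minimal sketch that only prints the statements, so you can review them first. The table list is an assumption (awl and bayes_token are named above, bayes_seen comes up later in this thread); adjust it to your own schema. Nothing here touches the database.

```shell
#!/bin/bash
# Print one ALTER TABLE statement per table; nothing is executed here.
# The table list is an assumption, edit it to match your schema.
tables="awl bayes_token bayes_seen"

sql=$(for t in $tables; do
    printf 'ALTER TABLE %s ENGINE=InnoDB;\n' "$t"
done)

echo "$sql"
```

After checking the output you could pipe it into the mysql client against your SpamAssassin database. Note that ALTER TABLE rebuilds the table, so on a large bayes_token it may take a while and is probably best done while spamd is stopped.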
Cron'ed Bayes-feeding tasks that are known to read the same set of
messages for some amount of time are one reason to not simply "delete
from bayes_seen;". (Unless you really *want* to relearn the same
messages over and over again - FWIW, I *have* found that cases of
"this spam was sent to half my userbase" make that something you might
want to do deliberately now and then.)
We keep two weeks' worth for both bayes_seen and AWL; the on-disk
files for MySQL run about 300-350M for AWL and ~150-200M for
bayes_seen. This is a modest cluster that idles along filtering ~400K
of ~3M messages daily. Autolearn is enabled - without it, bayes_seen
would likely not grow quite so fast.
One can select only the new files for learning. This is my cron job for
learning new ham from my personal Maildir:
----------------------------------------------------------------------------------------------------------------------------------------------
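#!/bin/bash
# Skip this run if a previous one is still in progress.
if [ -f ~/.learnham.running ]; then exit; fi
touch ~/.learnham.running
# Remove the lock file and the temporary list on exit.
trap "rm -f ~/.learnham.running ~/.sa-learn.ham.tmp" EXIT
cd ~
# List messages newer than the previous list, skipping spam and Trash folders.
find Maildir/ -newer .sa-learn.ham -name "*$(hostname)*" | grep -v -i spam | grep -v Trash >.sa-learn.ham.tmp
# Learn only if something new was found; the new list becomes the next marker.
if test -s .sa-learn.ham.tmp; then
mv .sa-learn.ham.tmp .sa-learn.ham
/usr/bin/sa-learn -u spam --showdots --ham --folders=.sa-learn.ham
fi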
----------------------------------------------------------------------------------------------------------------------------------------------
It uses 'find' to get a list of new files (newer than the last run).
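To see the marker-file trick in isolation, here is a small self-contained demo. The directory layout, file names, and dates are made up for illustration; it works in a temporary directory and does not touch a real Maildir.

```shell
#!/bin/bash
# Demo of "find -newer": only files modified after the marker are listed.
# All paths and timestamps below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/Maildir/cur"

# touch -t takes CCYYMMDDhhmm: one message before the marker, one after.
touch -t 200909010000 "$demo/Maildir/cur/old-message"
touch -t 200909100000 "$demo/marker"
touch -t 200909200000 "$demo/Maildir/cur/new-message"

# -type f keeps the freshly created directories out of the result.
new_files=$(find "$demo/Maildir" -type f -newer "$demo/marker")
echo "$new_files"

rm -rf "$demo"
```

Only the path ending in new-message should be printed. In the cron job above, the previous list file (.sa-learn.ham) plays the role of the marker, so each run picks up whatever arrived since the last list was written.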