Looking in the wiki I found this very interesting page:

http://wiki.apache.org/spamassassin/BayesBenchmarkResults


Now, obviously the SQL implementations are the way to go if you
want to share a bayes DB across multiple servers. No question about it; that's
what SQL servers are for.

However, I don't share a database across a network. It's just a local DB,
currently using the default DB_File. Since I upgraded from 2.64 to 3.1.0, my
bayes performance has fallen through the floor, so I've been looking at
switching my BayesStore to improve performance.

In Justin's 3.1.0 release announcement he declared SQL as the preferred bayes
storage method. However, looking at the results in the above table, SDBM is
clearly the fastest at everything except phase 3. Since phase 3 is the
execution time for a --force-expire, I can't see how that matters very much to
me; scanning and learning speed are clearly more important.

Given that I'm looking for maximum message scanning speed and lowest RAM
overhead on a single server, is there any reason for me to prefer using SQL over
SDBM?

The test results clearly show that DB_File is the slowest at message scanning.
Although it's marginally faster at learning and forgetting than the SQL options,
its scanning speed is painfully slow. DB_File is also slower than SDBM in all
test cases.
