Looking in the wiki, I found this very interesting page: http://wiki.apache.org/spamassassin/BayesBenchmarkResults
Now, obviously the SQL implementations are the backend of choice if you want to share a Bayes DB across multiple servers. No question about it, that's what SQL servers are for. However, I don't share a database across a network. It's just a local DB, currently using the default DB_File.

Since upgrading from 2.64 to 3.1.0, my Bayes performance has fallen through the floor, so I've been looking at switching my BayesStore to improve performance. In Justin's 3.1.0 release announcement, he declared SQL the preferred Bayes storage method. However, judging by the results in the table on that page, SDBM is clearly the fastest at everything except phase 3. Since phase 3 is the execution time for a --force-expire, I can't see how it matters much to me; scanning and learning speed are clearly more important.

Given that I'm looking for maximum message-scanning speed and the lowest RAM overhead on a single server, is there any reason for me to prefer SQL over SDBM?

The test results clearly show that DB_File is the slowest at message scanning. Although it's marginally faster than the SQL options at learning and forgetting, its scanning speed is painfully slow. DB_File is also slower than SDBM in every test case.