Here's the issue:

System: Running SA 2.54, FreeBSD Unix, Berkeley DB 1.85 (Hash, version 2)

Problem: When bayes_toks grows to more than 5K, it becomes corrupted during sa-learn and is ultimately trashed or lost.

My solution: Set bayes_expiry_max_db_size to a lower level to force expiry, so that bayes_toks doesn't grow too large. I did not change bayes_expiry_min_db_size or bayes_expiry_scan_count.

Questions/Problems:

1. Why did bayes_toks grow to more than 5k in the first place? Documentation for SA 2.5x (sa-learn.html) says:

> Once it hits 5000 bytes, the bayes_toks database is
> locked, and the message counter entry in that database is
> increased accordingly.

2. What is the default configuration for bayes_expiry_max_db_size for SA 2.5x, and how large should the resulting file be?

Through experimentation, I have ended up with a setting:

bayes_expiry_max_db_size 150000

With this setting, bayes_toks never gets any larger than 2,556 kb.

According to documentation for SA 2.6 (sa-learn.txt):

> "bayes_expiry_max_db_size" specifies both the auto-expire token count
> point, as well as the resulting number of tokens after expiry as
> described above. The default value is 150,000, which is roughly
> equivalent to a 6Mb database file if you're using DB_File.

Note that my setting is the SAME as the default for 2.6, but rather than a 6Mb db file, I end up with a 2.5 Mb file, with a Bayes corpus of ~2800 or less.

Documentation for SA 2.5 (sa-learn.html) says:

> bayes_expiry_min_db_size is part of the SpamAssassin
> configuration. The default value is 100000, which is
> roughly equivalent to a 5Mb database file if you're using
> DB_File.

So here is where I am totally confused: from what I can tell, my setting of bayes_expiry_max_db_size=150000 should either have no effect whatsoever, or it should leave me with a bayes_toks file that will grow to 5K - and I end up with a file half that size.

NOTE: Bayes works fine for me this way, but my guess is that with the small corpus and short expiry cycle I may see erratic performance over time.
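
As a rough sanity check on the figures quoted above, the documented sizes can be turned into an implied bytes-per-token rate and compared with the observed 2,556 kb file. This is only a sketch: it assumes file size scales linearly with token count (a Berkeley DB hash file allocates whole pages, so this is approximate at best) and that "Mb" and "kb" mean 1024-based units. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope comparison of documented vs. observed bayes_toks sizes.
# Assumptions (not from the SpamAssassin docs): linear size-per-token scaling,
# and 1 Mb = 1024 * 1024 bytes, 1 kb = 1024 bytes.

KB = 1024
MB = 1024 * KB


def bytes_per_token(file_bytes, tokens):
    """Average bytes per token implied by a file size and a token count."""
    return file_bytes / tokens


def estimated_tokens(file_bytes, per_token):
    """Token count implied by a file size at a given bytes-per-token rate."""
    return int(file_bytes / per_token)


# SA 2.6 docs: 150,000 tokens is roughly a 6Mb file.
rate_26 = bytes_per_token(6 * MB, 150_000)
# SA 2.5 docs: 100,000 tokens is roughly a 5Mb file.
rate_25 = bytes_per_token(5 * MB, 100_000)

observed_bytes = 2556 * KB  # largest bayes_toks size seen with the 150000 setting

tokens_at_26_rate = estimated_tokens(observed_bytes, rate_26)
tokens_at_25_rate = estimated_tokens(observed_bytes, rate_25)

print(f"SA 2.6 docs imply about {rate_26:.1f} bytes per token")
print(f"SA 2.5 docs imply about {rate_25:.1f} bytes per token")
print(f"2,556 kb would hold roughly {tokens_at_26_rate} tokens at the 2.6 rate")
print(f"2,556 kb would hold roughly {tokens_at_25_rate} tokens at the 2.5 rate")
```

Under these assumptions, a 2,556 kb file corresponds to well under 150,000 tokens at either documented rate, which is one way to state the mismatch between the setting and the file size.
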
Because my system seems to get much more spam than ham, I end up with a 9:2 ratio of spam to ham as autolearn continues to feed incoming email to the database. But my main concern right now is trying to figure out why my experience doesn't match what the documentation says I should expect.

-Abigail