Wes wrote:
> I've searched and searched the archives, but no answers.. Sorry for the
> lengthy email, but...
>
>
> Spam Assassin 3.2.3-1
> Smf-spamd 1.3.1 with spamd
> Dual quad-core Xeon 5355 (Woodcrest) systems with 8GB memory.
>
> Configuration:
>
> bayes_auto_learn 1
> bayes_expiry_max_db_size 150000
> lock_method flock
> rules compiled with sa-compile
> Auto-whitelist module is loaded
> Number of spamd children: 5
>
> <snip>
> As an immediate solution, I modified
>
> /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Conf.pm
>
> And set bayes_expiry_period to 21600 (6 hours) and run an expire every 3
> hours (why isn't this a configuration file parameter??)
>
>
Even if it was a setting, it's now irrelevant. The fact that you're running a force-expire every 3 hours makes the bayes_expiry_period moot. (unless you set it to less than 3 hours).
Read below to understand what this variable actually does. It does not dictate token lifespan.
>
> On to the questions...
>
> 1. Setting the expiry period down that low doesn't seem to be an optimal
> thing to do from an effectiveness standpoint. Comments on this? Am I
> missing something? Due to the type of user base, all-manual learning isn't
> likely to work well. Is auto-learning just a waste of resources in this
> case?
>
> 2. If I set up manual learning where false positives and false negatives can
> be manually sent in by users and added to the site-wide configuration, won't
> they also be subject to the (short) expiration period, or is manual learning
> kept permanently?
>
Manual learning is handled no differently than auto-learning.

However, it's important to note that bayes_expiry_period does not dictate token life. It dictates how often the expiry check runs automatically. Basically, SA looks at the database, finds out when the last expire ran, and if more than bayes_expiry_period has elapsed, it kicks off an auto-expire. Since you're manually expiring every 3 hours, your modified bayes_expiry_period never comes into effect.

When expiry runs (whether triggered by bayes_expiry_period or by a manual force-expire), it checks whether the database holds more than bayes_expiry_max_db_size tokens. If it does, SA will attempt to reduce the database to 75% of bayes_expiry_max_db_size, keeping the most recently used tokens.

In your case, you have a high learning volume, so every 3 hours (due to your manual sa-learn --force-expire) your database is going to be reduced to roughly the 112,500 most recently used tokens (75% of 150,000).

If you want to increase token lifespan, increase bayes_expiry_max_db_size so that more tokens are kept at expire time.
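
To make the two settings easier to tell apart, here is a rough Python model of the behaviour described above. It is only a sketch of the description in this message, not SpamAssassin's actual code: the function names `expiry_due` and `expiry_target` are made up for illustration, and the real expiry logic is more involved than a single cut-off.

```python
# Toy model of the expiry behaviour described in this message.
# NOT SpamAssassin code; both function names are hypothetical.

def expiry_due(last_expire, now, expiry_period):
    """bayes_expiry_period only decides WHEN an automatic expire runs:
    it fires once more than expiry_period seconds have passed since
    the last expire (automatic or forced)."""
    return now - last_expire > expiry_period


def expiry_target(token_count, max_db_size, keep_fraction=0.75):
    """bayes_expiry_max_db_size decides HOW MANY tokens survive.
    If the database is over the limit, it is cut back to about
    keep_fraction of the limit (most recently used tokens kept);
    otherwise nothing is removed."""
    if token_count <= max_db_size:
        return token_count
    return int(max_db_size * keep_fraction)


HOUR = 3600

# A forced expire ran 3 hours ago and the period is 6 hours (21600 s),
# so the automatic expire never gets a chance to trigger.
print(expiry_due(last_expire=0, now=3 * HOUR, expiry_period=21600))

# A database that has grown past 150000 tokens is cut back to 75% of it.
print(expiry_target(token_count=200000, max_db_size=150000))

# Raising the limit keeps more tokens, which is what lengthens token life.
print(expiry_target(token_count=200000, max_db_size=400000))
```

In this model, changing the period only changes how often the cut happens; only the size limit changes how much history survives each cut.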