Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 05:36:56PM -0600, Jim C. Nasby wrote:
>
>> On Tue, Feb 07, 2006 at 06:17:20PM -0500, Matt Kettler wrote:
>>
>>> Jim C. Nasby wrote:
>>>
>>>>> Are there any autolearn strings? Are they all "autolearn=no"? are there any
>>>>> decent number that are autolearn=failed or autolearn=disabled?
>>>>>
>>>> grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e
>>>> 's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
>>>> 1545 no
>>>> 140 spam
>>>> 4 unavailable
>>>>
>>> Fair enough, that at least suggests that the autolearner is working. However,
>>> that learning ratio is pretty low.
>>>
>>> Are you using network tests? Without DNSBLs it's often hard to get enough header
>>> points to cause spam learning..
>>>
>> I believe so...
>>
>> grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
>> # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
>> loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
>> loadplugin Mail::SpamAssassin::Plugin::Hashcash
>> loadplugin Mail::SpamAssassin::Plugin::SPF
>>
>> grep -v # ~/.spamassassin/user_prefs | grep -v whitelist
>> bayes_auto_learn 1
>> bayes_auto_learn_threshold_spam 5.0
>>
>
> Hmm... here's something interesting...
>
> grep -r autolearn pgsql/ | grep -v 'Binary file' | sed -e
> 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
> 2010 ham
> 198 no
> 17 unavailable
>
> So a big chunk of [EMAIL PROTECTED] email is being learned as ham.
> Looking further, I see...
>
> X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham
> version=3.1.0
>
> ISTM that having the thresholds setup so that BAYES_00 scores low enough
> to autolearn is a BadThing, as it creates a positive feedback loop. :)
> I've added bayes_auto_learn_threshold_nonspam -2.6 to my personal
> config; we'll see if that helps.
>

Jim, Bayes is NOT used when calculating the autolearning score; that would promote self-feedback. As I said before, the autolearner's concept of score is VERY different from the final message score.

Score contributions from Bayes, white/blacklists, and the AWL are all ignored by the autolearner. It also looks up the individual rule scores from score set 0 or 1 (the sets without Bayes) instead of 2 or 3. This is a MASSIVE difference.

However, the default ham autolearn threshold (bayes_auto_learn_threshold_nonspam) is 0.1. That's a POSITIVE threshold. To the autolearner that message scored 0 points. 0 is less than 0.1, so it was learned as HAM.

I'd suggest re-adjusting your threshold, as a default SpamAssassin config will only VERY rarely generate a negative score for the autolearner. The only rules that can do it are BondedSender, Habeas COI/SOI, and Hashcash. Hashcash is so rare it may as well not exist at present. BondedSender and Habeas are only used by large legitimate mailers, so none of your person-to-person mail will ever get autolearned as ham in your current setup unless you know someone who uses Hashcash.
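
To make the arithmetic concrete, here is a rough Python sketch of the decision described above. It is an illustration only, not SpamAssassin's actual code (which is Perl): the function names and the ignore list are hypothetical simplifications, the 12.0 spam default is the stock bayes_auto_learn_threshold_spam value as far as I know, and the extra conditions the real autolearner applies before learning spam (such as the header-point requirement mentioned earlier in the thread) are left out.

```python
# Illustrative sketch only; NOT SpamAssassin's real implementation.
# Function names and the ignore list below are hypothetical simplifications.

IGNORED_PREFIXES = ("BAYES_",)  # Bayes rules never count toward autolearning
IGNORED_RULES = {"AWL", "USER_IN_WHITELIST", "USER_IN_BLACKLIST"}


def autolearn_score(hits, scores_without_bayes):
    """Sum rule scores the way the autolearner is described as doing it.

    hits: names of the rules that fired on the message.
    scores_without_bayes: rule name -> score taken from score set 0 or 1.
    """
    total = 0.0
    for rule in hits:
        if rule.startswith(IGNORED_PREFIXES) or rule in IGNORED_RULES:
            continue  # Bayes, white/blacklists and the AWL are ignored
        total += scores_without_bayes.get(rule, 0.0)
    return total


def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Return "ham", "spam" or "no" for a given autolearn score.

    Simplified: the additional requirements for learning spam are omitted.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# The message from the thread hit only BAYES_00, so the autolearner sees 0.
score = autolearn_score(["BAYES_00"], {})
default_result = autolearn_decision(score)
# Jim's settings: nonspam threshold -2.6, spam threshold 5.0.
adjusted_result = autolearn_decision(score, nonspam_threshold=-2.6,
                                     spam_threshold=5.0)
```

With the default 0.1 threshold the BAYES_00-only message sums to 0 and is learned as ham. With the threshold at -2.6 the same message is not learned at all, and neither is anything else unless its non-Bayes rules sum to below -2.6.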