Re: Spamassassin Learn

Matt Kettler Tue, 07 Feb 2006 17:01:02 -0800

Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 05:36:56PM -0600, Jim C. Nasby wrote:
>   
>> On Tue, Feb 07, 2006 at 06:17:20PM -0500, Matt Kettler wrote:
>>     
>>> Jim C. Nasby wrote:
>>>       
>>>>> Are there any autolearn strings? Are they all "autolearn=no"? Are there
>>>>> any decent number that are autolearn=failed or autolearn=disabled?
>>>>>
>>>>>           
>>>> grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
>>>> 1545 no
>>>>  140 spam
>>>>    4 unavailable
>>>>         
>>> Fair enough, that at least suggests that the autolearner is working.
>>> However, that learning ratio is pretty low.
>>>
>>> Are you using network tests? Without DNSBLs it's often hard to get enough
>>> header points to cause spam learning.
>>>       
>> I believe so...
>>
>> grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
>> # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
>> loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
>> loadplugin Mail::SpamAssassin::Plugin::Hashcash
>> loadplugin Mail::SpamAssassin::Plugin::SPF
>>
>> grep -v '#' ~/.spamassassin/user_prefs | grep -v whitelist
>> bayes_auto_learn 1
>> bayes_auto_learn_threshold_spam 5.0
>>     
>
> Hmm... here's something interesting...
>
> grep -r autolearn pgsql/ | grep -v 'Binary file' | sed -e 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
> 2010 ham
>  198 no
>   17 unavailable
>
> So a big chunk of [EMAIL PROTECTED] email is being learned as ham.
> Looking further, I see...
>
> X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham
>         version=3.1.0
>
> ISTM that having the thresholds set up so that BAYES_00 scores low enough
> to autolearn is a BadThing, as it creates a positive feedback loop. :)
> I've added bayes_auto_learn_threshold_nonspam -2.6 to my personal
> config; we'll see if that helps.
>
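
The autolearn tallies quoted above come from a grep/sed/sort/uniq pipeline. A rough Python equivalent is sketched here, assuming one message per file under a folder such as caughtspam/ or pgsql/. The function name tally_autolearn is hypothetical. Like the sed expression, it counts one autolearn=... value per matching line; unlike the pipeline, it reads binary files as bytes instead of skipping "Binary file" matches.

```python
import re
from collections import Counter
from pathlib import Path

# Captures the value after "autolearn=" up to the next whitespace,
# roughly mirroring sed's 's/.*autolearn=\([^ ]*\).*/\1/'.
AUTOLEARN_RE = re.compile(rb"autolearn=(\S+)")


def tally_autolearn(folder):
    """Count autolearn=... values in every file under `folder` (hypothetical helper)."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        # Read raw bytes so binary attachments don't cause decode errors.
        for line in path.read_bytes().splitlines():
            match = AUTOLEARN_RE.search(line)
            if match:
                counts[match.group(1).decode("ascii", "replace")] += 1
    return counts
```

For example, print(tally_autolearn("caughtspam").most_common()) lists each autolearn result with its count, largest first, much like the sort | uniq -c output.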


Jim,

Bayes is NOT used when calculating the autolearning score; that would
promote self-feedback. As I said before, the autolearner's concept of
score is VERY different from the final message score. Score
contributions from Bayes, white/blacklists, and the AWL are all ignored
by the autolearner. It also looks up the individual rule scores from
score set 0 or 1 (the sets used when Bayes is disabled) instead of set
2 or 3 (the Bayes-enabled sets). This is a MASSIVE difference.
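
To make the difference concrete, here is a simplified model of how the autolearner totals a message. It is not SpamAssassin's code (which is Perl). The function name, the per-rule score tuples, and the name-prefix exclusion list are all illustrative assumptions. The BAYES_00 value of -2.6 is taken from the X-Spam-Status line quoted above. HYPOTHETICAL_RULE is made up.

```python
# Simplified, hypothetical model of the autolearner's score.
# Score sets: 0 = no Bayes, no net; 1 = no Bayes, net;
#             2 = Bayes, no net;    3 = Bayes, net.
# Skipping rules by name prefix is a simplification for illustration.
EXCLUDED_PREFIXES = ("BAYES_", "AWL", "USER_IN_WHITELIST", "USER_IN_BLACKLIST")


def autolearn_score(hits, scores, use_net):
    """Sum rule scores roughly the way the autolearner would.

    hits    -- names of rules that fired on the message
    scores  -- rule name -> (set0, set1, set2, set3) scores
    use_net -- True if network tests are enabled
    """
    score_set = 1 if use_net else 0  # only the non-Bayes sets, never 2 or 3
    total = 0.0
    for rule in hits:
        if rule.startswith(EXCLUDED_PREFIXES):
            continue  # Bayes, white/blacklist and AWL contributions are ignored
        total += scores[rule][score_set]
    return total
```

With only BAYES_00 hitting, the final score is -2.6, but this model gives the autolearner a score of 0.0.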


However, the default nonspam autolearn threshold
(bayes_auto_learn_threshold_nonspam) is 0.1. That's a POSITIVE
threshold. To the autolearner that message scored 0 points, since its
only hit, BAYES_00, is ignored. 0 is less than 0.1, so it was learned
as HAM.
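
The threshold comparison can be sketched as below. This is a minimal sketch with a hypothetical function name. It assumes a ham threshold of 0.1 and a spam threshold of 12.0 (what I believe to be SpamAssassin's default bayes_auto_learn_threshold_spam). Real SpamAssassin applies further conditions before learning spam (the thread mentions needing enough header points), and this sketch omits them.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Return "ham", "spam" or "no" for an autolearner score (simplified).

    Omits SpamAssassin's extra conditions on spam learning.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"
```

With the defaults, a message the autolearner sees as 0 points is learned as ham. With the settings from this thread (nonspam -2.6, spam 5.0), the same message is not learned at all.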

I'd suggest re-adjusting your threshold, as a default SpamAssassin config
will only VERY rarely generate a negative score to the autolearner. The
only rules that can do it are BondedSender, Habeas COI/SOI and Hashcash.
Hashcash is so rare it may as well not exist at present. BondedSender
and Habeas are only used by large legitimate mailers, so with your
threshold at -2.6, none of your person-to-person mail will ever get
autolearned as ham unless you know someone who uses Hashcash.
