Re: [SAtalk] how to change the bayes auto_learn threshold to zero

Martin Radford Sat, 31 Jan 2004 09:55:39 -0800

At Wed Jan 28 23:01:48 2004, Brett Dikeman wrote:
> 
> Martin Radford wrote:
> 
> > It might be because you get the occasional false positive that you
> > want to avoid (but all the rest come under your threshold).  You
> > probably would want these autolearned as ham.
> 
> Actually, at the moment the bayes engine thinks 99% of the messages 
> going through it are spam, simply because it's auto-learning spam 
> messages but never auto-learning ham...because messages never get 
> negative scores.


That's pretty much "by design" in SA.  Experience has shown that
spammers pick up on negative scoring rules and abuse them. 

> > Or it might be because the messages are from a mailing list like this
> > one, where the messages may well contain extracts from spam.  In this
> > case you positively *don't* want to autolearn them as ham, because
> > it'll adversely affect the Bayes database's training.

> So yes- I think your argument is rather obscure and moot for 99% of your 
> users.

But it's not moot for all users.

> Did you consider that the occasional spam auto-learned as ham really 
> isn't that bad, if you're auto-learning many more legitimate messages? 
> SA tends to grossly tip the scales towards auto-learning spam versus 
> ham, all for the sake of not accidentally learning a rather 
> theoretical(for most users) case.  Left to its own devices, the bayes 
> engine will eventually mark more and more messages as spam, and the 
> engine becomes completely useless- which is much worse than a slight 
> inaccuracy from the occasional spam that gets auto-learned as ham.

It gets out of balance anyway with the default setup, because it will
auto-learn messages scoring 0 as ham.  I regularly see messages packed
full of Bayes poison that score 0 and hence get autolearned in a
default install of 2.6x.  There's not much you can do about this
without negative rules, and you can't have them.
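
To make the threshold behaviour concrete, here is a rough sketch of the decision as I understand it. It is not SpamAssassin's actual code. The function name is made up for illustration, the two default values (0.1 for ham, 12.0 for spam) are what I recall from the 2.6x documentation for bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam, and the exact comparison operators are an assumption.

```python
# Illustrative sketch only; not SpamAssassin's implementation.
# Both defaults below are assumptions based on the 2.6x documentation.
HAM_THRESHOLD = 0.1    # assumed bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # assumed bayes_auto_learn_threshold_spam


def auto_learn_decision(score, ham_threshold=HAM_THRESHOLD,
                        spam_threshold=SPAM_THRESHOLD):
    """Return 'ham', 'spam', or None (not learned) for a message score.

    The strict '<' and inclusive '>=' comparisons are assumptions.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # score falls between the thresholds: nothing is learned
```

Under this sketch a message scoring exactly 0 is learned as ham with the assumed default, while calling auto_learn_decision(0.0, ham_threshold=0.0) returns None, so only negative scores would be learned as ham.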

> Developers are always well-meaning when they institute rules(that cannot 
> be overridden) to address specific circumstances.  However, these little 

I certainly don't claim that the possibilities I outlined above are
good reasons, just that they are possible reasons for the existing
setup.

If you feel strongly that whitelisted messages should be autolearned
as ham, you should enter this wish in the Bugzilla
(http://bugzilla.spamassassin.org/) so the developers have a record of
it and can consider it for future versions.

> In several cases, spamassassin assumes it knows better than I do, and 
> overrides my config directives(and further, doesn't warn me it's doing 
> so).  If you want to warn me in the install/config/whatever docs that 
> "turning on auto-learning of messages above X score" or "turning on 
> auto-learning of whitelisted messages is dangerous", fine.  So be it. 
> Some people might not instantly realize the implication.  But give us 
> the OPTION of doing it.
> 
> So here's my suggestion, and it's two-part:
> 
> a)strip the min+max limit controls from the two auto-learn params.  If I 
> want to be a moron and set my auto-learn-spam to 2(ie, below the magic 
> number "6"), that's my bloody business, not yours ;-)
> 
> b)add an auto_learn_whitelist, and have a couple of options. 
> Off(nothin'), auto(ie let bayes auto-learn messages that were 
> auto-whitelisted), manual(ie config-file whitelist rules) and all(both 
> auto and manual, mwuaha).  Ok, so they're not intelligently named, but 
> that combo will make just about anybody happy.
> 
> Make the default 'off' if you REALLY, really think the whole 
> subversive-spam thing is a problem for the MAJORITY OF YOUR USERS. 
> Chances are "manual" is the next-safest option, since generally users 
> have to be smarter than the average bear to set up their own rules(or 
> their admins had good reasons for adding global rules- as I did on our 
> system, whitelisting our biggest customers).  Auto and Both would be the 
> least safe.

I trust you'll submit these to Bugzilla.  I don't personally see
problems with them, but then again I'm only a user of SA.
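
For anyone writing up the Bugzilla entry, the four proposed modes could be sketched roughly as follows. Everything here is hypothetical: auto_learn_whitelist is only the option suggested in the quoted text, not an existing SA setting, and the class and function names are invented for illustration.

```python
from enum import Enum


class WhitelistLearnMode(Enum):
    """Hypothetical values for the proposed auto_learn_whitelist option."""
    OFF = "off"        # never learn whitelisted mail as ham
    AUTO = "auto"      # learn mail matched by the auto-whitelist
    MANUAL = "manual"  # learn mail matched by config-file whitelist rules
    ALL = "all"        # learn mail matched by either kind of whitelist


def learn_whitelisted_as_ham(mode, auto_whitelisted, manually_whitelisted):
    """Decide whether a whitelisted message would be learned as ham."""
    if mode is WhitelistLearnMode.OFF:
        return False
    if mode is WhitelistLearnMode.AUTO:
        return auto_whitelisted
    if mode is WhitelistLearnMode.MANUAL:
        return manually_whitelisted
    # Remaining case is ALL: either kind of whitelist match is enough.
    return auto_whitelisted or manually_whitelisted
```

With the mode set to MANUAL, a message that only the auto-whitelist recognised would not be learned, which matches the "next-safest" ordering described in the proposal.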

Martin
-- 
Martin Radford              |   "Only wimps use tape backup: _real_ 
[EMAIL PROTECTED] | men just upload their important stuff  -o)
Registered Linux user #9257 |  on ftp and let the rest of the world  /\\
- see http://counter.li.org |       mirror it ;)"  - Linus Torvalds _\_V


