At Wed Jan 28 23:01:48 2004, Brett Dikeman wrote:
>
> Martin Radford wrote:
> >
> > It might be because you get the occasional false positive that you
> > want to avoid (but all the rest come under your threshold). You
> > probably would want these autolearned as ham.
>
> Actually, at the moment the bayes engine thinks 99% of the messages
> going through it are spam, simply because it's auto-learning spam
> messages but never auto-learning ham...because messages never get
> negative scores.
That's pretty much "by design" in SA. Experience has shown that
spammers pick up on negative scoring rules and abuse them.

> > Or it might be because the messages are from a mailing list like this
> > one, where the messages may well contain extracts from spam. In this
> > case you positively *don't* want to autolearn them as ham, because
> > it'll adversely affect the Bayes database's training.
>
> So yes- I think your argument is rather obscure and moot for 99% of your
> users.

But it's not moot for all users.

> Did you consider that the occasional spam auto-learned as ham really
> isn't that bad, if you're auto-learning many more legitimate messages?
> SA tends to grossly tip the scales towards auto-learning spam versus
> ham, all for the sake of not accidentally learning a rather
> theoretical(for most users) case. Left to its own devices, the bayes
> engine will eventually mark more and more messages as spam, and the
> engine becomes completely useless- which is much worse than a slight
> inaccuracy from the occasional spam that gets auto-learned as ham.

It gets out of balance anyway with the default setup, because it will
auto-learn messages scoring 0 as ham. I regularly see messages packed
full of Bayes poison that score 0 and hence get autolearned as ham in a
default install of 2.6x. There's not much you can do about this
without negative-scoring rules, and you can't have those.

> Developers are always well-meaning when they institute rules(that cannot
> be overridden) to address specific circumstances. However, these little

I certainly don't claim that the possibilities I outlined above are
good reasons, just that they are possible reasons for the existing
setup. If you feel strongly that whitelisted messages should be
autolearned as ham, you should enter this wish in the Bugzilla
(http://bugzilla.spamassassin.org/) so the developers have a record of
it and can consider it for future versions.
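The imbalance discussed above comes from where the two autolearn
thresholds sit. The Python sketch below is a deliberately simplified
model, not SpamAssassin's actual code: the function name and both
threshold values are assumptions chosen for illustration, and the real
autolearner applies further conditions that are not modelled here.

```python
from collections import Counter
from typing import Optional

# Assumed values for illustration only; SpamAssassin reads its real
# autolearn thresholds from its configuration.
HAM_THRESHOLD = 0.1    # learn as ham only when the score is below this
SPAM_THRESHOLD = 12.0  # learn as spam only when the score reaches this


def autolearn_decision(score: float) -> Optional[str]:
    """Return "ham", "spam", or None (do not learn) for a message score.

    Simplified, hypothetical model of threshold-based autolearning.
    """
    if score < HAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return None


if __name__ == "__main__":
    # Hypothetical scores. With no negative-scoring rules, only messages
    # that hit no rules at all (score 0) can be learned as ham.
    scores = [0.0, 0.0, 1.2, 3.4, 5.0, 14.7, 20.3, 31.0]
    print(Counter(autolearn_decision(s) for s in scores))
```

In this model, with no negative-scoring rules available, the ham side
of the decision is only reachable by messages that hit nothing at all,
whether they are genuinely clean or spam padded with Bayes poison.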
> In several cases, spamassassin assumes it knows better than I do, and
> overrides my config directives(and further, doesn't warn me it's doing
> so). If you want to warn me in the install/config/whatever docs that
> "turning on auto-learning of messages above X score" or "turning on
> auto-learning of whitelisted messages is dangerous", fine. So be it.
> Some people might not instantly realize the implication. But give us
> the OPTION of doing it.
>
> So here's my suggestion, and it's two-part:
>
> a)strip the min+max limit controls from the two auto-learn params. If I
> want to be a moron and set my auto-learn-spam to 2(ie, below the magic
> number "6"), that's my bloody business, not yours ;-)
>
> b)add a auto_learn_whitelist, and have a couple of options.
> Off(nothin'), auto(ie let bayes auto-learn messages that were
> auto-whitelisted), manual(ie config-file whitelist rules) and all(both
> auto and manual, mwuaha). Ok, so they're not intelligently named, but
> that combo will make just about anybody happy.
>
> Make the default 'off' if you REALLY, really think the whole
> subversive-spam thing is a problem for the MAJORITY OF YOUR USERS.
> Chances are "manual" is the next-safest option, since generally users
> have to be smarter than the average bear to set up their own rules(or
> their admins had good reasons for adding global rules- as I did on our
> system, whitelisting our biggest customers). Auto and Both would be the
> least safest.

I trust you'll submit these to the Bugzilla. I don't personally see
any problems with them, but then again I'm only a user of SA.
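Part (b) of Brett's suggestion can be sketched as a small decision
function. The option name auto_learn_whitelist and its four settings
come from his message; it is a proposal, not an existing SpamAssassin
option, and the Python names below (WhitelistLearnMode,
should_learn_whitelisted_as_ham) are hypothetical.

```python
from enum import Enum


class WhitelistLearnMode(Enum):
    """Hypothetical settings for the proposed auto_learn_whitelist option."""
    OFF = "off"        # never autolearn a message as ham because it is whitelisted
    AUTO = "auto"      # learn auto-whitelisted messages as ham
    MANUAL = "manual"  # learn messages hit by config-file whitelist rules as ham
    ALL = "all"        # learn both kinds as ham


def should_learn_whitelisted_as_ham(mode: WhitelistLearnMode,
                                    auto_whitelisted: bool,
                                    manually_whitelisted: bool) -> bool:
    """Decide whether a whitelisted message should be autolearned as ham."""
    if mode is WhitelistLearnMode.OFF:
        return False
    if mode is WhitelistLearnMode.AUTO:
        return auto_whitelisted
    if mode is WhitelistLearnMode.MANUAL:
        return manually_whitelisted
    # ALL: either kind of whitelisting is enough.
    return auto_whitelisted or manually_whitelisted
```

Modelling the setting as one four-valued option rather than two
independent flags makes "off" an explicit, documentable default, which
is what the proposal asks for.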
Martin
--
Martin Radford              |   "Only wimps use tape backup: _real_
[EMAIL PROTECTED]           |  men just upload their important stuff  -o)
Registered Linux user #9257 |  on ftp and let the rest of the world   /\\
 - see http://counter.li.org |  mirror it ;)" - Linus Torvalds       _\_V

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk