On Tue, Sep 02, 2003 at 01:02:47PM +1200, Simon Byrnand wrote:
> At 02:24 2/09/2003 +0200, Carlo Wood wrote:
> >On Tue, Sep 02, 2003 at 09:56:54AM +1200, Simon Byrnand wrote:
> I don't whitelist this mailing list, and I know at least one of the 
> developers (Justin) doesn't either, and I don't have problems. I very rarely 
> see a message from this list with a sample spam in it which scores above my 
> threshold of 5, and I've *never* seen one high enough to be autolearnt 
> (above 15, not counting Bayes).
> 
> >Learning mails that *discuss* spam as being spam will make the Bayesian
> >classifier less accurate.
> 
> Are you quite sure about that? Let's assume for a moment that a spam 
> reposted to the list in the body of a message managed to score over 15 
> (which I've never seen happen in my time on the list). I think you'll find 
> Bayesian classifiers don't work the way you probably think they do. All 
> they do is tokenize a message based on word boundaries, and build word 
> statistics based on that. They do *not* learn specific messages or phrases 
> as spam or ham; they simply count the statistical prevalence of words.
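
To make Simon's point concrete, here is a rough sketch of per-word counting.
It is NOT SpamAssassin's actual Bayes code; the class name, the tokenizer
regex and the probability formula are all assumptions made up for
illustration. It only shows that what gets stored is word counts, not
messages or phrases.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on word boundaries.
    return re.findall(r"\w+", text.lower())


class WordStats:
    """Toy per-word statistics; illustrative only."""

    def __init__(self):
        self.spam_counts = Counter()  # word -> number of spam mails containing it
        self.ham_counts = Counter()   # word -> number of ham mails containing it
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each distinct word once per message; the message itself
        # is not kept anywhere.
        words = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(words)
            self.nspam += 1
        else:
            self.ham_counts.update(words)
            self.nham += 1

    def word_probability(self, word):
        # Relative prevalence of the word in spam versus ham.
        s = self.spam_counts[word] / max(self.nspam, 1)
        h = self.ham_counts[word] / max(self.nham, 1)
        if s + h == 0:
            return 0.5  # never seen: no evidence either way
        return s / (s + h)
```

A word that shows up in both spam and in list discussion ends up with a
middling probability, which is why one mislearnt message shifts the
statistics only slightly.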

Also -- autolearning as spam requires a minimum of 3 points from both header and
body hits.  (I think this is in the doco too.)

It would be extremely rare for a post to SA-talk to accumulate
3 points in header hits.
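
Roughly, the check described above looks like the sketch below. This is a
paraphrase of the rule as stated in this thread, not the real SpamAssassin
code; the function name and parameter names are hypothetical, and the 15
point threshold is simply the figure Simon quoted.

```python
def should_autolearn_as_spam(header_points, body_points,
                             spam_threshold=15.0,
                             min_header=3.0, min_body=3.0):
    # Assumption: the scores passed in already exclude the Bayes rules,
    # as described in the thread.
    total = header_points + body_points
    return (total > spam_threshold
            and header_points >= min_header
            and body_points >= min_body)
```

A list post quoting a spam would typically collect its points from body
rules only, so it fails the header requirement even when the total is high.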

I agree with Simon -- IMO it's not a problem.  Autolearning has a number of
safeguards to avoid autolearning from the wrong mails.

--j.

