On Tue, Sep 02, 2003 at 01:02:47PM +1200, Simon Byrnand wrote:
> At 02:24 2/09/2003 +0200, Carlo Wood wrote:
> >On Tue, Sep 02, 2003 at 09:56:54AM +1200, Simon Byrnand wrote:
> I don't whitelist this mailing list and I know at least one of the
> developers (Justin) doesn't either and I don't have problems. I very rarely
> see a message from this list with a sample spam in it which scores above my
> threshold of 5, and I've *never* seen one high enough to be autolearnt
> (> 15 not counting bayes).
>
> >Learning mails that *discuss* spam as being spam will make the Bayesian
> >classifier less accurate.
>
> Are you quite sure about that? Let's assume for a moment that a spam
> reposted in the list in the body of a message managed to score over 15
> (which I've never seen happen in my time on the list). I think you'll find
> bayesian classifiers don't work the way you probably think they do. All
> they do is tokenize a message based on word boundaries, and build word
> statistics based on that. They do *not* learn specific messages or phrases
> as spam or ham, they simply count the statistical prevalence of words.
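
To make the word-counting point concrete, here is a minimal Python sketch of a classifier that keeps only per-token counts. It is not SpamAssassin's actual Bayes code: the `tokenize` regex, the `TokenCounts` class, and the simple frequency ratio in `spamminess` are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, then split on word boundaries."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class TokenCounts:
    """Keeps per-token message counts only; whole messages are never stored."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spamminess(self, token):
        """Share of the token's frequency that comes from spam; 0.5 if unseen."""
        s = self.spam[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)


db = TokenCounts()
db.learn("Buy cheap viagra now", is_spam=True)
db.learn("How does the Bayes autolearn threshold work?", is_spam=False)
print(db.spamminess("bayes"))

# Hypothetical misfire: a list post that quotes a spam gets learnt as spam.
db.learn("Why did Bayes miss this one: buy cheap viagra now", is_spam=True)
print(db.spamminess("bayes"))
```

In this toy run the estimate for "bayes" moves from 0.0 to roughly 0.33 after the single mislearnt post. The only thing that changes is each token's count, by one message's worth; no message or phrase is remembered as such.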

Also -- autolearning as spam requires a minimum of 3 points each from
header hits and from body hits. (I think this is in the doco too.) It would
be extremely rare for a post to SA-talk to accumulate 3 points in header
hits.

I agree with Simon -- IMO it's not a problem. Autolearning has a number
of safeguards to avoid autolearning from the wrong mails.

--j.
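
The safeguard described in this thread can be sketched as a simple predicate. This is only an illustration of the rule as stated here, not SpamAssassin's implementation: the `ScoreBreakdown` type and the function name are hypothetical, the threshold of 15 is the figure Simon quotes, and real scores include rule types beyond plain header and body hits.

```python
from dataclasses import dataclass


@dataclass
class ScoreBreakdown:
    """Hypothetical split of a message's score, excluding Bayes points."""
    header_points: float
    body_points: float


def would_autolearn_as_spam(score, spam_threshold=15.0,
                            min_header=3.0, min_body=3.0):
    """True only if the non-Bayes total is over the threshold AND
    header and body hits each contribute at least the minimum."""
    total = score.header_points + score.body_points
    return (total > spam_threshold
            and score.header_points >= min_header
            and score.body_points >= min_body)


# A list post quoting a spam: heavy body hits, almost no header hits.
quoted_spam_post = ScoreBreakdown(header_points=0.5, body_points=16.0)
print(would_autolearn_as_spam(quoted_spam_post))

# A directly received spam with hits in both headers and body.
direct_spam = ScoreBreakdown(header_points=6.0, body_points=11.0)
print(would_autolearn_as_spam(direct_spam))
```

Under these assumptions the quoted-spam post is rejected despite its high total, because its header hits fall short of 3 points, while the directly received spam passes both checks.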