On Tue, Sep 02, 2003 at 01:02:47PM +1200, Simon Byrnand wrote:
> At 02:24 2/09/2003 +0200, Carlo Wood wrote:
> >On Tue, Sep 02, 2003 at 09:56:54AM +1200, Simon Byrnand wrote:
> I don't whitelist this mailing list and I know at least one of the
> developers (Justin) doesn't either and I don't have problems. I very rarely
> see a message from this list with a sample spam in it which scores above my
> threshold of 5, and I've *never* seen one high enough to be autolearnt. (>
> 15 not counting bayes)
>
> >Learning mails that *discuss* spam as being spam will make the Bayesian
> >classifier less accurate.
>
> Are you quite sure about that? Let's assume for a moment that a spam
> reposted in the list in the body of a message managed to score over 15,
> (which I've never seen happen in my time on the list) I think you'll find
> bayesian classifiers don't work the way you probably think they do. All
> they do is tokenize a message based on word boundaries, and build word
> statistics based on that. They do *not* learn specific messages or phrases
> as spam or ham, they simply count the statistical prevalence of words.
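
To make the point about word statistics concrete, here is a minimal sketch of a token-counting learner. It is not SpamAssassin's actual tokenizer or probability formula; the names `tokenize`, `TokenCounts`, and `spamminess`, the regular expression, and the simple ratio estimate are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word-like tokens (a deliberate simplification)."""
    return re.findall(r"[a-z0-9'$-]+", text.lower())


class TokenCounts:
    """Per-token message counts for spam and ham; no message is stored whole."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(tokenize(text))
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def spamminess(self, token):
        """Hypothetical estimate: share of the token's frequency that comes from spam."""
        if self.nspam == 0 or self.nham == 0:
            return 0.5  # nothing to compare against yet
        s = self.spam[token] / self.nspam
        h = self.ham[token] / self.nham
        if s + h == 0:
            return 0.5  # token never seen
        return s / (s + h)


counts = TokenCounts()
counts.learn("Buy cheap viagra now", is_spam=True)
counts.learn("Discussing the viagra rule on the list", is_spam=False)
counts.learn("Patch for the bayes rule", is_spam=False)
```

In this toy corpus, a word that appears only in the spam message scores 1.0, while a word that also appears in a list discussion is pulled back toward neutral, which is the behaviour described above: only per-word prevalence is tracked, never a specific message or phrase.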
Also -- autolearning as spam requires a minimum of 3 points from both header and
body hits. (I think this is in the doco too.)
Oh, I forgot all about the requirement of 3 points from header hits and 3 points from body hits for autolearning as spam, too. Good point.
I haven't seen that in the docs; I remember you telling me once before... That's not to say it's NOT in the docs, I just haven't noticed it (or looked for it :)
It would be an extremely rare occurrence that a post to SA-talk can accumulate 3 points in hdr hits.
I agree with Simon -- IMO it's not a problem. Autolearning has a number of safeguards to avoid autolearning from the wrong mails.
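
As a rough illustration of the safeguards discussed in this thread, the check can be sketched as below. This is not SpamAssassin's code: the function name `should_autolearn_as_spam` and its parameters are hypothetical, the threshold of 15 (score not counting Bayes rules) and the 3-point header and body minimums are simply the figures quoted above, and whether each comparison is strict is an assumption.

```python
def should_autolearn_as_spam(header_points, body_points, other_points=0.0,
                             spam_threshold=15.0, min_header=3.0, min_body=3.0):
    """Return True only if every safeguard mentioned in the thread is met.

    All point values are assumed to exclude hits from the Bayes rules themselves.
    """
    total = header_points + body_points + other_points
    return (
        total > spam_threshold            # overall score must clear the threshold
        and header_points >= min_header   # enough evidence from header rules
        and body_points >= min_body       # enough evidence from body rules
    )


# A list post quoting a spam sample: large body score, almost no header score.
print(should_autolearn_as_spam(header_points=1.0, body_points=18.0))
# A message that hits on both header and body rules.
print(should_autolearn_as_spam(header_points=4.0, body_points=12.5))
```

Under these assumptions, a list post that quotes a spam in its body is rejected for autolearning because its header score stays below the minimum, however high the body score is.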
Which is just as well, because I and many others use sitewide Bayes autolearn with little or no manual training or intervention, and I can't stop my users from subscribing to this list, or others that discuss spam ;-)
Regards, Simon
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk