At 06:02 2/09/2003 +0100, Justin Mason wrote:
On Tue, Sep 02, 2003 at 01:02:47PM +1200, Simon Byrnand wrote:
> At 02:24 2/09/2003 +0200, Carlo Wood wrote:
> >On Tue, Sep 02, 2003 at 09:56:54AM +1200, Simon Byrnand wrote:
> I don't whitelist this mailing list and I know at least one of the
> developers (Justin) doesn't either and I don't have problems. I very rarely
> see a message from this list with a sample spam in it which scores above my
> threshold of 5, and I've *never* seen one high enough to be autolearnt
> (over 15, not counting Bayes).
>
> >Learning mails that *discuss* spam as being spam will make the Bayesian
> >classifier less accurate.
>
> Are you quite sure about that? Let's assume for a moment that a spam
> reposted in the body of a message to the list managed to score over 15
> (which I've never seen happen in my time on the list). Even then, I think
> you'll find Bayesian classifiers don't work the way you probably think
> they do. All they do is tokenize a message on word boundaries and build
> word statistics from the tokens. They do *not* learn specific messages or
> phrases as spam or ham; they simply count the statistical prevalence of
> words.
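To make the point about word statistics concrete, here is a minimal sketch in Python of a classifier that only counts tokens. It is an illustration under simplifying assumptions, not SpamAssassin's actual Bayes code: the names tokenize, TokenStats, learn and spamminess are hypothetical, and the tokenizer is a plain regular expression rather than anything SpamAssassin uses.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TokenStats:
    """Per-token counts of how many spam and ham messages contained the token."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Only the set of tokens is recorded; the message itself and the
        # order of its words are thrown away.
        tokens = set(tokenize(text))
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def spamminess(self, token):
        """Fraction of the token's relative frequency that comes from spam."""
        s = self.spam[token] / self.nspam if self.nspam else 0.0
        h = self.ham[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5  # token never seen: no evidence either way
        return s / (s + h)


stats = TokenStats()
stats.learn("Cheap pills, NOW!", is_spam=True)
stats.learn("Here is a sample spam I got: cheap pills. Patch attached.", is_spam=False)
```

In this toy setup, a word that shows up equally often in spam and in list discussion ends up with a neutral value, while a word seen only in discussion ("patch") leans toward ham. Nothing in the table remembers which message a word came from.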


Also -- autolearning as spam requires a minimum of 3 points from header hits
and 3 points from body hits. (I think this is in the doco too.)
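The rule can be written down as a small check. This is only a sketch of the logic as described in this thread, not the SpamAssassin source: the function name may_autolearn_as_spam and its parameters are hypothetical, the threshold of 15 is the figure Simon quotes, and whether the real comparisons are strict or inclusive is an assumption here.

```python
def may_autolearn_as_spam(header_points, body_points,
                          spam_threshold=15.0, min_header=3.0, min_body=3.0):
    """Return True if a message would qualify for autolearning as spam.

    header_points and body_points are the scores from header and body rule
    hits, excluding any Bayes rule score (assumed to be all of the non-Bayes
    score in this simplified model).
    """
    total = header_points + body_points
    return (total > spam_threshold
            and header_points >= min_header
            and body_points >= min_body)


# A list post quoting a sample spam: lots of body hits, almost no header hits.
quoted_spam_on_list = may_autolearn_as_spam(header_points=1.0, body_points=20.0)

# A real spam with hits in both headers and body.
real_spam = may_autolearn_as_spam(header_points=4.0, body_points=12.0)
```

Under these assumptions the quoted sample is rejected even though its total is well over the threshold, because the headers of the list post contribute too little.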

Oh, I forgot all about that requirement of 3 points from header hits and 3 points from body hits for autolearning as spam, too. Good point.


I haven't seen that in the docs; I remember you telling me once before... that's not to say it's NOT in the docs, I just haven't noticed it. (or looked for it :)

It would be extremely rare for a post to SA-talk to accumulate
3 points in header hits.

I agree with Simon -- IMO it's not a problem.   Autolearning has a number of
safeguards to avoid autolearning from the wrong mails.

Which is just as well, because I and many others use sitewide Bayes autolearn with little or no manual training or intervention, and I can't stop my users from subscribing to this list, or others that discuss spam ;-)


Regards,
Simon


