Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Justin Mason
Simon Byrnand said:
> > - for spam, must have 3 head hits and 3 body hits
>
> Why? This seems a bit arbitrary to me. Either we trust the scoring or we don't :) What is magic about 3 in particular?
Yeah, not sure myself. If I recall correctly it gave a stronger statistical basis to ensur
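The rule under discussion can be sketched roughly as follows. This is only an illustration of the idea quoted above, not SpamAssassin's actual code: the function name, the 12.0 score threshold, and the reading of "hits" as simple counts are all assumptions made for the example.

```python
# Illustrative sketch only: names and the score threshold are assumptions,
# not taken from the SpamAssassin source.
def eligible_to_learn_as_spam(score, head_hits, body_hits,
                              spam_threshold=12.0, min_head=3, min_body=3):
    """Return True if a message would be auto-learned as spam under the
    quoted rule: high enough score AND enough evidence from both the
    header tests and the body tests."""
    return (score >= spam_threshold
            and head_hits >= min_head
            and body_hits >= min_body)


# A high score built almost entirely from header tests is not learned.
print(eligible_to_learn_as_spam(score=15.0, head_hits=6, body_hits=1))
# The same score with evidence from both parts of the message is learned.
print(eligible_to_learn_as_spam(score=15.0, head_hits=3, body_hits=3))
```

Requiring evidence from both header and body is what makes the learning criteria stricter than the filtering criteria, which is the asymmetry the thread is arguing about.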

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Simon Byrnand
At 15:20 22/06/03 -0700, Justin Mason wrote:
Matt Kettler said:
> As for disabling the network checks for auto-learning, that makes sense to me as well, since the bayes code learns from text tokens, not IPs.
Actually, not quite right, if you're scanning with network tests, it'll do the auto-lea

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Simon Byrnand
At 10:08 22/06/03 -0400, Matt Kettler wrote:
At 08:30 PM 6/21/03 -0400, Gordon Cormack wrote:
Auto-learn and auto-whitelist use different scoring criteria from those used in spamassassin's spam filtering.
The bayes auto-learning does not use "its own" scoring mechanism, it uses scoreset 0. This i

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Justin Mason
Matt Kettler said:
> As for disabling the network checks for auto-learning, that makes sense to me as well, since the bayes code learns from text tokens, not IPs.
Actually, not quite right, if you're scanning with network tests, it'll do the auto-learn score test with network tests as well.

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Justin Mason
Gordon Cormack said:
> In supervised mode, positive feedback is exactly what you want.
>
> For the reasons that I've mentioned before, the lack of feedback in the current setup causes the system to 'learn' progressively less accurate information.
BTW supervised mode is pretty trivial to set

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Gordon Cormack
On Sun, Jun 22, 2003 at 10:45:42AM -0400, Gordon Cormack wrote:
> What I have observed is < 0.2% false positives and < 1.0% false negatives.
I miscomputed, using only the spam count in the denominator. The true numbers [false / (ham+spam)] are:
false positives: < 0.05% (counted)
false
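To make the correction concrete, here is a small sketch of how the choice of denominator changes the reported rates. The counts are invented purely for illustration; they are not Gordon's actual mail counts, which the message does not give.

```python
# Hypothetical counts, chosen only to illustrate the arithmetic.
def error_rates(false_pos, false_neg, ham, spam):
    """Compute error rates two ways: over spam only (the miscomputation
    described above) and over all mail, i.e. false / (ham + spam)."""
    total = ham + spam
    return {
        "fp_over_spam": false_pos / spam,
        "fn_over_spam": false_neg / spam,
        "fp_over_total": false_pos / total,
        "fn_over_total": false_neg / total,
    }


rates = error_rates(false_pos=2, false_neg=10, ham=3000, spam=1000)
for name, value in rates.items():
    print(f"{name}: {value:.4%}")
```

With these made-up counts, dividing by spam alone overstates both rates by a factor of (ham + spam) / spam, here 4.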

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Gordon Cormack
On Sun, Jun 22, 2003 at 10:08:07AM -0400, Matt Kettler wrote:
> At 08:30 PM 6/21/03 -0400, Gordon Cormack wrote:
> >Auto-learn and auto-whitelist use different scoring criteria from those used in spamassassin's spam filtering.
>
> The bayes auto-learning does not use "its own" scoring mechanis

Re: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Matt Kettler
At 08:30 PM 6/21/03 -0400, Gordon Cormack wrote:
Auto-learn and auto-whitelist use different scoring criteria from those used in spamassassin's spam filtering.
The bayes auto-learning does not use "its own" scoring mechanism, it uses scoreset 0. This is the score the email would get by the main S

AW: [SAtalk] autolearn/autowhitelist misguided

2003-06-22 Thread Martin Bene
Hi Gordon
> The rationale for spamassassin's behaviour is, I think, the fear that in unsupervised mode it will go off track. Perhaps there should be a user flag "supervised/unsupervised" that determines whether or not the same criteria are used for filtering and learning. In "supervised" m

[SAtalk] autolearn/autowhitelist misguided

2003-06-21 Thread Gordon Cormack
Auto-learn and auto-whitelist use different scoring criteria from those used in spamassassin's spam filtering. IMO, this is a serious mistake. In the long run, it means that the bayesian and whitelist algorithms will simply reinforce whatever errors are made by the feature-based classifier. I ha