On Mon, 2009-05-04 at 15:49 -0400, Theo Van Dinter wrote:
> 2009/5/4 Karsten Bräckelmann <guent...@rudersport.de>:

> > > Bear in mind that an email that gets a Bayes score of more than one
> > > point can't be autolearned as ham.
> >
> > Nope, this is wrong.
> >
> > The Bayes rules (as well as some other rules) do NOT have any impact on
> > the auto-learning. In fact, the auto-learner even uses a score-set
> > without Bayes, to avoid self-feeding.

> Actually it's not wrong.  The POD just doesn't match the code,
> unfortunately. :(  (feel like opening a bug?)

Yes, I will open a bug and hopefully come up with a POD patch, if this
is the case. Let's get it straight first, though.

> Yes, the different score set is used to avoid any biasing by the Bayes
> system for as to whether or not to autolearn, but there's also a check
> of the Bayes score that was applied (rule score not bayes probability)
> via https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2865.  In

That bug has no commit pointer. I'm lazy, Theo; any hints to the actual
commit so I don't have to dig? :)

> short, if the message seems to strongly be ham or spam, don't
> autolearn it the other way and let train-on-error happen if it is
> actually wrong.

According to that description and bug 2865 comment 3, the word
"strongly" seems to be key here. I have not checked the code, but
comment 3 pretty much translates to an additional constraint: the real
result (using the Bayes score-set) as seen in the X-Spam headers must
agree with the auto-learner result for the message to be learned.

Now let's get back to the claim I said is wrong (see above).

With the constraint just discussed in mind, a message must have a
resulting total score < 4.0 (default) to be considered for auto-learning
as ham. Very well.

This is by no means the same as "not learned as ham if the Bayes rule's
score is > 1". That rule would prevent learning a mail that hits *only*
a single rule, a Bayes rule worth a score > 1, for a total of, say, 1.2.
This example is still considered ham, however, which agrees with the
auto-learner's judgment -- and thus it *does* get learned as ham.
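
To make the difference concrete, here is a minimal sketch of the two
readings in Python. It is not SpamAssassin code: the function names are
made up for illustration, and the 4.0 threshold is simply the value
quoted above. It only models my reading of bug 2865 comment 3, which I
have not verified against the source.

```python
# Hypothetical model of the two rules under discussion; NOT SpamAssassin code.
REQUIRED_SCORE = 4.0  # assumption: the threshold as quoted in this mail


def blocked_by_claimed_rule(bayes_rule_score):
    """The disputed claim: a Bayes rule score > 1 prevents learning as ham."""
    return bayes_rule_score > 1.0


def blocked_by_agreement_constraint(total_score, required_score=REQUIRED_SCORE):
    """My reading of comment 3: the real result must also say ham."""
    return total_score >= required_score


# A mail that hits only a single rule: a Bayes rule worth 1.2 points.
bayes_rule_score = 1.2
total_score = 1.2

claimed = blocked_by_claimed_rule(bayes_rule_score)
constraint = blocked_by_agreement_constraint(total_score)

print("blocked under the claimed rule:", claimed)
print("blocked under the agreement constraint:", constraint)
```

The two predicates disagree on this mail, which is the whole point: the
claimed rule would reject it, the agreement constraint would not.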

I guess I stand by my assessment -- the comment is wrong. :)


I'd still consider fixing the POD though, once we have this sorted and
have identified what exactly is missing or inaccurate in the docs.

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
