On Mon, 2009-05-04 at 15:49 -0400, Theo Van Dinter wrote:
> 2009/5/4 Karsten Bräckelmann <guent...@rudersport.de>:
> > > Bear in mind that an email that gets a Bayes score of more than one
> > > point can't be autolearned as ham.
> >
> > Nope, this is wrong.
> >
> > The Bayes rules (as well as some other rules) do NOT have any impact on
> > the auto-learning. In fact, the auto-learner even uses a score-set
> > without Bayes, to avoid self-feeding.

> Actually it's not wrong. The POD just doesn't match the code,
> unfortunately. :( (feel like opening a bug?)

Yes, I will open a bug and hopefully come up with a POD patch, if this
is the case. Let's get it straight first, though.

> Yes, the different score set is used to avoid any biasing by the Bayes
> system as to whether or not to autolearn, but there's also a check
> of the Bayes score that was applied (rule score, not Bayes probability)
> via https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2865. In

No commit pointer. I'm lazy, Theo, any hints to the actual commit so I
don't have to dig? :)

> short, if the message seems to strongly be ham or spam, don't
> autolearn it the other way and let train-on-error happen if it is
> actually wrong.

According to that description and bug 2865 comment 3, the word
"strongly" seems to be key here. I have not checked the code, but
comment 3 pretty much translates to an additional constraint: the real
result (using the Bayes score-set), as seen in the X-Spam headers, must
agree with the auto-learner's result for the message to be learned.

Now let's get back to the claim I said is wrong (see above). With the
just-discussed constraint in mind, it means that a message must have a
resulting total score < 4.0 (default) to be considered for
auto-learning as ham. Very well.

This is by no means the same as "not learned as ham, if the Bayes
rule's score is > 1" -- which would prevent a mail that *only* hits a
single rule, a Bayes rule worth a score > 1, though with a total of
1.2 > 1, from being learned.
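For illustration, here is a minimal sketch of the two-step logic as I
understand it from the thread -- NOT SpamAssassin's actual code; the
function, rule names, and the 0.1 non-spam threshold are assumptions,
and the 4.0 limit is the default total score mentioned above:

```python
# Hypothetical sketch of the auto-learn ham decision discussed in this
# thread.  Rule names, scores, and thresholds are illustrative only.

def autolearn_as_ham(hits_nonbayes, hits_full,
                     nonspam_threshold=0.1,  # assumed autolearn ham threshold
                     required_score=4.0):    # "total score < 4.0" from above
    """Each hits_* dict maps rule name -> score under one score set."""
    # Step 1: the auto-learner scores the message with a score set that
    # excludes the Bayes rules, to avoid self-feeding.
    autolearn_score = sum(hits_nonbayes.values())
    looks_like_ham = autolearn_score <= nonspam_threshold

    # Step 2 (bug 2865 constraint): the real result, using the full
    # Bayes-enabled score set as seen in the X-Spam headers, must agree.
    total_score = sum(hits_full.values())
    really_ham = total_score < required_score

    return looks_like_ham and really_ham

# The disputed case: a mail hitting *only* a Bayes rule worth 1.2.
# Without Bayes it scores 0.0; with Bayes it totals 1.2, still < 4.0,
# so both checks say "ham" and the mail is auto-learned as ham.
print(autolearn_as_ham({}, {"BAYES_50": 1.2}))  # → True
```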
This example, however, is still considered ham, which is the same as
the auto-learner judges -- and thus it *does* get learned as ham. I
guess I stand by my assessment -- the comment is wrong. :)

I'd still consider fixing the POD, though, after we have got this
sorted and identified what exactly is missing or inaccurate in the
docs.

  guenther

--
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8?
c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){
putchar(t[s]);h=m;s=0; }}}