On Sun, 2010-10-17 at 17:05 -0700, Jerry Pape wrote:
> At some time in the not too distant past, my otherwise reliable SA
> system has broken in an odd way.
>
> This example is characteristic of the problem:
I can't follow. It is broken because SA itself reports something different from an unrelated third-party website? If not, please feel free to explain what changed without pointing to that source.

> x-spam-status reads: No, score=3.8 required=4.0
> tests=BAYES_40,HTML_IMAGE_RATIO_02,
> HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RDNS_NONE,URIBL_BLACK
> autolearn=no version=3.2.5
>
> Assessment of this header at
> http://www.futurequest.net/docs/SA/decode/ yields:
> BAYES_40 0.000 Bayesian spam probability is 20 to 40%
> HTML_IMAGE_RATIO_02 0.550 HTML has a low ratio of text to image area

That site uses SA 3.2.x with score set 1 (network tests enabled, Bayes disabled), as evidenced by the two scores above and confirmed by the other scores. You clearly use score set 3, with both network tests and Bayes enabled. Given that there *is* a BAYES_xx rule hit in there, Bayes is evidently enabled, so the site is broken and does not evaluate the header correctly. There is no excuse for the site in this case. (It would be different if there were no network test hits at all, since without the scores that is indistinguishable from network tests being disabled.)

> Clearly 5.336 does not equal 3.8.

Clearly, that site neither knows nor correctly detects the score set you use.

> My SA is 3.2.5 in a default config except that I have set global score
> required to 4.0 with latest updates.

Yup, with Bayes enabled, the exact total score is 3.808. What offsets all this is that the Bayes classifier, based on its training, believes the mail to be hammy-ish, almost neutral. After appropriate training it should classify the mail as spammy instead, raising the overall score.
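To illustrate why the two totals disagree: SpamAssassin keeps up to four scores per rule and uses one of them depending on whether Bayes and network tests are enabled (set 0: neither; set 1: network only; set 2: Bayes only; set 3: both). A decoder that assumes set 1 will list every BAYES_xx hit at 0.000 and sum different numbers than your install does. The minimal Python sketch below sums the rules from a `tests=` field under a chosen score set. The rule table and its numbers are hypothetical placeholders, not the real SA 3.2.5 scores, and `score_set` / `total_score` are illustrative helpers, not SpamAssassin APIs.

```python
def score_set(use_bayes: bool, use_network: bool) -> int:
    """Return the score set index (0-3).

    0: Bayes off, network off   1: Bayes off, network on
    2: Bayes on,  network off   3: Bayes on,  network on
    """
    return (2 if use_bayes else 0) + (1 if use_network else 0)


# HYPOTHETICAL rule table: one score per score set (index 0..3).
# Placeholder values for illustration only, NOT the real SA 3.2.5 scores.
# As in SA, the Bayes rule scores nothing in sets 0 and 1.
RULE_SCORES = {
    "BAYES_40": (0.0, 0.0, -0.5, -0.5),
    "URIBL_BLACK": (0.0, 2.0, 0.0, 1.5),
    "RDNS_NONE": (0.1, 0.1, 0.1, 0.1),
}


def total_score(tests: str, scoreset: int) -> float:
    """Sum the scores of the comma-separated rule names in a tests= field."""
    # Strip whitespace so folded header lines still parse.
    names = [t.strip() for t in tests.split(",") if t.strip()]
    # Rules missing from the table contribute 0.0 in this sketch.
    return round(sum(RULE_SCORES.get(n, (0.0,) * 4)[scoreset] for n in names), 3)


if __name__ == "__main__":
    tests = "BAYES_40,URIBL_BLACK,RDNS_NONE"
    for bayes in (False, True):
        s = score_set(bayes, use_network=True)
        print(f"score set {s}: {total_score(tests, s)}")
```

The same list of rule hits yields different totals under set 1 and set 3, which is exactly the mismatch between the site's 5.336 and your 3.8.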