Re: Seeking advice re: SA score discrepancies

Karsten Bräckelmann Sun, 17 Oct 2010 18:28:50 -0700

On Sun, 2010-10-17 at 17:05 -0700, Jerry Pape wrote:
> At some time in the not too distant past, my otherwise reliable SA
> system has broken in an odd way.
> 
> This example is characteristic of the problem:


Can't follow. It is broken because SA itself reports something
different from an unrelated third-party website?

If not, please feel free to explain what changed without pointing to
that source.

> x-spam-status reads: No, score=3.8 required=4.0
> tests=BAYES_40,HTML_IMAGE_RATIO_02,
> HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RDNS_NONE,URIBL_BLACK    
> autolearn=no version=3.2.5
> 
> Assessment of this header at
> http://www.futurequest.net/docs/SA/decode/ yields:

> BAYES_40             0.000  Bayesian spam probability is 20 to 40%
> HTML_IMAGE_RATIO_02  0.550  HTML has a low ratio of text to image area

That site uses SA 3.2.x, score set 1, network tests enabled, Bayes
disabled, as evidenced by the above two scores and confirmed by the
other scores. You clearly use score set 3, both network tests and Bayes
enabled.
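
To illustrate the score set point, here is a minimal Python sketch (not
SA's actual code). It relies on each rule carrying four scores, one per
score set: 0 = no Bayes, no network tests; 1 = network tests only;
2 = Bayes only; 3 = both. The rule names and all score values below are
hypothetical placeholders, not the real 3.2.x scores.

```python
def score_set_index(bayes_enabled, net_enabled):
    """Map the Bayes/network-test switches to a score set number 0..3."""
    return (2 if bayes_enabled else 0) + (1 if net_enabled else 0)


def total_score(hits, scores, bayes_enabled, net_enabled):
    """Sum the scores of all rules that hit, using the active score set."""
    idx = score_set_index(bayes_enabled, net_enabled)
    return round(sum(scores[name][idx] for name in hits), 3)


# Hypothetical rules with made-up values, one score per score set 0..3.
EXAMPLE_SCORES = {
    "EXAMPLE_BAYES_RULE": (0.0, 0.0, -0.5, -0.5),
    "EXAMPLE_CONTENT_RULE": (1.0, 0.5, 1.2, 0.8),
}

hits = ["EXAMPLE_BAYES_RULE", "EXAMPLE_CONTENT_RULE"]

# The same hits evaluated under two different score sets give
# two different totals.
site_total = total_score(hits, EXAMPLE_SCORES, bayes_enabled=False, net_enabled=True)
your_total = total_score(hits, EXAMPLE_SCORES, bayes_enabled=True, net_enabled=True)
```

The same list of rule hits yields a different total depending on which
score set is applied, which is the kind of mismatch described above.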

Given there *is* a BAYES_xx rule in there, the site is broken and does
not evaluate correctly. No excuse for the site in this case. (It would
be different with "no network test hits", which is indistinguishable
from being disabled, without the scores.)


> Clearly 5.336 does not equal 3.8.

Clearly, that site neither knows nor correctly detects the score set
you use.

> My SA is 3.2.5 in a default config except that I have set global score
> required to 4.0 with latest updates.

Yup, with Bayes enabled, the exact total score is 3.808.
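
As a rough sketch of how 3.808 ends up as "No, score=3.8 required=4.0"
in the header, assuming the verdict is a plain comparison of the total
against the required threshold and the header shows one decimal place.
The function name `verdict` is made up for illustration.

```python
def verdict(total, required):
    """Return the Yes/No flag and the one-decimal score shown in the header.

    Assumption: a message counts as spam when total >= required.
    """
    flag = "Yes" if total >= required else "No"
    return flag, f"{total:.1f}"


flag, shown = verdict(3.808, 4.0)
header = f"{flag}, score={shown} required={4.0:.1f}"
```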

What is pulling the score down is that the Bayes classifier, based on
its training, believes the mail to be hammy-ish, almost neutral. After
appropriate training it should classify the mail as spammy, raising the
overall score.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
