Jason Marshall wrote:
> X-Spam-Status: No, score=2.7 required=5.0 tests=BAYES_60,SARE_MLB_Stock1,
>     TW_AQ autolearn=no version=3.1.0
> X-Spam-Report:
>     *  1.7 SARE_MLB_Stock1 BODY: SARE_MLB_Stock1
>     *  0.1 TW_AQ BODY: Odd Letter Triples with AQ
>     *  1.0 BAYES_60 BODY: Bayesian spam probability is 60 to 80%
>     *      [score: 0.6809]
> 
> To me, that looks more like 2.8 not 2.7 points!  Is this just my site?
> Sorry if someone already brought this up long ago...

Short answer:

One word: rounding.

Medium-length answer:

To avoid cluttering the display, SA rounds scores to one decimal place in some
places (the per-rule scores in the report) and truncates them in others (the
total score). This can cause small differences. Don't worry about it.


Long answer:

The real score of SARE_MLB_Stock1 is not 1.7; it is:

score SARE_MLB_Stock1 1.66

But SA rounds that rule score to 1.7 to save display space. Most SA rules
actually have scores with 3 decimal places (e.g., 1.268).

So the "real" score of this message, accounting for all digits, is 2.76
points, while the rounded rule scores shown in the report add up to 2.8. When
displaying the total, however, SA truncates it to 2.7.
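To make the arithmetic concrete, here is a small Python sketch (not SA's actual Perl code) that rounds each rule score for the report and truncates the total for the header. Only the 1.66 for SARE_MLB_Stock1 comes from above; the exact values of 0.1 and 1.0 for TW_AQ and BAYES_60 are assumptions, chosen because they make the total come out to 2.76. Round-half-up is also an assumption about how the per-rule rounding behaves on exact halves.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Hypothetical full-precision rule scores. Only SARE_MLB_Stock1's 1.66 is
# known; the other two values are assumed so the total matches 2.76.
RULE_SCORES = {
    "SARE_MLB_Stock1": Decimal("1.66"),
    "TW_AQ": Decimal("0.1"),
    "BAYES_60": Decimal("1.0"),
}

ONE_PLACE = Decimal("0.1")


def display_rule_score(score: Decimal) -> Decimal:
    """Per-rule report lines: round to the nearest tenth (half up, assumed)."""
    return score.quantize(ONE_PLACE, rounding=ROUND_HALF_UP)


def display_total(score: Decimal) -> Decimal:
    """Header total: truncate (round toward zero) to one decimal place."""
    return score.quantize(ONE_PLACE, rounding=ROUND_DOWN)


real_total = sum(RULE_SCORES.values())
shown_rules = {name: display_rule_score(s) for name, s in RULE_SCORES.items()}
reader_total = sum(shown_rules.values())

print("real total:         ", real_total)                  # 2.76
print("sum of shown scores:", reader_total)                # 2.8
print("shown total:        ", display_total(real_total))   # 2.7
```

Decimal is used instead of float so that the rounding and truncation steps are exact and not disturbed by binary floating-point error.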

There have been lots of arguments about how best to handle this, but there is
really no perfect way. You cannot reliably represent a series of 3-decimal-place
numbers as 1-decimal-place numbers and have their sum always equal the sum of
the real numbers brought down to 1 decimal place. No method of rounding or
truncation will work 100% of the time for this.
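A quick way to convince yourself is to search for counterexamples. The Python sketch below (an illustration, not anything from SA) tries two common methods, round-half-up and truncation, on pairs of two-decimal scores and reports the first pair whose one-decimal parts don't add up to the one-decimal sum. For example, 0.01 + 0.04 rounds to 0.0 + 0.0 but the sum 0.05 rounds to 0.1; and 0.01 + 0.09 truncates to 0.0 + 0.0 while the sum 0.10 truncates to 0.1.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP
from itertools import product

ONE_PLACE = Decimal("0.1")


def to_one_place(x: Decimal, mode: str) -> Decimal:
    """Bring a score down to one decimal place with the given rounding mode."""
    return x.quantize(ONE_PLACE, rounding=mode)


def find_mismatch(mode: str, step: Decimal = Decimal("0.01")):
    """Return the first pair (a, b) of scores in [0, 1) on a 0.01 grid whose
    displayed parts don't add up to the displayed sum, or None if none exist."""
    grid = [step * i for i in range(100)]
    for a, b in product(grid, repeat=2):
        parts = to_one_place(a, mode) + to_one_place(b, mode)
        whole = to_one_place(a + b, mode)
        if parts != whole:
            return a, b, parts, whole
    return None


for name, mode in [("round", ROUND_HALF_UP), ("truncate", ROUND_DOWN)]:
    a, b, parts, whole = find_mismatch(mode)
    print(f"{name}: {a} + {b} -> parts add to {parts}, sum shows {whole}")
```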

However, the current method of truncating the final score avoids really
confusing situations, such as a score of 4.96 rounding up to 5.0 and producing
a header like this:

        X-Spam-Status: No, score=5.0 required=5.0
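The sketch below reproduces that header in Python under one stated assumption: a message counts as spam when its score is greater than or equal to the required score. With rounding, a 4.96 message is reported as "No" next to a displayed score of 5.0; with truncation, the displayed 4.9 is consistent with the verdict. The status_header function is hypothetical, not part of SA.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

ONE_PLACE = Decimal("0.1")


def status_header(score: Decimal, required: Decimal, mode: str) -> str:
    """Build a hypothetical X-Spam-Status line, displaying the score
    with one decimal place using the given rounding mode."""
    # Assumed rule: spam when the full-precision score >= required.
    verdict = "Yes" if score >= required else "No"
    shown = score.quantize(ONE_PLACE, rounding=mode)
    return f"X-Spam-Status: {verdict}, score={shown} required={required}"


score, required = Decimal("4.96"), Decimal("5.0")
print(status_header(score, required, ROUND_HALF_UP))  # confusing: 5.0 but "No"
print(status_header(score, required, ROUND_DOWN))     # consistent: 4.9 and "No"
```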


SA used to round everything, but the case above caused so many spurious bug
reports that it was changed so the final result is truncated.

As for rule scores, rounding them is on average more accurate than truncating.
Switching to truncation here would be more consistent, but it wouldn't reduce
the number of cases where the numbers don't add up, so there's no point in
bothering.
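The "more accurate on average" point can be checked with a rough Python sketch that assumes rule scores are spread evenly over all 3-decimal values in [0, 1), which real SA scores need not be. Rounding to one place is never off by more than 0.05 and averages 0.025; truncation can be off by up to 0.099 and averages 0.0495.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

ONE_PLACE = Decimal("0.1")


def mean_display_error(mode: str) -> Decimal:
    """Mean absolute error when every 3-decimal score in [0, 1) is shown
    with one decimal place using the given rounding mode (assumes the
    scores are uniformly distributed)."""
    scores = [Decimal(i) / 1000 for i in range(1000)]
    total = sum(abs(s - s.quantize(ONE_PLACE, rounding=mode)) for s in scores)
    return total / len(scores)


print("round:   ", mean_display_error(ROUND_HALF_UP))  # 0.025
print("truncate:", mean_display_error(ROUND_DOWN))     # 0.0495
```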

The "real" answer would be to always display scores with 3 decimal places, but
that's rather ugly and creates a cluttered report. On the other hand, it would
always be 100% accurate.
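For comparison, here is a Python sketch of what a 3-decimal report might look like. It assumes every rule score has at most 3 decimal places and uses hypothetical values (1.66 for SARE_MLB_Stock1, and an assumed 0.1 and 1.0 for the other two rules). Under that assumption, the displayed rule scores always add up exactly to the displayed total; the report function and its layout are illustrative only.

```python
from decimal import Decimal

# Hypothetical full-precision scores; TW_AQ and BAYES_60 values are assumed.
RULE_SCORES = [
    ("SARE_MLB_Stock1", Decimal("1.66")),
    ("TW_AQ", Decimal("0.1")),
    ("BAYES_60", Decimal("1.0")),
]


def report(rules):
    """Format each rule and the total with 3 decimal places."""
    lines = [f"* {score:6.3f} {name}" for name, score in rules]
    total = sum(score for _, score in rules)
    lines.append(f"  total: {total:.3f}")
    return lines


for line in report(RULE_SCORES):
    print(line)
```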



