Chris Conn wrote:
> Hello,
> 
> I have just gotten around to upgrading my 2.64 SA servers to 3.0.3.  I
> have read the FAQ and searched the archives, so if the following
> question has been asked or covered, please push me in the right
> direction and I will be on my way....
> 
> Is there any documentation as to why the BAYES_XX scores have been
> changed, and for what reason?  Previously BAYES_00 was -4.9 and BAYES_99
> was 5.4, and they are now -2.5 and 3.5 respectively.  Just curious as to
> why.
> 

Every major release has all the scores re-generated with a fresh corpus of spam
and nonspam.


http://wiki.apache.org/spamassassin/HowScoresAreAssigned?highlight=%28scores%29


And no, there is nothing simple or linear about how the scores are assigned. A
score is not a multiple of the rule's hit ratio or the product of some other
trivial formula; a system like that would be inaccurate overall.

SA score assignment is done as a gigantic optimal fit of ALL the scores as a
simultaneous system. This means the hit rate of one rule affects the score of
every other rule in the entire ruleset.

Think of it as a gigantic balancing act where all the scores are assigned to get
the fewest FPs and FNs in a test of real world email. If one score moves, all
the rest have to adjust to offset the FPs and FNs caused by that change.
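
To make the balancing act concrete, here is a toy sketch in Python. It is NOT
how SpamAssassin actually generates its scores; the six-message corpus, the
SURBL_HIT rule name and its 5.0 score, the score grid, and the brute-force
search are all assumptions invented for illustration. Only the 5.0 threshold
and the BAYES_00/BAYES_99 values from the question are real. It shows that the
FP/FN counts depend on every score at once, so the best value for one rule
changes when another rule gets a score.

```python
from itertools import product

THRESHOLD = 5.0  # SpamAssassin's default required score

# Hypothetical toy corpus: (rules that hit the message, is_spam).
CORPUS = [
    ({"BAYES_99"}, True),
    ({"BAYES_99", "SURBL_HIT"}, True),
    ({"SURBL_HIT"}, True),
    ({"BAYES_99"}, False),   # ham that Bayes misjudges
    ({"BAYES_00"}, False),
    (set(), False),
]

def errors(scores, corpus=CORPUS):
    """Return (false_positives, false_negatives) for a full set of scores."""
    fp = fn = 0
    for hits, is_spam in corpus:
        flagged = sum(scores[rule] for rule in hits) >= THRESHOLD
        if flagged and not is_spam:
            fp += 1
        elif is_spam and not flagged:
            fn += 1
    return fp, fn

def fit(rules, grid, corpus=CORPUS):
    """Brute-force search: try every combination of scores together."""
    best_cost, best_scores = None, None
    for combo in product(grid, repeat=len(rules)):
        scores = dict(zip(rules, combo))
        cost = sum(errors(scores, corpus))
        if best_cost is None or cost < best_cost:
            best_cost, best_scores = cost, scores
    return best_cost, best_scores

# 2.64-style Bayes scores, with the (hypothetical) URI rule contributing nothing.
old = {"BAYES_00": -4.9, "BAYES_99": 5.4, "SURBL_HIT": 0.0}
# 3.0-style Bayes scores alongside a strong (hypothetical) URI rule score.
new = {"BAYES_00": -2.5, "BAYES_99": 3.5, "SURBL_HIT": 5.0}

print(errors(old))  # (1, 1)
print(errors(new))  # (0, 1)
print(fit(["BAYES_00", "BAYES_99", "SURBL_HIT"], [-5.0, -2.5, 0.0, 2.5, 5.0]))
```

In this toy corpus, once the URI rule catches spam on its own, a BAYES_99 score
below the threshold gives fewer total errors than one above it, because Bayes
alone no longer flags the misjudged ham. The real process works on a far larger
corpus and ruleset, so take this only as an intuition for why scores move
together.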


A slight shift in the aggressiveness of the default rules for 3.0 is likely what
reduced the Bayes scores. Most of the SURBL lists got very impressive hit rates,
and their high scores deflated the need for high scores in the Bayes rules. To a
lesser extent, the drug rules and other added rules had the same effect.
