Chris Conn wrote:
> Hello,
>
> I have just gotten around to upgrading my 2.64 SA servers to 3.0.3. I
> have read the FAQ and searched the archives, so if the following
> question has been asked or covered, please push me in the right
> direction and I will be on my way....
>
> Is there any documentation as to why the BAYES_XX scores have been
> changed, and for what reason? Previously BAYES_00 was -4.9 and BAYES_99
> was 5.4, and they are now -2.5 and 3.5 respectively. Just curious as to
> why.
Every major release has all the scores re-generated with a fresh corpus of spam and nonspam.

http://wiki.apache.org/spamassassin/HowScoresAreAssigned?highlight=%28scores%29

And no, there is absolutely NOTHING simple or linear about how the scores are assigned. It's not a multiple of the hit ratio or any other trivial system, which would be inaccurate overall.

SA score assignment is done as a gigantic optimal fit of ALL the scores as a simultaneous system. This means the hit rate of one rule affects the score of every other rule in the entire ruleset. Think of it as a gigantic balancing act where all the scores are assigned to get the fewest FPs and FNs in a test of real-world email. If one score moves, all the rest have to adjust to offset the FPs and FNs caused by that change.

A slight shift in the aggressiveness of the default rules for 3.0 is likely what resulted in reduced scores for bayes. Most of the SURBL lists got very impressive hit rates, and their high scores deflated the need for high scores in the bayes rules. To a lesser extent, the drug rules and other added rules had the same effect.
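
To make the "simultaneous system" point concrete, here is a toy sketch in Python. It is NOT SpamAssassin's actual score optimizer, just a brute-force grid search over a five-message corpus I made up. BAYES_99 is the real rule name from the question; URIBL_TOY and DRUG_TOY are hypothetical stand-ins for the SURBL and drug rules. The 5x false-positive weight, the non-negative score grid, and the "prefer the smallest scores" tie-break are all assumptions for illustration.

```python
from itertools import product

THRESHOLD = 5.0                      # SpamAssassin's default required score
FP_WEIGHT = 5                        # assumption: one FP costs as much as 5 FNs
GRID = [i * 0.5 for i in range(13)]  # candidate scores 0.0, 0.5, ..., 6.0

# Invented toy corpus: (rules that hit the message, is it spam?)
CORPUS = [
    ({"BAYES_99", "URIBL_TOY"}, True),
    ({"BAYES_99", "URIBL_TOY", "DRUG_TOY"}, True),
    ({"URIBL_TOY", "DRUG_TOY"}, True),
    ({"DRUG_TOY"}, False),
    (set(), False),
]


def cost(scores):
    """Return (weighted error count, sum of squared scores) for one score set."""
    fp = fn = 0
    for hits, is_spam in CORPUS:
        # Rules missing from `scores` are treated as disabled (score 0).
        total = sum(scores.get(rule, 0.0) for rule in hits)
        flagged = total >= THRESHOLD
        if flagged and not is_spam:
            fp += 1
        elif is_spam and not flagged:
            fn += 1
    # Errors dominate; squared magnitude only breaks ties between equal fits.
    return (FP_WEIGHT * fp + fn, sum(s * s for s in scores.values()))


def fit(rules):
    """Try every combination of grid scores for `rules` and keep the cheapest."""
    candidates = (
        dict(zip(rules, combo)) for combo in product(GRID, repeat=len(rules))
    )
    return min(candidates, key=cost)


without_uri = fit(["BAYES_99", "DRUG_TOY"])
with_uri = fit(["BAYES_99", "URIBL_TOY", "DRUG_TOY"])

print("without URIBL_TOY:", without_uri)
print("with URIBL_TOY:   ", with_uri)
```

On this toy corpus the fit without URIBL_TOY gives BAYES_99 a 5.0, because it has to carry the spam on its own. Once URIBL_TOY is in the ruleset it takes 3.5 and BAYES_99 drops to 1.5. Note that DRUG_TOY moves as well (0.0 to 1.5) even though nothing about that rule changed. The real ruleset has hundreds of rules and a far larger corpus, but the same kind of coupling is why you can't read a score change as a verdict on one rule in isolation.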