I don't know if this will help anyone or not, but I wanted to report
back just in case.
In early April, I completely unpinned the dividing line, i.e. the SA
score threshold used to mark a message as spam or ham (default = 5.00).
This lets the threshold drift "freely" to anywhere that SA will allow,
without bound. This anti-spam setup has consistently worked much, much
better over that whole period than any previous implementation we have
done, and with very little maintenance. We are very happy with it and
are looking forward to implementing future SA versions in the same
fashion.
I'm not exactly sure the following numbers cover the whole period
since April, but they should be pretty close.
We've had 360,922 spam messages and 396,983 ham messages, with a
normalized average spam score of 6.8714134 and a normalized average ham
score of -2.1532284. I have the dividing line "set" at 30% of the
distance between the average ham score and the average spam score (that
is, 30% of that distance above the average ham score). So the dividing
line is currently floating around 0.55416414.
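For anyone who wants to check the arithmetic, here is a minimal Python
sketch of how that floating dividing line follows from the two averages.
The function name and the 0.30 default are made up for illustration; this
is not SA code and not necessarily how our setup computes it.

```python
def floating_threshold(avg_ham: float, avg_spam: float, fraction: float = 0.30) -> float:
    """Return the score that lies `fraction` of the way from avg_ham to avg_spam.

    Hypothetical helper for illustration only; not part of SpamAssassin.
    """
    return avg_ham + fraction * (avg_spam - avg_ham)


# Averages quoted in this message.
avg_ham = -2.1532284
avg_spam = 6.8714134

threshold = floating_threshold(avg_ham, avg_spam)
print(f"dividing line: {threshold:.4f}")  # about 0.5542
```

As the averages move over time, recomputing this value is what lets the
dividing line drift instead of staying fixed at 5.00.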
Apart from the default SA install, the only things I have changed are:
1. Turned off auto-learn <--- I think this is very important.
2. Set SA to ignore our custom spam score tag in the message headers.
We are currently running SA v3.02.
From time to time, but not very often (a couple of times every two
weeks or so), I do feed bayes (sa-learn) with a few messages that were
misclassified. I don't have the stats, but we get very few false
positives, so I'm mostly feeding bayes the false negatives, which
consist of the new/different message tricks that the spammers are using.
Everyone here has been very happy with the results. It's been much, much
better than any implementation in the past.
Many thanks to the SA developers! Rock on!
Joe