I don't know if this will help anyone or not, but I wanted to report back just in case.

In early April, I completely unpinned the dividing line, i.e. the SA score used to mark a message as spam or ham (default = 5.00). This lets the dividing line drift "freely" to anywhere that SA will allow, without bound. This anti-spam setup has consistently worked much better than any previous implementation we have done, and with very little maintenance. We are very happy with it and are looking forward to implementing future SA versions in the same fashion.

I'm not sure the following numbers cover the whole period since April, but they should be pretty close.

We've had 360,922 spam messages and 396,983 ham messages, with a normalized average spam score of 6.8714134 and a normalized average ham score of -2.1532284. I have the dividing line "set" at 30% of the distance between the average ham score and the average spam score (that is, 30% of that distance above the average ham score). So the dividing line is currently floating around 0.55416414.
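For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the floating line falls out of the two averages. This is not our actual code: the function names are made up for illustration, and treating a score at or above the line as spam is an assumption.

```python
# Illustrative sketch only; names are hypothetical, not part of SpamAssassin.

def floating_threshold(avg_ham, avg_spam, fraction=0.30):
    """Return the score lying `fraction` of the way from avg_ham to avg_spam."""
    return avg_ham + fraction * (avg_spam - avg_ham)


def classify(score, threshold):
    """Assumption: a score at or above the dividing line counts as spam."""
    return "spam" if score >= threshold else "ham"


# Averages quoted in this post.
avg_ham = -2.1532284
avg_spam = 6.8714134

threshold = floating_threshold(avg_ham, avg_spam)
print(f"dividing line: {threshold:.5f}")
```

With the averages above, the distance between them is 9.0246418, 30% of that is 2.70739254, and adding that to the average ham score gives the 0.55416414 quoted above. As the averages move, the line moves with them.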

Apart from the default SA install, the only things I have changed are:
1. Turned off auto-learn <--- I think this is very important.
2. Set SA to ignore our custom spam score tag in the message headers.

We are currently running SA v3.02.

From time to time, but not very often (a couple of times every two weeks or so), I do feed bayes (sa-learn) with a few messages that were misclassified. I don't have the stats, but we have very few false positives, so I'm mostly feeding bayes with false negatives, which consist of the new/different message tricks that the spammers are using.

Everyone here has been very happy with the results. It's been much better than any implementation in the past.
Many thanks to the SA developers! Rock on!

Joe
