update on floating dividing score between spam and ham messages

Joe Flowers Sun, 10 Jul 2005 05:14:14 -0700

I don't know if this will help anyone or not, but I wanted to report back just in case.

In early April, I completely unhinged the dividing line that determines which SA score marks a message as spam or ham (5.00 = default). This allows the system and this dividing line to drift "freely" to anywhere that SA will allow, without bound. This anti-spam setup has worked consistently much, much better the whole time than any previous implementation that we have done, and with very little maintenance. We are very happy with it and are looking forward to implementing future SA versions in the same fashion.

I'm not exactly sure the following numbers represent the whole time since April, but they should be pretty close.

We've had 360,922 spam messages and 396,983 ham messages with a normalized average spam score of 6.8714134 and a normalized average ham score of -2.1532284. I have the dividing line "set" at 30% of the distance between the average ham score and average spam score (that is, 30% of that distance above the average ham score). So, the dividing line is currently floating around 0.55416414.
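For anyone who wants to check the arithmetic, here is a minimal Python sketch of how such a floating dividing line could be computed. It is not the code we actually run; the function names `floating_threshold` and `classify` are made up for illustration, and treating a score at or above the line as spam is an assumption modeled on how SA compares a score against its required score.

```python
def floating_threshold(avg_ham, avg_spam, fraction=0.30):
    """Return the score lying `fraction` of the way from avg_ham to avg_spam."""
    return avg_ham + fraction * (avg_spam - avg_ham)


def classify(score, threshold):
    """Label a message by its score (assumption: at or above the line is spam)."""
    return "spam" if score >= threshold else "ham"


# Averages quoted in this message.
AVG_HAM = -2.1532284
AVG_SPAM = 6.8714134

threshold = floating_threshold(AVG_HAM, AVG_SPAM)
print(threshold)                 # approximately 0.55416414
print(classify(3.2, threshold))  # a score of 3.2 lands above the line
print(classify(-1.0, threshold)) # a score of -1.0 lands below it
```

Recomputing the two averages as new mail arrives and calling `floating_threshold` again is what would let the line drift instead of staying pinned at 5.00.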


Apart from the default SA install, the only things I have changed are:
1. Turned off auto-learn <--- I think this is very important.
2. Set SA to ignore our custom spam score tag in the message headers.
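As a rough illustration, the two changes above might look like this in a local SA configuration file (for example local.cf). `bayes_auto_learn` and `bayes_ignore_header` are standard SA options, but whether `bayes_ignore_header` is the mechanism used for item 2 is an assumption, and the header name `X-Our-Spam-Score` is a made-up placeholder for the custom tag.

```
# 1. Turn off Bayes auto-learning
bayes_auto_learn 0

# 2. Keep Bayes from learning from our own score header
#    (X-Our-Spam-Score is a hypothetical header name)
bayes_ignore_header X-Our-Spam-Score
```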

We are currently running SA v3.02.

From time to time, but not very often (a couple of times every two weeks or so), I do feed bayes (sa-learn) with a few messages that are misplaced. I don't know the stats, but we have very few false positives, so I'm mostly feeding bayes with the false negatives, which consist of the new/different message tricks that the spammers are using.

Everyone here has been very happy with the results. It's been much, much better than any implementation in the past.

Many thanks to the SA developers! Rock on!

Joe
