-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

btw, I was just rereading this -- an interesting approach you might
want to experiment with, is having *two* boundaries.  ie:

     negative scores                          positive scores
<----------------|-------------------|-------------------------->
                 |                   |
 ham............ | .....unsure...... | ............spam

if a mail scores <= ham threshold, it's ham; >= spam threshold, it's
spam; and > ham threshold and < spam threshold, it's "unsure".

this is similar to the SpamBayes UI.

- --j.

Joe Flowers writes:
> I don't know if this will help anyone or not, but I wanted to report
> back just in case.
>
> In early April, I completely unhinged the dividing line between what SA
> score is used to mark a message as spam or ham (5.00 = default). This
> allows the system and this dividing line to drift "freely" to anywhere
> that SA will allow, without bound. This anti-spam setup has worked
> consistently much much better the whole time than in any previous
> implementation that we have done and with very little maintenance. We
> are very happy with it and are looking forward to implementing future SA
> versions in the same fashion.
>
> I'm not exactly sure the following numbers represent the whole time
> since April, but they should be pretty close.
>
> We've had 360,922 spam messages and 396,983 ham messages with a
> normalized average spam score of 6.8714134 and a normalized average ham
> score of -2.1532284. I have the dividing line "set" at 30% of the
> distance between the average ham score and average spam score (30% above
> the average ham score). So, the dividing line is currently floating
> around 0.55416414.
>
> Apart from the default SA install, the only thing I have changed is
> 1. Turned off auto-learn <--- I think this is very important.
> 2. Set SA to ignore our custom spam score tag in the message headers.
>
> We are currently running SA v3.02.
>
> From time to time, but not very often (a couple of times every two
> weeks or so), I do feed bayes (sa-learn) with a few messages that are
> misplaced. I don't know the stats, but we have very few false positives,
> so I'm mostly feeding bayes with the false negatives which consist of
> the new/different message tricks that the spammers are using.
>
> Everyone here has been very happy with the results. It's been much much
> better than any implementation in the past.
> Many thanks to the SA developers! Rock on!
>
> Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFC3FC5MJF5cimLx9ARAnWnAJ0Up+/8hC00748EPiGO2fk5p7c4IACeMWXr
JgKnIDrK1LkPPzsne+7N+SA=
=3I84
-----END PGP SIGNATURE-----
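
For anyone who wants to play with the two ideas in this thread, here is a
rough sketch in Python. It is not part of SpamAssassin and not Joe's actual
setup. The function names and the 20%/40% fractions used for the two
boundaries are made up for illustration. Only the 30% fraction and the two
average scores come from the message above.

```python
def floating_threshold(avg_ham, avg_spam, fraction=0.30):
    """Place a dividing line `fraction` of the way from the average ham
    score up to the average spam score (hypothetical helper)."""
    return avg_ham + fraction * (avg_spam - avg_ham)


def classify(score, ham_threshold, spam_threshold):
    """Three-way verdict with two boundaries: ham, unsure or spam."""
    if ham_threshold > spam_threshold:
        raise ValueError("ham threshold must not exceed spam threshold")
    if score <= ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    # Averages quoted in the message above.
    avg_ham, avg_spam = -2.1532284, 6.8714134

    # Single floating line at 30%, as described in the message.
    print("single dividing line:", floating_threshold(avg_ham, avg_spam))

    # Two boundaries; the 20% and 40% fractions are assumptions chosen
    # only to bracket the 30% line, not recommended values.
    ham_t = floating_threshold(avg_ham, avg_spam, 0.20)
    spam_t = floating_threshold(avg_ham, avg_spam, 0.40)
    for score in (-1.0, 0.5, 5.0):
        print(score, classify(score, ham_t, spam_t))
```

With the averages quoted above, the 30% line works out to about 0.554,
which matches the figure Joe reports. Recomputing the averages from time to
time is what would let the boundaries drift.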