Re: update on floating dividing score between spam and ham messages

Justin Mason Mon, 18 Jul 2005 18:02:48 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


btw, I was just rereading this -- an interesting approach you might
want to experiment with is having *two* boundaries, i.e.:

negative scores                                          positive scores
  <----------------|-------------------|-------------------------->
                   |                   |
  ham............  | .....unsure...... | ............spam
  

if a mail scores <= the ham threshold, it's ham; if it scores >= the spam
threshold, it's spam; anything strictly between the two is "unsure".  this
is similar to the SpamBayes UI.
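
a minimal sketch of that three-way split, in Python.  the names (Verdict,
classify) and the example thresholds 0.0 and 5.0 are made up for
illustration; they are not SpamAssassin or SpamBayes settings.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"
    UNSURE = "unsure"
    SPAM = "spam"


def classify(score, ham_threshold, spam_threshold):
    """Map a score onto ham / unsure / spam using two boundaries."""
    # The two boundaries must leave a gap, otherwise there is no "unsure" band.
    if ham_threshold >= spam_threshold:
        raise ValueError("ham_threshold must be below spam_threshold")
    if score <= ham_threshold:
        return Verdict.HAM
    if score >= spam_threshold:
        return Verdict.SPAM
    # Strictly between the two boundaries.
    return Verdict.UNSURE


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only to exercise all three outcomes.
    for s in (-1.2, 2.5, 7.9):
        print(s, classify(s, ham_threshold=0.0, spam_threshold=5.0).value)
```

the "unsure" mails could then be delivered with a tag, or set aside for
manual review and later training.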

- --j.

Joe Flowers writes:
> I don't know if this will help anyone or not, but I wanted to report 
> back just in case.
> 
> In early April, I completely unhinged the dividing line between what SA 
> score is used to mark a message as spam or ham (5.00 = default). This 
> allows the system and this dividing line to drift "freely" to anywhere 
> that SA will allow, without bound. This anti-spam setup has worked 
> consistently much much better the whole time than in any previous 
> implementation that we have done and with very little maintenance. We 
> are very happy with it and are looking forward to implementing future SA 
> versions in the same fashion.
> 
> I'm not exactly sure the following numbers represent the whole time 
> since April, but they should be pretty close.
> 
> We've had 360,922 spam messages and 396,983 ham messages with a 
> normalized average spam score of 6.8714134 and a normalized average ham 
> score of -2.1532284.  I have the dividing line "set" at 30% of the 
> distance between the average ham score and average spam score (30% above 
> the average ham score). So, the dividing line is currently floating 
> around 0.55416414.
> 
> Apart from the default SA install, the only thing I have changed is
> 1. Turned off auto-learn <--- I think this is very important.
> 2. Set SA to ignore our custom spam score tag in the message headers.
> 
> We are currently running SA v3.02.
> 
> From time to time, but not very often (a couple of times every two 
> weeks or so), I do feed bayes (sa-learn) with a few messages that are 
> misplaced. I don't know the stats, but we have very few false positives, 
> so I'm mostly feeding bayes with the false negatives which consist of 
> the new/different message tricks that the spammers are using.
> 
> Everyone here has been very happy with the results. It's been much much 
> better than any implementation in the past.
> Many thanks to the SA developers! Rock on!
> 
> Joe
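
the floating dividing line Joe describes can be checked with a little
arithmetic.  the sketch below is Python; the function name and the
"fraction" parameter are invented for illustration, and it only
reproduces the interpolation step.  it assumes nothing about how the
averages themselves are collected or normalized.

```python
def floating_threshold(avg_ham, avg_spam, fraction=0.30):
    """Return the score lying `fraction` of the way from avg_ham to avg_spam."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return avg_ham + fraction * (avg_spam - avg_ham)


if __name__ == "__main__":
    # Averages quoted in the message above.
    avg_ham = -2.1532284
    avg_spam = 6.8714134
    line = floating_threshold(avg_ham, avg_spam)
    # Should land at roughly 0.554, the figure reported in the message.
    print(f"dividing line: {line:.8f}")
```

as the averages move, recomputing this value moves the line with them,
which is the "floating" behaviour described above.
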
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFC3FC5MJF5cimLx9ARAnWnAJ0Up+/8hC00748EPiGO2fk5p7c4IACeMWXr
JgKnIDrK1LkPPzsne+7N+SA=
=3I84
-----END PGP SIGNATURE-----
