On Tue, 03 Mar 2009 18:14:48 +0100 Karsten Bräckelmann <guent...@rudersport.de> wrote:
> About 98-99% of my spam in-stream scores as high, that any such
> proposal results in a useless increase of the score.
>
> The problem lies with the LOW scoring spam. Alas, these do not tend to
> trigger on a solid subset or meta as you proposed. In particular, RBL
> hits are quite rare, even more so for multiple hits. The few rules hit
> by low scorers are quite diverse, which complicates this.

I think this is a good point. And it would probably have more of an
effect on what hits a second threshold. It's not obvious whether this
is good or bad.

One of the things I like best about SpamAssassin is that its linear
scoring system lends itself well to identifying which spams are caught
beyond reasonable doubt, leaving the user to concentrate on the
lower-scoring stuff. Statistical filters can be a Pyrrhic victory if
you have to look through 1000 spams to find FPs.

I wouldn't want to see the whole scoring system become statistical,
but maybe there could be something that learns the characteristics of
FPs and FNs and adds relatively small tweaks to the existing linear
score.
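
To make that last idea a bit more concrete, here is a rough sketch of
what I have in mind. It is only an illustration: the function names,
the rule names in the usage note, and the constants MAX_TWEAK and STEP
are all made up, and nothing here is an existing SpamAssassin feature.
The point is that the additive score stays as it is, and the learned
part is capped so it can only nudge borderline mail.

```python
# Hypothetical sketch: none of these names come from SpamAssassin itself.
MAX_TWEAK = 1.0   # assumed cap on the total learned adjustment
STEP = 0.1        # assumed learning step per user correction


def linear_score(hits, rule_scores):
    """Plain additive score: the sum of the scores of the rules that hit."""
    return sum(rule_scores.get(rule, 0.0) for rule in hits)


def learn(tweaks, hits, was_spam):
    """Nudge per-rule tweaks after a user reports a misclassification.

    was_spam=True means a false negative (push scores up);
    was_spam=False means a false positive (push scores down).
    """
    delta = STEP if was_spam else -STEP
    for rule in hits:
        tweaks[rule] = tweaks.get(rule, 0.0) + delta


def adjusted_score(hits, rule_scores, tweaks):
    """Linear score plus a learned adjustment clamped to +/- MAX_TWEAK."""
    tweak = sum(tweaks.get(rule, 0.0) for rule in hits)
    tweak = max(-MAX_TWEAK, min(MAX_TWEAK, tweak))
    return linear_score(hits, rule_scores) + tweak
```

With two made-up rules scoring 2.0 and 1.5, a message hitting both
gets 3.5 from the linear part. However many FN reports come in for
that combination, the adjusted score never goes above 4.5, so mail
that already scores far beyond the threshold is unaffected in practice
and the high scorers can still be trusted.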