I just pushed out the new scores (and a bugfix or two) as 2.11.

The new scores were produced by constraining the GA more tightly, using Michael 
Moncur's submitted scores as a starting point, and then hand-tweaking the 
output: basically, any -ve score that came out for a rule which only matched 
spam in the corpus (or only matched highly dubious nonspam in the nonspam 
corpus) was reset to something small and +ve.
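
For anyone who wants to repeat that hand-tweak on their own score set, here is 
a rough sketch in Python.  It is not the script used for 2.11: the function 
name, the dict layout and the 0.1 reset value are all assumptions made up for 
illustration, and the "highly dubious nonspam" cases still need a human to 
look at them.

```python
# Assumed reset value; the post only says "something small and +ve".
SMALL_POSITIVE = 0.1


def reset_spam_only_negatives(scores, spam_hits, nonspam_hits,
                              floor=SMALL_POSITIVE):
    """Return a copy of `scores` with spam-only negative scores reset.

    scores       -- dict of rule name -> score from the GA run
    spam_hits    -- dict of rule name -> number of spam messages matched
    nonspam_hits -- dict of rule name -> number of nonspam messages matched
    """
    fixed = {}
    for rule, score in scores.items():
        # A rule is "spam-only" if it hit some spam and no nonspam at all.
        spam_only = (spam_hits.get(rule, 0) > 0
                     and nonspam_hits.get(rule, 0) == 0)
        if score < 0 and spam_only:
            fixed[rule] = floor
        else:
            fixed[rule] = score
    return fixed
```

Rules with a -ve score that do hit real nonspam are left alone, since a 
negative score is presumably what you want for those.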

The scores are also now inside much tighter bounds than before.  Only a handful 
of scores are now >5.0, and nothing >6.8; the false positive rate has plummeted.  
From an analysis of the false-negatives, it looks like tightening up the 
LINE_OF_YELLING rule a lot would identify a few thousand of the current 
false-negatives.  So would identifying something that appears in the spams 
which match LINE_OF_YELLING but not in the nonspams which match that rule.  
Some 2,120 of the 3500-odd false-negatives matched the LINE_OF_YELLING rule, 
but it's not worth 
enough points to push them over the line.  In fact, manually checking shows that 
1867 of those messages matched *only* LINE_OF_YELLING.
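
If you want to run the same kind of check on your own false-negatives, 
something like the sketch below would do it.  It assumes you already have, for 
each false-negative message, the list of rule names it hit; how you extract 
that list is up to you, and the function name is made up for this example.

```python
def count_rule_matches(false_negatives, rule):
    """Count messages that hit `rule`, and those that hit *only* `rule`.

    false_negatives -- iterable of per-message lists of rule names
    rule            -- the rule name to look for, e.g. "LINE_OF_YELLING"
    Returns a (matched, matched_only) tuple of counts.
    """
    matched = 0
    matched_only = 0
    for hits in false_negatives:
        if rule in hits:
            matched += 1
            # "Only" means no other rule name appears for this message.
            if set(hits) == {rule}:
                matched_only += 1
    return matched, matched_only
```

Messages in the second count are the ones where raising the score of that one 
rule (or splitting it into a tighter rule) is the only thing that can push 
them over the threshold.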

Go ahead and read through the new scores, and let me know what you're unhappy 
about now :)  I realize that things are still not perfect, but I think they're a 
lot better than 2.1 was.

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
