I just pushed out the new scores (and a bugfix or two) as 2.11. The new scores were produced by constraining the GA more, using Michael Moncur's submitted scores as a starting point, and then hand-tweaking the output: basically, any -ve score that came out for a rule which only matched spam in the corpus (or which matched only highly dubious nonspam in the nonspam corpus) was reset to something small and +ve.
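
A rough sketch of that reset rule, in Python, might look like the following. This is illustrative only and not the script actually used: the function name, the dict layout, the rule names and the 0.01 value are all assumptions (the only requirement stated above is "something small and +ve"). It covers only the mechanical case of rules that hit spam and never hit nonspam; the "highly dubious nonspam" case was a manual judgment call and is not modelled here.

```python
# Hypothetical sketch of the hand-tweak step; not the real tooling.
SMALL_POSITIVE = 0.01  # assumed value, the post does not give one


def reset_spam_only_negatives(scores, spam_hits, nonspam_hits,
                              small=SMALL_POSITIVE):
    """Return a copy of `scores` in which every negative-scored rule that
    matched at least one spam and no nonspam is reset to `small`.

    scores       -- dict: rule name -> score produced by the GA
    spam_hits    -- dict: rule name -> number of spam messages matched
    nonspam_hits -- dict: rule name -> number of nonspam messages matched
    """
    adjusted = dict(scores)
    for rule, score in scores.items():
        hits_spam = spam_hits.get(rule, 0) > 0
        hits_nonspam = nonspam_hits.get(rule, 0) > 0
        if score < 0 and hits_spam and not hits_nonspam:
            adjusted[rule] = small
    return adjusted


if __name__ == "__main__":
    # Made-up rule names and counts, purely for illustration.
    ga_scores = {"RULE_A": -1.2, "RULE_B": -0.5, "RULE_C": 2.0}
    spam = {"RULE_A": 40, "RULE_B": 3, "RULE_C": 10}
    nonspam = {"RULE_B": 7}
    print(reset_spam_only_negatives(ga_scores, spam, nonspam))
```

In the made-up data, only RULE_A is changed: RULE_B also has a negative score but matched some nonspam, so it is left alone, and RULE_C was never negative.
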

The scores are also now inside much tighter bounds than before. Only a handful of scores are now >5.0, and nothing is >6.8; the false positive rate has plummeted.

From an analysis of the false-negatives, it looks like tightening up the LINE_OF_YELLING rule a lot would identify a few thousand of the current false-negatives. The alternative would be to identify something in the spams which match LINE_OF_YELLING that is not in the nonspams which match that rule. Some 2,120 3500 odd false-negatives matched the LINE_OF_YELLING rule, but it's not worth enough points to push them over the line. In fact, manually checking shows that 1867 of those messages matched *only* LINE_OF_YELLING.

Go ahead and read through the new scores, and let me know what you're unhappy about now :) I realize that things are still not perfect, but I think they're a lot better than 2.1 was.

C