James R. Van Zandt said:
> Matt Kettler <[EMAIL PROTECTED]> writes:
>
> > Ahh, you are deceived by truncation. They actually can match one
> > nonspam and still be 0.000% because the nonspam corpus is > 100k
> > messages :)
>
> I think that's a bug. The output precision should be increased.

Probably a good idea for marginal cases.

> > If a given rule has 1 misplaced nonspam, it will outweigh 99 correctly
> > placed spam mails matching THAT RULE. Note that's not 1% of the
> > corpus, that's 1% of the overall for the rule.
>
> I suggest cross-validation: Run the GA using half the corpus. Using
> that set of scores, check the other half of the corpus. Examine the
> FPs and FNs for misplacement. Repeat starting with the other half of
> the corpus.

We do. ;)

--j.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
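
For anyone following along, the two-fold check suggested in the thread could be sketched roughly like this. It is only an illustration under loud assumptions: `majority_fit` is a toy stand-in for the GA (it is not how the real scores are produced), the message dictionaries, `THRESHOLD`, `split_halves`, `errors`, `cross_validate` and `toy_corpus` are all hypothetical names made up for the example, and splitting by alternating messages is just one simple way to get two halves.

```python
from collections import Counter

THRESHOLD = 5.0  # assumed spam threshold for this sketch


def split_halves(corpus):
    """Split the corpus into two halves by alternating messages."""
    items = list(corpus)
    return items[0::2], items[1::2]


def majority_fit(train):
    """Toy stand-in for the GA (NOT the real scorer): a rule gets the full
    threshold score if it hits more spam than nonspam in the training half."""
    spam_hits, ham_hits = Counter(), Counter()
    for msg in train:
        (spam_hits if msg["is_spam"] else ham_hits).update(msg["rules"])
    rules = set(spam_hits) | set(ham_hits)
    return {r: THRESHOLD if spam_hits[r] > ham_hits[r] else 0.0 for r in rules}


def errors(scores, test):
    """Return (false positive ids, false negative ids) on the held-out half."""
    fps, fns = [], []
    for msg in test:
        total = sum(scores.get(r, 0.0) for r in msg["rules"])
        flagged = total >= THRESHOLD
        if flagged and not msg["is_spam"]:
            fps.append(msg["id"])
        elif msg["is_spam"] and not flagged:
            fns.append(msg["id"])
    return fps, fns


def cross_validate(corpus, fit=majority_fit):
    """Fit on one half, check the other half, then swap the halves."""
    a, b = split_halves(corpus)
    return [errors(fit(train), test) for train, test in ((a, b), (b, a))]


def toy_corpus():
    """Hypothetical data: 10 spam, 10 nonspam, plus one spam-looking
    message that was filed in the nonspam corpus by mistake."""
    corpus = [{"id": f"spam{i}", "rules": ["SPAMMY"], "is_spam": True}
              for i in range(10)]
    corpus += [{"id": f"ham{i}", "rules": ["HAMMY"], "is_spam": False}
               for i in range(10)]
    corpus.append({"id": "misplaced", "rules": ["SPAMMY"], "is_spam": False})
    return corpus


if __name__ == "__main__":
    for fold, (fps, fns) in enumerate(cross_validate(toy_corpus()), start=1):
        print(f"fold {fold}: FPs={fps} FNs={fns}")
```

In the toy data the misfiled message shows up as a false positive in the fold where it lands in the held-out half, which is the kind of entry you would then examine by hand for misplacement.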