[EMAIL PROTECTED] writes:

> Can anyone give me any ideas why SA is so inconsistent between
> different releases? For example, I picked a spam to test a new
> installation of SA with. It had scored over 10 on a previous
> install. When the message arrived on my new box, it was scored at
> only 8.4. I downgraded to 2.40 and tried it again, and again it was
> over 10, but not as high as it was with 2.41. The test spam is in
> NANAS:

Looking at a single message (which was, by the way, marked as spam in
both releases) is not a good measure of anything. The only worthwhile
measures are false positive and false negative rates over a large
sample size. There are various ways to measure those two attributes
(and ways to combine the two into a single number), but our focus is
on improving both from release to release.
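For concreteness, here is a rough sketch of that kind of measurement
in Python. SA's own mass-check tooling is what we actually use; the
toy scores below and the 10x false-positive weighting are purely
illustrative, not how the GA weights anything:

    # Sketch: false positive / false negative rates over a scored corpus.
    # The toy scores stand in for a real run of SA over thousands of
    # hand-verified spam and nonspam messages.
    spam_scores    = [12.1, 8.4, 6.2, 4.9, 15.0]
    nonspam_scores = [0.1, 2.3, 5.5, -1.0, 1.8]

    THRESHOLD = 5.0  # SA's default spam cutoff

    def error_rates(spam_scores, nonspam_scores, threshold=THRESHOLD):
        """Return (false_negative_rate, false_positive_rate)."""
        fn = sum(1 for s in spam_scores if s < threshold)      # spam that slipped through
        fp = sum(1 for s in nonspam_scores if s >= threshold)  # real mail flagged as spam
        return fn / len(spam_scores), fp / len(nonspam_scores)

    def combined_cost(fn_rate, fp_rate, fp_weight=10.0):
        # One way to fold the two rates into a single number: weight
        # false positives more heavily, since losing real mail hurts
        # more than letting a spam through. The weight of 10 is
        # arbitrary, chosen only for this example.
        return fn_rate + fp_weight * fp_rate

    fn_rate, fp_rate = error_rates(spam_scores, nonspam_scores)
    print(f"FN {fn_rate:.1%}  FP {fp_rate:.1%}  "
          f"cost {combined_cost(fn_rate, fp_rate):.3f}")

The point of the combined number is that two releases can then be
compared on one axis, with false positives costing as much as you
decide they should.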
A single message's score is liable to change quite a bit if a rule is
deleted or added, or if the GA algorithm is changed. The scores
changed quite a bit across the 2.4x series because the GA was being
improved. The GA sometimes finds its way into local maxima/minima (or
maybe that's all that's possible given the search space), so if it
manages to pop out and find a better solution, the scores may change
quite a bit.

Frankly, we don't worry too much about individual messages. We test
rules on tens of thousands of messages, and the GA runs on hundreds of
thousands of messages. Changes are made when they seem likely to
improve SA in general. Optimizing for any small set of messages would
destroy SA's overall performance.

> Now I can understand it scoring higher over time as SA's rules get
> better and better at matching spam. However I really don't
> understand why a new release would score it lower, especially
> looking at the specific rules that were scored lower. Can anyone
> shed any light on this?

After a certain point, higher scores don't help much. But if lowering
those scores reduces false positives by a significant amount, that's a
clear win. Or, by lowering those scores, we may be able to raise
others and catch more spam without creating more false positives. The
GA optimizes for correctly categorizing messages, not for scoring spam
with ever-higher scores.

Again, single-message scores are not really important. Look at overall
spam vs. nonspam accuracy if you want to do any sort of comparison.
And yes, that means you need to do your comparison using a "real
email" corpus that has been hand-cleaned -- no false positives and no
false negatives.

Dan
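P.S. If you do want to compare two releases properly, the shape of it
is something like the sketch below. It assumes you have already run
each release over the same hand-cleaned corpus and dumped one
"score<TAB>msgid" line per message; that dump format, the file names,
and the directory-based labels are all made up for illustration, not
anything SA emits by default:

    # Sketch: compare two SA releases on the same hand-cleaned corpus.
    THRESHOLD = 5.0  # SA's default spam cutoff

    def load_scores(path):
        """Map message id -> score from a 'score<TAB>msgid' dump."""
        scores = {}
        with open(path) as f:
            for line in f:
                score, msgid = line.split(None, 1)
                scores[msgid.strip()] = float(score)
        return scores

    def accuracy(scores, is_spam):
        """Fraction of messages the scores classify correctly.

        is_spam: msgid -> True/False, from the hand-cleaned labels."""
        right = sum(1 for msgid, s in scores.items()
                    if (s >= THRESHOLD) == is_spam(msgid))
        return right / len(scores)

    old = load_scores("scores-2.40.txt")
    new = load_scores("scores-2.41.txt")
    is_spam = lambda msgid: msgid.startswith("spam/")  # labels from dir layout
    print(f"2.40 accuracy: {accuracy(old, is_spam):.2%}")
    print(f"2.41 accuracy: {accuracy(new, is_spam):.2%}")

Run that over a large enough corpus and the release-to-release
differences in individual message scores wash out; what's left is the
comparison that actually matters.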