[EMAIL PROTECTED] writes:

> Can anyone give me any ideas why SA is so inconsistent between different
> releases?  For example, I picked a spam to test a new installation of SA
> with.  It had scored over 10 on a previous install.  When the message
> arrived on my new box, it was scored at only 8.4.  I downgraded to 2.40
> and tried it again, and again it was over 10, but not as high as it was
> with 2.41.  The test spam is in NANAS:

Looking at a single message (which was, by the way, marked as spam in
both releases) is not a good measure of anything.

The only worthwhile measures are false positive and false negative rates
over a large sample.  There are various ways to measure those two
attributes (and ways to combine the two into a single number), but our
focus is on improving both from release to release.
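
For instance, here's a minimal Python sketch of what I mean -- the
10-to-1 weighting of a false positive against a false negative below is
purely an illustrative assumption, not the metric SA actually uses:

    # Compute FP/FN rates from results on a labeled corpus, then fold
    # them into one number with an (assumed) cost weighting.
    def error_rates(results):
        """results: list of (is_spam, marked_as_spam) pairs."""
        spam = [marked for is_spam, marked in results if is_spam]
        ham = [marked for is_spam, marked in results if not is_spam]
        fn_rate = sum(1 for marked in spam if not marked) / len(spam)
        fp_rate = sum(1 for marked in ham if marked) / len(ham)
        return fp_rate, fn_rate

    def combined_cost(fp_rate, fn_rate, fp_weight=10.0):
        # Treat one false positive as costing as much as ten misses.
        return fp_weight * fp_rate + fn_rate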

A single message's score is liable to change quite a bit if a rule is
deleted or added, or if the GA algorithm is changed.  The scores shifted
noticeably across the 2.4x series because the GA was being improved.
The GA sometimes finds its way into local maxima/minima (or maybe
that's all that's possible given the search space), so if it manages to
pop out and find a better solution, the scores may change substantially.
Frankly, we don't worry too much about individual messages.  We test
rules on tens of thousands of messages and the GA runs on hundreds of
thousands of messages.  Changes are made when they seem likely to
improve SA in general.  Optimizing for any small set of messages would
destroy SA's overall performance.
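
To illustrate the local-maximum effect, here's a toy hill-climbing
sketch in Python, standing in for the GA (emphatically not the real
optimizer, just the shape of the idea): fitness counts correctly
classified corpus messages, so a lucky mutation that escapes a plateau
can shift many rule scores at once, while any individual message's
score can move in either direction.

    import random

    THRESHOLD = 5.0  # assumed spam cutoff, as in SA's default

    def message_score(scores, hits):
        # hits: the set of rule names a message triggered
        return sum(scores[rule] for rule in hits if rule in scores)

    def fitness(scores, corpus):
        # corpus: list of (hits, is_spam) pairs; count correct verdicts
        return sum(1 for hits, is_spam in corpus
                   if (message_score(scores, hits) >= THRESHOLD) == is_spam)

    def optimize(scores, corpus, generations=1000, step=0.5):
        best, best_fit = dict(scores), fitness(scores, corpus)
        for _ in range(generations):
            cand = dict(best)
            cand[random.choice(list(cand))] += random.uniform(-step, step)
            cand_fit = fitness(cand, corpus)
            if cand_fit >= best_fit:  # accept ties to drift off plateaus
                best, best_fit = cand, cand_fit
        return best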

> Now I can understand it scoring higher over time as SA's rules get better
> and better at matching spam.  However, I really don't understand why a new
> release would score it lower, especially looking at the specific rules
> whose scores were lowered.  Can anyone shed any light on this?

After a certain point, higher scores don't help much.  But if lowering
those scores reduced false positives by a meaningful amount, that's a
real win.  Or, if by lowering those scores we could raise others, that
might catch more spam without adding false positives.  The GA optimizes
for correctly categorizing messages, not for scoring spam with
ever-higher scores.

Again, single-message scores are not really important.  Look at overall
spam vs. nonspam accuracy if you want to do any sort of comparison.  And
yes, that means you need to do your comparison using a "real email"
corpus that has been hand-cleaned -- every message verified to carry
the correct spam/nonspam label.
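
Concretely, a release-to-release comparison might look like the sketch
below, where run_old() and run_new() are hypothetical stand-ins for
however you invoke each SA install and read back its verdict:

    def compare(corpus, run_old, run_new):
        # corpus: list of (message, is_spam) with hand-verified labels
        for name, run in (("old", run_old), ("new", run_new)):
            fp = sum(1 for msg, s in corpus if not s and run(msg))
            fn = sum(1 for msg, s in corpus if s and not run(msg))
            total = len(corpus)
            print("%s: FP %d/%d, FN %d/%d, accuracy %.2f%%"
                  % (name, fp, total, fn, total,
                     100.0 * (total - fp - fn) / total))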

Dan

