On Thu, 2002-05-02 at 09:16, Charlie Watts wrote:
> It has just occurred to me that this will adjust the AWL math because
> I won't be getting "big" positive numbers into the AWL any more.

The fact that the -S option is reasonable shows that the score is not a
linear measure of spamminess. The probability P(s) that a message with
score s is spam stays near 0 until some small positive s, then rises and
levels off near 1 at about the point where you want to set the spam
threshold. This means that a message with score 20 and one with score 70
are both certainly spam and should not contribute different weights to
the AWL calculation. What we really want is some measure of the
probability that a message from a given source is spam, based on our
past experience with messages from the same place. That suggests that
rather than a linear average of the score we should be averaging
something that approximates the probability of being spam, i.e., convert
the score into a "spamminess" level that is 0 below some lower
threshold, 1 above some upper threshold, and a few values in between for
scores that are not considered by themselves to be certain spam or
non-spam. Of course the "1" can be something larger so the whole thing
can be scaled to integers if that seems more aesthetic.
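
To make the idea concrete, here is a minimal sketch in Python (not
SpamAssassin code). The function names, the cutoffs 0.0 and 5.0, and the
linear ramp between them are all assumptions chosen for illustration; a
handful of discrete steps would serve just as well.

```python
def spamminess(score, low=0.0, high=5.0):
    """Map a raw score onto [0, 1].

    Scores at or below `low` count as certainly non-spam (0.0), scores at
    or above `high` as certainly spam (1.0). The linear ramp in between is
    an assumption; the cutoffs are illustrative, not SpamAssassin defaults.
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)


def sender_prior(scores, low=0.0, high=5.0):
    """Average the clamped spamminess of past scores from one sender.

    Returns None when there is no history for the sender.
    """
    if not scores:
        return None
    return sum(spamminess(s, low, high) for s in scores) / len(scores)
```

With this mapping a score of 20 and a score of 70 both contribute
exactly 1.0 to the average, so one extreme message no longer outweighs
several merely certain ones.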

This gives me another idea: If you consider the AWL as a way of
assigning an a priori probability of spamminess to a message based on
local experience with messages with the same From: header, we can
generalize that to keep track of experience with messages that are
similar by other criteria. Is there a reason not to track other
headers, such as the Return-Path: or the first or second Received:
header? Would it make sense to have a configurable AWL that tracks
criteria that are more useful at a local site? A local spam phrase or
non-spam phrase list?
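
A rough sketch of what such a configurable tracker might look like,
again in Python rather than as a patch to the real AWL. The class name
HistoryTracker, the TRACKED tuple, and the in-memory storage are all
hypothetical. Received: headers are left out because they would first
need parsing to pull out the relay host.

```python
from collections import defaultdict
from email import message_from_string

# Hypothetical site configuration: which headers to keep history for.
TRACKED = ("From", "Return-Path")


class HistoryTracker:
    """Keep a running mean of spamminess per (header name, header value)."""

    def __init__(self, tracked=TRACKED):
        self.tracked = tracked
        # key -> [sum of spamminess values, number of messages seen]
        self.totals = defaultdict(lambda: [0.0, 0])

    def keys_for(self, msg):
        """Build one lookup key per tracked header present in the message."""
        keys = []
        for name in self.tracked:
            value = msg.get(name)
            if value is not None:
                keys.append((name.lower(), value.strip().lower()))
        return keys

    def record(self, msg, spamminess):
        """Add one message's spamminess (0.0 to 1.0) to each matching key."""
        for key in self.keys_for(msg):
            entry = self.totals[key]
            entry[0] += spamminess
            entry[1] += 1

    def prior(self, msg):
        """Return {key: mean spamminess} for the keys that have history."""
        result = {}
        for key in self.keys_for(msg):
            if key in self.totals:
                total, count = self.totals[key]
                result[key] = total / count
        return result
```

A caller would parse a message with message_from_string(), look up
prior() before scoring, and call record() afterwards. How to combine the
per-header priors into one adjustment is left open here.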

 -- sidney

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk