On Thu, 2002-05-02 at 09:16, Charlie Watts wrote:
> It has just occurred to me that this will adjust the AWL math because
> I won't be getting "big" positive numbers into the AWL any more.

The fact that the -S option is reasonable shows that the score is not a linear measure of spamminess. The probability P(s) that a message with score s is spam stays near 0 until some small positive s, then climbs toward 1 and is very close to it around where you want to set the spam threshold. This means that a message with score 20 and one with score 70 are both certainly spam and should not contribute different weights to the AWL calculation.

What we really want is some measure of the probability that a message from a given source is spam, based on our past experience with messages from the same place. That suggests that rather than a linear average of the score we should be averaging something that approximates the probability of being spam, i.e., convert the score into a "spamminess" level that is 0 below some lower threshold, 1 above some upper threshold, and a few values in between for scores that are not considered by themselves to be certain spam or non-spam. Of course the "1" can be something larger so the whole thing can be scaled to integers if that seems more aesthetic.

This gives me another idea: if you consider the AWL as a way of assigning an a priori probability of spamminess to a message based on local experience with messages with the same From: header, we can generalize that to keep track of experience with messages that are similar based on other criteria. Is there a reason not to track other headers, such as the Return-Path or the first or second Received header? Would it make sense to have a configurable AWL that tracks criteria that are more useful at a local site? A local spam phrase or non-spam phrase list?

-- sidney
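
To make the score-to-spamminess conversion concrete, here is a minimal sketch in Python. It is not SpamAssassin code: the function names and the values of lower, upper and scale are made up for illustration, and the in-between levels are simply a floor of a linear ramp, which is only one of many ways to pick "a few values in between".

```python
def spamminess(score, lower=1.0, upper=5.0, scale=4):
    """Map a raw score to an integer level from 0 to scale.

    lower, upper and scale are illustrative assumptions, not
    values taken from SpamAssassin.
    """
    if score <= lower:
        return 0          # considered certain non-spam
    if score >= upper:
        return scale      # considered certain spam, however high the score
    # A few intermediate levels for scores between the two thresholds.
    return int((score - lower) * scale / (upper - lower))


def mean_spamminess(scores, **kwargs):
    """Average the clamped levels instead of the raw scores."""
    levels = [spamminess(s, **kwargs) for s in scores]
    return sum(levels) / len(levels)


# Two hypothetical senders whose histories differ only in how far
# over the threshold one message went.
history_a = [20.0, 0.5, 0.5]
history_b = [70.0, 0.5, 0.5]

raw_a = sum(history_a) / len(history_a)   # 7.0
raw_b = sum(history_b) / len(history_b)   # much larger than raw_a
level_a = mean_spamminess(history_a)
level_b = mean_spamminess(history_b)
```

With the raw average, the single score of 70 pulls sender B's mean far above sender A's, even though both messages were certainly spam. With the clamped levels the two senders come out identical, which is the behavior argued for above.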
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk