The biggest problem with -S is the ordering of the rule checks.
If all of the negative rules (or at least the _large_ negative rules)
were processed first, it would probably be OK. But right now (or at
least with 2.20), if you enable it, the whitelisting never gets used,
since the score reaches the threshold before the whitelist check runs.

I don't know if that has been changed, but that made it pretty useless
for me. 
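
The ordering problem described above can be sketched in a few lines of Python. This is only an illustration, not SpamAssassin's actual code: the rule names, scores, threshold, and the score_message helper are all made up, and the early exit is modelled simply as "stop once the running total reaches the threshold".

```python
# Hypothetical rules as (name, score) pairs; names and scores are invented.
RULES = [
    ("SUBJ_ALL_CAPS", 2.5),
    ("FREE_MONEY_PHRASE", 3.0),
    ("USER_IN_WHITELIST", -100.0),
]

def score_message(hits, rules, threshold=5.0, early_exit=True):
    """Sum the scores of matching rules in order.

    With early_exit set, stop as soon as the running total reaches
    the threshold, so later rules are never evaluated.
    """
    total = 0.0
    for name, score in rules:
        if name in hits:
            total += score
        if early_exit and total >= threshold:
            break
    return total

# A whitelisted sender whose message also trips two positive rules.
hits = {"SUBJ_ALL_CAPS", "FREE_MONEY_PHRASE", "USER_IN_WHITELIST"}

# Rules in the order listed: the threshold is hit before the whitelist rule.
as_listed = score_message(hits, RULES)

# Same rules sorted so the most negative scores are checked first.
negatives_first = score_message(hits, sorted(RULES, key=lambda r: r[1]))
```

In this toy setup the first call stops above the threshold without ever seeing the whitelist rule, while the second applies the whitelist score before any positive rule can trigger the exit.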

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  [EMAIL PROTECTED]
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Sidney Markowitz [mailto:[EMAIL PROTECTED]] 
> Sent: Thursday, May 02, 2002 1:11 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [SAtalk] AWL verses early-terminate
> 
> 
> On Thu, 2002-05-02 at 09:16, Charlie Watts wrote:
> > It has just occurred to me that this will adjust the AWL math because
> > I won't be getting "big" positive numbers into the AWL any more.
> 
> The fact that the -S option is reasonable points out that the scoring is
> not a linear measure of spamminess. The function P(s) of the probability
> that a message with score s is spam stays near 0 until some small
> positive s, then asymptotically approaches 1 somewhere around where you
> want to set the spam threshold. This means that a message with score 20
> and one with score 70 are both certainly spam and should not contribute
> different weights to the AWL calculation. What we really want is some
> measure of the probability that a message from somewhere is spam based
> on our past experience with messages from the same place. That indicates
> that rather than a linear average of the score we should be averaging
> something that approximates the probability of being spam, i.e., convert
> the score into a "spamminess" level that is 0 below some threshold, 1
> above some threshold, and a few values in between for spam scores that
> are not considered by themselves to be certain spam or non-spam. Of
> course the "1" can be something larger so the whole thing can be scaled
> to integers if that seems more aesthetic.
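
One way the "spamminess" level proposed above might look, as a rough Python sketch. The function name, the two cut-offs (low, high), and the number of intermediate levels are assumptions chosen for illustration; the message does not specify any of them.

```python
def spamminess(score, low=2.0, high=5.0, levels=4):
    """Map a raw score to a level between 0 and 1.

    Scores at or below `low` map to 0, scores at or above `high` map
    to 1, and scores in between are rounded to one of a few evenly
    spaced intermediate values. All parameters are illustrative.
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    fraction = (score - low) / (high - low)
    return round(fraction * levels) / levels

def average_spamminess(scores):
    """Average the clamped levels instead of the raw scores."""
    return sum(spamminess(s) for s in scores) / len(scores)

# Scores of 20 and 70 are both "certainly spam" and contribute equally.
same_weight = spamminess(20.0) == spamminess(70.0)
borderline = spamminess(3.5)
```

Scaling to integers, as suggested, would just mean multiplying the returned level by a constant.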
> 
> This gives me another idea: If you consider the AWL as being a way of
> assigning an a priori probability of spamminess to a message based on
> local experience with messages with the same From: header, we can
> generalize that to keep track of experience with messages that are
> similar based on other criteria. Is there a reason not to track any
> other headers, such as the return-path or the first or second received
> header? Would it make sense to have a configurable AWL that tracks
> criteria that are more useful at a local site? A local spam phrase or
> non-spam phrase list?
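
The generalization asked about above, tracking history per configurable header rather than only per From: address, could be sketched as follows. HistoryTracker, its methods, and the example addresses are hypothetical; this is not how the AWL is actually stored, only a minimal in-memory model of the idea.

```python
from collections import defaultdict

class HistoryTracker:
    """Keep a running mean of spamminess levels per (header, value) pair."""

    def __init__(self, keys=("From",)):
        self.keys = keys
        # (header name, header value) -> [sum of levels, message count]
        self.totals = defaultdict(lambda: [0.0, 0])

    def record(self, headers, level):
        """Add one message's level to every tracked header it carries."""
        for key in self.keys:
            value = headers.get(key)
            if value is not None:
                entry = self.totals[(key, value)]
                entry[0] += level
                entry[1] += 1

    def prior(self, headers):
        """Return the mean past level for each tracked header seen before."""
        result = {}
        for key in self.keys:
            entry = self.totals.get((key, headers.get(key)))
            if entry and entry[1]:
                result[key] = entry[0] / entry[1]
        return result

tracker = HistoryTracker(keys=("From", "Return-Path"))
tracker.record({"From": "a@example.com", "Return-Path": "bounce@example.net"}, 1.0)
tracker.record({"From": "a@example.com"}, 0.0)
seen = tracker.prior({"From": "a@example.com", "Return-Path": "bounce@example.net"})
unseen = tracker.prior({"From": "new@example.org"})
```

Which headers go into `keys` is exactly the site-local configuration choice the question raises; how the per-header priors would then be combined into one adjustment is left open here.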
> 
>  -- sidney
> 
> 
> 
> _______________________________________________________________
> 
> Have big pipes? SourceForge.net is looking for download mirrors. We supply
> the hardware. You get the recognition. Email Us: [EMAIL PROTECTED]
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 

