The right way is actually to have the AWL form a prediction based on a more
sophisticated predictive model, including a zero-frequency estimate for senders
who are not in the whitelist.  The weight provided by the AWL in cases where
there is a-priori data about a sender should depend on the number of messages
sent so far: instead of using the a-priori mean as 50% of the final score, the
percentage should shift over time.  There are well-known ways of doing this optimally,
but my reference book on the subject is up in Sausalito, and I'm down here in
Menlo Park, 60-odd miles away.  I suppose I could go look things up online...
If only I hadn't smoked all that pot as a youngster, my memory might be good
enough now to do it without reference books :)  Or if I were less mathematically
lazy, I could probably derive it from stuff I do remember.
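
To make the idea of a weight that shifts with message count concrete, here is
a rough Python sketch.  It is not the optimal estimator alluded to above, just
one simple shrinkage-style weighting.  The names awl_weight and final_score and
the constants k and w_max are all assumptions made up for illustration; none of
them come from SpamAssassin.

```python
def awl_weight(n_messages, k=5.0, w_max=0.5):
    """Weight given to the sender's historical mean score.

    Starts at 0 for a sender with no history and approaches w_max as
    more messages are seen.  k and w_max are illustrative assumptions.
    """
    if n_messages < 0:
        raise ValueError("n_messages must be non-negative")
    return w_max * n_messages / (n_messages + k)


def final_score(message_score, sender_mean, n_messages, k=5.0, w_max=0.5):
    """Blend this message's own score with the sender's historical mean."""
    w = awl_weight(n_messages, k, w_max)
    return w * sender_mean + (1.0 - w) * message_score
```

With these assumed constants, a sender with no history gets no AWL adjustment
at all, and the fixed 50% weight is only approached after many messages.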

C

Daniel Quinlan wrote:

DQ> Theo Van Dinter <[EMAIL PROTECTED]> writes:
DQ>
DQ> > Well, SA does that by default, adding -100 points to the message score.
DQ> > So this spam, listed as from "concord.net" in the header gets -100,
DQ> > then the actual spam scores brought it up to -67.
DQ>
DQ> Yuck.  How about moderating the whitelist modification by the
DQ> pre-whitelist score?  For example, divide the AWL by (score/5).
DQ>
DQ> So, this message had an AWL of -100 and a pre-AWL score of 33:
DQ>
DQ>  awl   = -100/(33/5)
DQ>  awl   = -15
DQ>
DQ>  final = 33 - 15
DQ>  final = 18
DQ>
DQ> The right way would probably be to search for AWL failures (really
DQ> spammy mail gets through because of AWL) and determine a formula to
DQ> eliminate those without any additional false positives.
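
The damping rule in the quoted message above, dividing the AWL by (score/5),
can be sketched in Python as follows.  The function names are hypothetical, and
the guard for low scores is my own assumption: the quoted formula does not say
what to do when the pre-AWL score is at or below 5, where dividing would
amplify the AWL (or divide by zero at a score of 0).

```python
def damped_awl(awl, pre_awl_score, divisor=5.0):
    """Scale the AWL adjustment down for messages that already look spammy.

    Assumption: at or below `divisor` the AWL is left unchanged, because
    dividing by (score / divisor) there would enlarge it instead.
    """
    if pre_awl_score <= divisor:
        return awl
    return awl / (pre_awl_score / divisor)


def damped_final_score(awl, pre_awl_score, divisor=5.0):
    """Pre-AWL score plus the damped AWL adjustment."""
    return pre_awl_score + damped_awl(awl, pre_awl_score, divisor)
```

For the example in the quote, damped_awl(-100, 33) is about -15.15 and the
final score is about 17.85, which matches the rounded figures of -15 and 18.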


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
