Re: AWL - BAYES_99/ general questions

Karsten Bräckelmann Thu, 28 Feb 2008 08:02:27 -0800

On Thu, 2008-02-28 at 10:28 -0500, Randy Ramsdell wrote:
> Karsten Bräckelmann wrote:


> > AWL is a score averager. SA has seen that sender before.
> >   http://wiki.apache.org/spamassassin/AutoWhitelist
> >
> > Run it through SA again, and you will see the AWL score getting closer
> > to 0, since the score without AWL is constant. The AWL score is
> > negative, because previous scores have been lower.
> 
> I understand that  AWL is averaging what it has seen before and it must 
> have seen the message as ham, 

No. :)  AWL does not know the concept of spam or ham, it does not know
about your required_score spam threshold. It merely knows about the
previous scores.

> but why would one have to sa-learn the message as spam multiple times.

You do NOT have to, and I didn't say so. :)  AWL keeps track of all
*seen* messages, as opposed to learned ones. Given the initial score of
the message, it has not been learned automatically.

To observe the AWL score it is sufficient, as I said, to run the message
through spamassassin -- this does not require sa-learn. Note that my
comment regarding this was intended to demonstrate AWL, so you can see
for yourself. I did not mean to imply you have to do it regularly. Just
this one time, so you can see how AWL behaves...
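
To make the averaging concrete, here is a minimal Python sketch of the
behavior described above. It is not SpamAssassin's code: the names
awl_adjust and rescan, the history values and the message score are all
made up for illustration, and the factor of 0.5 assumes the default
auto_whitelist_factor setting. The adjustment pulls the score part of
the way toward the stored mean, and the score without AWL is then added
to the history.

```python
def awl_adjust(total, count, score, factor=0.5):
    """Return (adjustment, new_total, new_count) for one scan.

    total, count: sum and number of earlier pre-AWL scores for this sender.
    score: score of the current message without AWL.
    factor: assumed default of auto_whitelist_factor.
    """
    if count == 0:
        adjustment = 0.0  # nothing seen before, nothing to average against
    else:
        mean = total / count
        adjustment = (mean - score) * factor
    # The history records the score *without* the AWL adjustment.
    return adjustment, total + score, count + 1


def rescan(total, count, score, times):
    """Scan the same message repeatedly and collect the AWL adjustments."""
    adjustments = []
    for _ in range(times):
        adj, total, count = awl_adjust(total, count, score)
        adjustments.append(round(adj, 2))
    return adjustments


# Hypothetical history: 5 earlier messages averaging 2.0; new message scores 10.0.
print(rescan(total=10.0, count=5, score=10.0, times=4))
# prints [-4.0, -3.33, -2.86, -2.5]
```

Each rescan adds the same constant score to the history, so the mean
creeps toward it and the adjustment shrinks toward 0.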


Also please note that AWL in fact keeps track of a pair: the sender
address and the IP address (space). IMHO, this kind of explains the
confusing naming, namely the "whitelist" part. It is most useful for
legit senders -- if they send a single spammy message once, AWL comes to
the rescue and lowers the score drastically.

General spam, on the other hand, is really unlikely to ever be sent a
second time From: the same forged sender address and the same
originating network. Odds are, this particular AWL entry will never be
used again for new incoming spam.
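
As a rough illustration of why the pairing matters, here is a small
Python sketch. The names awl_key, record and history are hypothetical,
and grouping IPv4 addresses by their first two octets is an assumption
about how the "IP address (space)" is formed, not something taken from
SpamAssassin's source.

```python
def awl_key(sender, ip, octets=2):
    """Build a lookup key from the From: address and the leading IPv4 octets."""
    network = ".".join(ip.split(".")[:octets])
    return (sender.lower(), network)


history = {}  # key -> (total of pre-AWL scores, number of messages seen)


def record(sender, ip, score):
    """Add one seen message to the history and return the key it used."""
    key = awl_key(sender, ip)
    total, count = history.get(key, (0.0, 0))
    history[key] = (total + score, count + 1)
    return key


# A legit sender mailing from two hosts in the same network shares one entry.
record("alice@example.com", "192.0.2.10", 0.5)
record("alice@example.com", "192.0.2.77", 1.5)

# Spam forging the same address from elsewhere gets its own, fresh entry.
record("alice@example.com", "203.0.113.5", 18.0)

print(len(history))  # prints 2
```

The forged message neither benefits from the legit sender's low
average nor pollutes it.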


> This also means that a system wide 
> approach to improving our SPAM effectiveness requires me to parse the AWL 
> score after sa-learning the message to determine if I need to run it 
> again. This would be a monumental task and very resource intensive. 

No. See above. Also please note that Bayes (which you train using
sa-learn) and AWL are entirely unrelated. (Bayes is a token-based
mechanism, about "words" in the message, and does not know about the
concept of email addresses, let alone senders.)


> Wouldn't a better approach be to set AWL to max positive  if I manually 
> learn the message as spam? Or is there a way to modify the DB to correct 
> the previous AWL hits on this message?

Again, see above. If you never again get spam forged to come from that
sender, it won't make a difference. Also, again, Bayes and AWL are
unrelated.

Besides, the A stands for Automatic. No need to correct anything. ;)

If you ever need to clear an AWL score (usually because the learned
average for a *legit* sender is too high), you can do so using
'spamassassin'. Not sa-learn. See 'man spamassassin-run'. [1]

  guenther


[1] See another recent post by Justin. ;-)

-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
