> Personally, I think the AWL is in general a fundamentally broken concept,
> however there are people out there who think otherwise. I will
> likely never use the AWL feature of SA in any form of production
> environment.
> I see very minimal benefit from its use, and a very long history of
> severe problems with the AWL.

While I think the AWL is a good idea in principle, I have to agree that it
has done me more harm than good overall. I've kept it turned on with 2.43,
since it hasn't yet done much harm, but I'm not aware of any major benefit
either. I'll probably turn it off too.

I wonder what can be done to make it do something useful. There appear to be
two basic intentions of the AWL:

1. My friend emails me a few times, then sends something really spammy.
Negative AWL prevents it from being marked spam.
  - The AWL does work for this, but in practice it's quite rare that my
friends send me spammish messages, and when they do the AWL often doesn't
contribute enough to keep them out of my Spam folder.

2. Spammer sends me a few spams, then gets clever and sends a very
legitimate-looking spam. Positive AWL pushes it over the threshold.
  - Seems like a good idea but I have yet to find evidence of it happening.
The chance that a spammer will send a non-spammy message while still using
the same From and IP address is pretty slim.

Unfortunately, what I notice most often is the AWL reducing the score of
actual spam. The reduction is usually not enough to drop the message below
the threshold, and the major 2.42 problem in that area is fixed, but I use
multiple thresholds, and I would like spam to score as high as possible
regardless of the spammer's previous scores.
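
To make the effect concrete, here is a minimal sketch of the arithmetic,
assuming the AWL moves each score partway toward the sender's historical
mean. The 0.5 factor, the function name awl_adjust, and the two thresholds
are illustrative assumptions, not values taken from SA's source or from any
real configuration.

```python
# Sketch only: assumes the AWL pulls a score toward the sender's mean
# by a fixed factor. All names and numbers here are hypothetical.

TAG_THRESHOLD = 5.0      # assumed: mark as spam at or above this score
DELETE_THRESHOLD = 10.0  # assumed: second, stricter threshold


def awl_adjust(score, history_mean, factor=0.5):
    """Return the score after moving it toward the sender's mean."""
    return score + (history_mean - score) * factor


# A spammer whose earlier messages scored 6.0 and 7.0 ...
history = [6.0, 7.0]
mean = sum(history) / len(history)

# ... now sends something that scores 12.0 on its own.
raw = 12.0
adjusted = awl_adjust(raw, mean)

print(raw >= DELETE_THRESHOLD)       # True: raw score clears both thresholds
print(adjusted)                      # 9.25
print(adjusted >= TAG_THRESHOLD)     # True: still tagged as spam
print(adjusted >= DELETE_THRESHOLD)  # False: no longer clears the second one
```

With a single threshold the adjustment would be harmless here; it only
matters once a second, higher threshold is in play.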

Here's a thought for a more useful but less automatic white/blacklist. SA
could watch messages as they come in (as it does now) and *also* watch
messages piped through spamassassin -r to report them as spam. Assuming a
user reliably reported their spam, this would allow for:

1. User has received n below-threshold messages from X and never reported
any of them as spam. Therefore add X to the whitelist (or increase the
whitelist amount with each message not reported.)

2. User has received n below-threshold messages from Y and reported each one
as spam (a false negative). Therefore add Y to the blacklist (or increase
the blacklist amount with each message reported.)

This is very close to the existing AWL, but looking at reported messages
makes it an accurate measure of the whitelisting and blacklisting the user
needs rather than a guess. Additionally, if there were a way to report a
message as non-spam (a false positive) we could add:

3. User has received one or more above-threshold messages from X and
reported them as false positives. Therefore add X to the whitelist.

4. User has received n above-threshold messages from Y and never reported
any of them as false positives. Therefore add Y to the blacklist.

Of course, these two would depend on users perusing their spam and reporting
false positives.
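
The four rules above could be sketched roughly as follows. This is only an
illustration of the idea, not anything SA does today: the class
ReportDrivenList, its methods, the default n of 3, and the order in which
conflicting rules are resolved (an explicit false-positive report wins) are
all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    above: bool                 # scored at or above the spam threshold
    reported_spam: bool = False
    reported_ham: bool = False


class ReportDrivenList:
    """Hypothetical white/blacklist driven by user reports."""

    def __init__(self, threshold=5.0, n=3):
        self.threshold = threshold
        self.n = n
        self.messages = {}      # message id -> Message

    def receive(self, msg_id, sender, score):
        self.messages[msg_id] = Message(sender, score >= self.threshold)

    def report_spam(self, msg_id):
        self.messages[msg_id].reported_spam = True

    def report_ham(self, msg_id):
        self.messages[msg_id].reported_ham = True

    def verdict(self, sender):
        msgs = [m for m in self.messages.values() if m.sender == sender]
        below = [m for m in msgs if not m.above]
        above = [m for m in msgs if m.above]
        # Rule 3: any above-threshold message reported as a false positive.
        if any(m.reported_ham for m in above):
            return "whitelist"
        # Rule 2: n below-threshold messages, every one reported as spam.
        if len(below) >= self.n and all(m.reported_spam for m in below):
            return "blacklist"
        # Rule 4: n above-threshold messages, none reported as false positive.
        if len(above) >= self.n:
            return "blacklist"
        # Rule 1: n below-threshold messages, none reported as spam.
        if len(below) >= self.n and not any(m.reported_spam for m in below):
            return "whitelist"
        return "neutral"


wl = ReportDrivenList()
for i in range(3):
    wl.receive(f"m{i}", "friend@example.com", 1.0)
print(wl.verdict("friend@example.com"))   # whitelist
```

A real version would adjust a score gradually rather than return a hard
verdict, as the "increase the amount with each message" variants suggest.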

Since users will hopefully be reporting messages as spam or non-spam for
Bayesian purposes anyway, it would be pretty easy to hook this up to the
AWL.

--
Michael Moncur  mgm at starlingtech.com  http://www.starlingtech.com/
"If you cannot convince them, confuse them." --Harry S Truman



_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
