Jeff Mincy wrote:
From: Linda Walsh <sa-u...@tlinx.org>
Date: Wed, 27 May 2009 12:48:43 -0700
Bowie Bailey wrote: > ----
At face value, this seems very counterproductive.
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.
----
Ah, but it only seems I'm daft, today...:-)
If I get spam from 1000 senders, they all end up in my
AWL???
Yes. Every email+IP address pair that sends you email winds up in
your AWL with an average score for that pair. This is ok.
----
GRRR....not so ok in my mindset, but ... and ... errr..
well that only makes it more confusing, in a way...since I was
only 99% certain that I'd never gotten any HAM from hostname
'518501.com' (thinking for a short period that AWL might be classifying
things by host as reliable or not, instead of, or in addition to,
by email-addr), but I'm 99.97% certain I've never gotten any HAM
from user 'paypal.notify' (at) hostname '5185
AWL should only be added to by emails judged to be 'ham' via
the feedback mechanisms; spammers shouldn't get bonuses for
being repeat senders...
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.
=====
Aw...come on. Isn't the world difficult enough without
changing white to black or white to weighing? I mean, we humans
have enough trouble agreeing on what our symbols, "words", mean in
relation to concepts and all without ya goin' and redefining perfectly
good acceptable symbols to mean something else completely and still
claim it to be some semblance of English. No wonder most of the
non-techno-literate humans on this world regard us techies with
a hint of suspicion regarding the difficulty of problems. We go around
redefining words to suit reality and catch the heat when the rest of
the world doesn't understand our meaning:
Pointy-Haired Boss: "Well, how long did you say it would take?"
Geek: "Well, I said it was 3-4 weeks' worth of work."
PHB: "Then why has it been 6 weeks with no product? I told you
anything over 4 weeks was unacceptable!"
G: "6 weeks, but...to get under 4 weeks, I assumed you were talking
168-hour pure-programming time weeks -- not CALENDAR weeks!...."
AWL isn't whitelisting spammers. It is pushing the score to the
average for that sender. The sender can have a high average or a low
average.
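A minimal sketch of the "push toward the average" idea, in Python. The
function name awl_adjust and its factor parameter are made up for
illustration; the 0.5 default is an assumption modeled on SpamAssassin's
auto_whitelist_factor setting, and this is not the module's actual code.

```python
def awl_adjust(score, sender_mean, factor=0.5):
    """Move this message's score part of the way toward the sender's mean.

    factor=0 leaves the score alone; factor=1 replaces it with the mean.
    The 0.5 default is an assumption for illustration only.
    """
    return score + (sender_mean - score) * factor


# A sender with a high historical average gets pushed up...
print(awl_adjust(2.0, 20.0))  # 11.0
# ...and a sender with a low historical average gets pushed down.
print(awl_adjust(8.0, 1.0))   # 4.5
```

So a repeat spammer with a high average gets a penalty rather than a
bonus; only a sender whose past mail scored low gets pulled down.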
---
An average? So it keeps the scores of every past email from every
sender who ever sent us mail? Must just store a weighted average -- otherwise
the space (hmm...someone said something about 80MB+ auto-whitelist DB
files?)....
Why not call it the Historically Based Score Normalizer or
HBSN module? Db file could be "historical-norms" or something.
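On the storage question: a running average does not need every past
score, only a total and a count per key. A rough sketch, assuming a plain
in-memory dict keyed on the (address, IP) pair mentioned earlier in the
thread. The class and method names here are hypothetical, and the real
on-disk format may well differ.

```python
from collections import defaultdict


class SenderHistory:
    """Hypothetical per-sender store: one [total, count] pair per key."""

    def __init__(self):
        # (address, ip) -> [running total of scores, messages seen]
        self._entries = defaultdict(lambda: [0.0, 0])

    def add(self, address, ip, score):
        """Fold one message's score into the pair's running total."""
        entry = self._entries[(address, ip)]
        entry[0] += score
        entry[1] += 1

    def mean(self, address, ip):
        """Return the average score for the pair, or None if unseen."""
        key = (address, ip)
        if key not in self._entries:
            return None
        total, count = self._entries[key]
        return total / count


history = SenderHistory()
for s in (12.0, 18.0, 15.0):
    history.add("someone@example.com", "192.0.2.1", s)
print(history.mean("someone@example.com", "192.0.2.1"))  # 15.0
print(history.mean("unseen@example.com", "192.0.2.9"))   # None
```

With this layout the space grows with the number of distinct senders, not
the number of messages, so 1000 one-off spam senders still mean 1000
entries.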
If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.
----
Maybe it shouldn't add in the 'average' unless it exceeds
the 'auto-learning threshold'?? I.e. something like the
'bayes_auto_learn_threshold_nonspam' for HAM and the
'bayes_auto_learn_threshold_spam' for SPAM. Assuming it doesn't
already do such a thing, it would make a little sense...so as
not to train it on 'bad data'...
When I run "sa-learn --spam <email>" over a message, can I
assume (or is it the case) that telling SA a message was 'spam'
would assign a sufficiently large value to the 'HBSN' value for that
sender to reduce any effect of a previously stored incorrect value
(if that is likely to happen)?
Or might I at least assume that each "sa-learn" over a message
will modify its AWL score appropriately?
You can remove addresses using spamassassin --remove-from-whitelist
----
Yes...saw that after visiting the wiki. Is there a
--show-whitelist-with-current-scores-and-their-weight switch as well
(as opposed to one that only showed the addr's in the white list, or only
showed the non-weighted scores)?
Thanks...and um...
How difficult would it be to have the name of the module reflect
what it's actually doing? maybe roll out a name change with the next
".dot" release of SA? (3.3? 3.4?) Might alleviate some amount of
confusion(?)...
Does the AWL also keep track of when it last saw an 'email' addr
so it can 'expire' the oldest entries so the db doesn't grow to eventually
consume all forms of matter and energy in the universe? :-)
Thanks for the clarification and info!!
-linda