On Thu, 30 Apr 2009, LuKreme wrote:
First off, I suppose that if you get real mail from someone who has only
ever been seen as a spam sender, then yes, the first mail would be
penalized. But is this ever the case?
(nod) Any time someone's address has been used as a spoofed sender before
that legitimate sender makes first contact with a new correspondent. But
as I understand your logic, there is no 'rule' to distinguish the 'first'
AWL entry as 'special' from all the rest... just that 'others' exist...
Let's lay out the logic here:
2 AWL is positive or does not exist
a Check for other AWL entries using same address but different hosts.
i If there is an AWL with a negative score, then multiply by -0.2 and
add to score
So any AWL with a negative score still helps the new mail be negative?
The sender's legit mail helps new spam?
ii If there is an AWL with a positive score, under 5.0, then multiply by
0.1 and add
iii If there is an AWL with a positive score over 5.0, then multiply it
by 0.4 and add
So in the unlikely event that spam (from a different server) precedes
legitimate mail, the legit sender gets a positive adjustment before
they have a chance to score negative...
Note that this logic will also be problematic when a sender has multiple
mail servers. Many senders get a few points positive...
c if total amount added is over some threshold, normalize on that threshold
(3 points? 5? 8?)
Now let's presume that the sender is spoofed by spammers on ten different
IPs, producing ten different AWL entries. How will you distinguish the
legit sender's IP (except by hoping they have scored negative)? You
will simply add up ALL the IP AWLs and score *any* mail from the sender
with a significant positive adjustment....
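
To make the ten-spoofed-hosts case concrete, here is a minimal Python sketch of the proposed cross-host adjustment as I read steps 2.a.i-iii and c. It is not SpamAssassin code. The function name, the 5.0 cap (the proposal only says "3 points? 5? 8?"), the treatment of a score of exactly 5.0, and the reading of the "-0.2" multiplier as "scale a negative score by 0.2 and keep it negative" are all assumptions on my part.

```python
def cross_host_adjustment(other_host_scores, cap=5.0):
    """Sum the scaled AWL scores recorded for the same address on other hosts.

    other_host_scores: one AWL score per other host (hypothetical input shape).
    cap: assumed ceiling for step c; the proposal leaves the value open.
    """
    total = 0.0
    for score in other_host_scores:
        if score < 0:
            # Step 2.a.i, assumed to mean: keep the sign, scale by 0.2.
            total += 0.2 * score
        elif score < 5.0:
            # Step 2.a.ii: small positive scores are scaled by 0.1.
            total += 0.1 * score
        else:
            # Step 2.a.iii: scores of 5.0 and up (boundary assumed) scale by 0.4.
            total += 0.4 * score
    # Step c: never add more than the cap.
    return min(total, cap)


# Ten spoofed hosts, each with an AWL score of 12.0 for the same address.
spoofed = [12.0] * 10
print(cross_host_adjustment(spoofed))
```

Each spoofed entry contributes 4.8, so the sum blows straight past any of the suggested caps and every later mail from that address, legit or not, starts with the full penalty. Nothing in the inputs says which host is the real sender's.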
3 AWL is negative
{ crickets }
But how often does that really happen? As I said, most people get a *few*
points on legit mail. The idea being that an average score of 0.8 will
'average' with a fluke spammy mail and keep the score lower.... But your
way is adding those small scores to essentially ALL mail unless the lucky
sender never mentioned viag.... ooops. There goes *my* score.... LOL
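
For comparison, the stock AWL averaging described here can be sketched in a few lines. This assumes the documented formula (final = score + (mean - score) * factor) and the default auto_whitelist_factor of 0.5; the function name is hypothetical and the sketch ignores how entries are stored or keyed.

```python
def awl_adjusted_score(message_score, sender_mean, factor=0.5):
    """Pull the message score part of the way toward the sender's mean."""
    return message_score + (sender_mean - message_score) * factor


# A sender averaging 0.8 sends one fluke message that scores 6.0.
print(awl_adjusted_score(6.0, 0.8))
```

With those numbers the fluke message ends up at about 3.4 instead of 6.0, which is the "keep the score lower" effect. The adjustment depends only on that sender's own history for that host, not on a sum over every other host that has used the address.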
Maybe it makes sense to only do this check if the message has at least
scored positive?
Again, a significant proportion of ham gets a few points.
So yes, if b...@example.com has never emailed me except for a bunch of
spam, then yeah, the message is going to get bumped up in its score, but
how often does that happen? Does that ever happen?
Happens for me all the time. I get dictionary spam with a random client's
address as sender, and then I get an inquiry from the client about all
these 'bounces' they are receiving. Naturally, they quote the bounce,
which includes some spam sign, and the client is off to a good start with a
moderately spammy mail to me. (smile)
But bob could also e-mail you three or four times, getting a small
positive score, then you get spammed "from Bob" with high scores from a
botnet (and I usually get several copies of a spam like that), and the
next time bob e-mails, he gets logic 2.a.ii applied above for each and
every AWL for his address. Could be hefty....
Also, let's say b...@example.com sends a message after a bunch of spams
have been sent, and say that message scores -1.0, plus an AWL adjustment
of 5.0 based on the above.
I'm sure there are some people who *would* 'fit your model' and have
negative scores on their legit mail and not be hurt by the proposed rule.
But there would be too many with positive scores that would be hurt....
The point is (as it seems to me) that people who send mail from
'accou...@bankofamerica.com' from their botnets will very quickly scale up
the AWL modification to the maximal threshold.
And the people who get legit mail from bank of america will also very
quickly scale up - I doubt BoA mail scores negative. :)
This all assumes that the server that is checked is the last non-local
server (that is, the first one listed in the headers in typical order)
Which, for any yahoo mailing list will be a different server many times.
And so if your yahoo list scores slightly positive, all those different
yahoo servers will all add to the score. Ditto hotmail, gmail, etc.
I can see what you *want* to do. I just don't see a practical way to do
it.
Though I'm toying with a few ideas... I'll start a separate thread.
Thanks.
- Charles