Greg Ward wrote:

> Can anyone give real-world results for AWL in SA 2.1 yet?

Well, since I'm clever-sounding, here's my take: It's wayyyy better than 2.0x, 
but not yet ideal.  In the following discussion, I'll call the original 
(2.0x) AWL AWL1, and the new one AWL2.  The problems come in a few situations:

1. Border cases where a frequent spammer sends messages which score right around
the 5.0 mark, sometimes over, sometimes under.  AWL2 doesn't deal well with
this, whereas AWL1 would deal with it quite nicely.  Under AWL2, the long-term
average will stay at, say, 5.0, so any message which comes in scoring <5.0 
will stay <5.0, and any message >5.0 will stay >5.0.  Under AWL1, after 3 
messages >5.0, everything would get a big bonus.
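To make point 1 concrete, here is a minimal sketch of an AWL2-style adjustment. It assumes the rule "move the score a fixed fraction of the way toward the sender's average". The factor of 0.5 is an assumption chosen because it reproduces the 8 -> 4 figure in point 2 below. The names FACTOR, THRESHOLD and awl2_adjust are hypothetical, not SpamAssassin's actual code.

```python
FACTOR = 0.5      # assumed shrink factor (matches the 8 -> 4 figure in point 2)
THRESHOLD = 5.0   # spam threshold used in the post

def awl2_adjust(score: float, total: float, count: int) -> float:
    """Pull a message's score part of the way toward the sender's average (sketch)."""
    if count == 0:
        return score  # no history yet, so nothing to pull toward
    mean = total / count
    return score + (mean - score) * FACTOR

# A frequent spammer whose history averages exactly 5.0 (50 points over 10 mails).
# With 0 < FACTOR < 1 the adjusted score stays on the same side of the mean,
# so a 4.x message stays under 5.0 and a 5.x message stays over it.
for raw in (4.6, 4.9, 5.1, 5.4):
    adjusted = awl2_adjust(raw, 50.0, 10)
    verdict = "spam" if adjusted >= THRESHOLD else "not spam"
    print(f"raw {raw:.1f} -> adjusted {adjusted:.2f} ({verdict})")
```

Because adjusted - mean = (score - mean) * (1 - FACTOR), no choice of factor below 1 can push a borderline message across a threshold that sits right at the sender's average.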

2. A single message from an infrequent correspondent scores very high or very 
low.  Let's say I send you a message which for some reason gets a -100 bonus (a 
badly constructed whitelist_from or something).  OK, now I'm in the AWL2 db as 
(-100,1), i.e. (total score, message count).  I now send you 10 more messages, 
each scoring +10.  I'm now (0,11).  I now send you another message with a score 
of +8, and AWL2 will shrink this to 4 (since my long-term average is 0).
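The arithmetic in point 2 can be replayed with a small sketch. It assumes each sender is stored as a (total, count) pair of raw scores and that the adjustment moves the score halfway (an assumed factor of 0.5) toward the average. The Sender class and awl2_adjust function are hypothetical illustrations.

```python
from dataclasses import dataclass

FACTOR = 0.5  # assumed AWL2 shrink factor, consistent with the 8 -> 4 figure

@dataclass
class Sender:
    total: float = 0.0  # running sum of raw message scores
    count: int = 0      # number of messages seen

    def record(self, score: float) -> None:
        self.total += score
        self.count += 1

    def mean(self) -> float:
        return self.total / self.count

def awl2_adjust(sender: Sender, score: float) -> float:
    """Move the score FACTOR of the way toward the sender's average (sketch)."""
    if sender.count == 0:
        return score
    return score + (sender.mean() - score) * FACTOR

me = Sender()
me.record(-100.0)        # the badly whitelisted message: (-100, 1)
for _ in range(10):
    me.record(10.0)      # ten ordinary +10 messages
print((me.total, me.count))   # (0.0, 11)
print(awl2_adjust(me, 8.0))   # 4.0
```

One outlier drags the average to 0, and every later message is pulled halfway toward that 0.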

3. Shrinkage is not based on the amount of data.  If I've received one message 
from you, scoring +2, then your AWL2 average is 2.0, and AWL2 shrinks toward it 
exactly as hard as if I'd received 100 messages averaging 2.0.  It would make 
radically more sense for the shrinkage to be based on the uncertainty in the 
estimate of your long-term average.  In other words, if you've sent only 1 
message, barely shrink the score for this message; if I've seen 100 messages 
from you, barely consider the score of this particular message, and pretty much 
go with just the long-term average.
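One simple way to get the behaviour described in point 3 is to let the weight on the history grow with the number of messages, as n / (n + k). This is a credibility-style weighting swapped in for illustration. It is not what AWL2 does, and k (PRIOR_WEIGHT below) is a hypothetical tuning constant.

```python
PRIOR_WEIGHT = 5.0  # hypothetical k: after k messages the history counts for half

def history_weight(count: int) -> float:
    """Fraction of the final score taken from the long-term average."""
    return count / (count + PRIOR_WEIGHT)

def adjust(score: float, mean: float, count: int) -> float:
    """Blend the new score with the sender's average, trusting longer histories more."""
    w = history_weight(count)
    return (1.0 - w) * score + w * mean

# Same average (2.0) and same new message (+9.0), different amounts of history
for n in (1, 10, 100):
    print(n, round(history_weight(n), 3), round(adjust(9.0, 2.0, n), 2))
```

With one prior message the +9.0 barely moves (about 7.8); with 100 it lands close to the 2.0 average. A fixed 50% pull would give 5.5 in every case, regardless of how much history exists.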

#3 is easy to solve with some simple statistics (which, though simple, are too 
complex for me to work through this late at night; I'll deal with it soon, though).

#2 will more or less be solved by fixing #3.
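One reading of "uncertainty in the estimate" that covers #2 as well is the standard error of the sender's average. This sketch treats the new message's score and the sender's average as two noisy estimates and combines them by inverse-variance weighting. Several parts are assumptions. The running sum of squared scores is extra state that the (total, count) pairs in the post do not include. MESSAGE_VARIANCE is a hypothetical constant for the spread of a single message's score. The weighting scheme itself is one possible choice, not the author's.

```python
from dataclasses import dataclass

MESSAGE_VARIANCE = 25.0  # hypothetical: assumed variance of one message's score (sd 5)

@dataclass
class Sender:
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0  # assumed extra state: running sum of squared scores

    def record(self, score: float) -> None:
        self.count += 1
        self.total += score
        self.total_sq += score * score

    def mean(self) -> float:
        return self.total / self.count

    def mean_variance(self) -> float:
        """Squared standard error of the sender's long-term average."""
        n = self.count
        sample_var = (self.total_sq - n * self.mean() ** 2) / (n - 1)
        return max(sample_var, 0.0) / n

def adjust(sender: Sender, score: float) -> float:
    """Inverse-variance blend of the new score and the sender's average (sketch)."""
    if sender.count < 2:
        return score  # too little history to estimate any uncertainty
    se2 = sender.mean_variance()
    w = MESSAGE_VARIANCE / (MESSAGE_VARIANCE + se2)  # weight on the history
    return (1.0 - w) * score + w * sender.mean()

erratic = Sender()  # the point 2 sender: one -100 outlier, then ten +10s
for s in [-100.0] + [10.0] * 10:
    erratic.record(s)
steady = Sender()   # same average of 0, but consistent scores
for s in [-1.0] * 5 + [1.0] * 5 + [0.0]:
    steady.record(s)
print(round(adjust(erratic, 8.0), 2))  # 6.4
print(round(adjust(steady, 8.0), 2))
```

Both senders average 0. The erratic sender's average is too noisy to trust, so the +8 only drops to 6.4 rather than 4. The steady sender's +8 is pulled almost all the way to 0.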

#1 is a little trickier.  What we really want is for scores of messages which 
exceed the threshold to carry more weight in the averaging process.  Possibly 
the thing to do here is to track both the average score and the percentage of 
messages which are spam.  If the percentage of spam exceeds a certain level, 
then assign some bonus points to the score for this message.  The trouble is 
you could get stuck in a nasty AWL-spiral-of-death where, once blacklisted by 
AWL, you can never get clean.  I'll need to do some more thinking here.
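The spam-percentage idea for #1 might look roughly like the sketch below. The threshold fraction, minimum history, and bonus size (SPAM_FRACTION_LIMIT, MIN_MESSAGES, BONUS) are all hypothetical values chosen for illustration. The sketch records each message's raw, pre-bonus score, which at least keeps the bonus from feeding on itself. It does not by itself settle the spiral-of-death question.

```python
from dataclasses import dataclass

THRESHOLD = 5.0
SPAM_FRACTION_LIMIT = 0.5  # hypothetical: share of past spam that triggers the bonus
MIN_MESSAGES = 3           # hypothetical: ignore senders with too little history
BONUS = 3.0                # hypothetical: points added for a mostly-spam sender

@dataclass
class Sender:
    count: int = 0
    spam_count: int = 0  # messages whose raw score reached THRESHOLD

    def record(self, raw_score: float) -> None:
        # Record the raw (pre-bonus) score so the bonus cannot feed on itself.
        self.count += 1
        if raw_score >= THRESHOLD:
            self.spam_count += 1

    def spam_fraction(self) -> float:
        return self.spam_count / self.count if self.count else 0.0

def apply_bonus(sender: Sender, score: float) -> float:
    """Add BONUS when enough of the sender's past mail was spam (sketch)."""
    if sender.count >= MIN_MESSAGES and sender.spam_fraction() > SPAM_FRACTION_LIMIT:
        return score + BONUS
    return score

borderline = Sender()  # a spammer hovering around 5.0
for raw in (5.2, 4.8, 5.1, 5.3):
    borderline.record(raw)
print(borderline.spam_fraction(), apply_bonus(borderline, 4.9))
```

Here three of four past messages were spam, so a 4.9 that would otherwise slip under the threshold gets pushed over it.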

Overall, I'd say AWL2 should definitely be ON in production systems if 
practical.  I believe strongly it will considerably reduce false positives, and 
will only marginally increase false negatives (actually, on balance it might not 
increase them at all, since its magical auto-blacklisting will help reduce false 
negatives somewhat).  Either way, it might hurt false-negative rates, but it can 
only help false-positive rates.  I would particularly recommend it in situations 
where you're likely to encounter false positives on messages from people with 
whom you've swapped mail before (a realtor's office discussing mortgage rates, 
anyone?  Chances are you've swapped a few emails with the realtor before you 
start talking mortgage rates).  These will mostly be intra-office situations 
where you're getting attachments from people you know, situations where you're 
receiving order confirmations from vendors you use frequently, etc., etc., etc.

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
