Re: [SAtalk] large numbers of tiny scores = SPAM!

Rob Winters Wed, 29 May 2002 13:51:22 -0700

At 03:23 PM 5/29/2002, Brian May wrote:
>That's why in SpamAssassin you can set the scores yourself...  fit them for
>your needs..


Well, adjusting the scores won't necessarily make the tool better. I'm sure 
that the computationally-derived scores are excellent. In fact, I submit 
that you've probably gone about as far as you can go with a flat linear model.

What I'm suggesting is that there is information in the SA results 
themselves that SA does not consider. I'm talking about the sort of 
post-processing that humans do after SA has done its thing: "What did I 
just decide, and does it make sense?"

The specific rule that I suggested was that if a message has more than 
<threshold> positive SPAM scores, increase those scores by some constant 
or multiplier. It makes sense that a SPAM is *much* more likely to have 
multiple characteristics of SPAM than a non-SPAM, and the odds go up 
multiplicatively, not additively, so adjusting each individual linear 
score won't reflect that probability.
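
To make the rule concrete, here is a rough sketch in Python (not SA's own Perl, and not anything SA does today). The function name, the rule names, and the threshold and multiplier values are all made up for illustration:

```python
def adjusted_score(hits, threshold=3, multiplier=1.5):
    """Sum rule scores, boosting the positive ones when many rules fire.

    hits maps a rule name to the score that rule contributed.
    threshold and multiplier are illustrative guesses, not tuned values.
    """
    positive = [s for s in hits.values() if s > 0]
    negative = [s for s in hits.values() if s < 0]
    # Boost only when MORE than `threshold` positive rules fired.
    if len(positive) > threshold:
        positive = [s * multiplier for s in positive]
    return sum(positive) + sum(negative)


# Hypothetical message that trips five small rules.
many_small = {"RULE_A": 0.5, "RULE_B": 0.5, "RULE_C": 0.5,
              "RULE_D": 0.5, "RULE_E": 0.5}
print(adjusted_score(many_small))  # flat sum would be 2.5
```

A message with only two small hits is left alone, so the boost only kicks in when lots of tiny scores pile up.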

I know I'm assuming a lot here, but if:

twice as many SPAMs as non-SPAMs have Feature A, and
twice as many SPAMs as non-SPAMs have Feature B, and
twice as many SPAMs as non-SPAMs have Feature C,

then what math symbol do you use to calculate the SPAM-worthiness of a 
message that has A, B, and C? It's not a plus sign, is it? It's not twice 
as hard to pick the Daily Double as it is to pick each race. I submit that 
if a message wins the trifecta, it's probably SPAM. ;-)
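
Here is the arithmetic I have in mind, as a small Python sketch. It reads "twice as many" as a 2:1 likelihood ratio per feature, and it assumes the features are independent and that SPAM and non-SPAM start at even odds. Those are my simplifying assumptions, and the function names are just for illustration:

```python
from math import prod


def combined_odds(likelihood_ratios, prior_odds=1.0):
    """Multiply per-feature ratios into overall SPAM:non-SPAM odds.

    Assumes the features are independent (a big assumption).
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds of x:1 into a probability between 0 and 1."""
    return odds / (1.0 + odds)


ratios = [2.0, 2.0, 2.0]          # Features A, B, and C
odds = combined_odds(ratios)       # 2 * 2 * 2, not 2 + 2 + 2
print(odds, odds_to_probability(odds))
```

Under those assumptions the three features together give 8:1 odds rather than 6:1, and the gap between multiplying and adding widens with every extra feature.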

The reverse rule (e.g., bolstering a couple of negative scores of -2.0 or 
lower) may help prevent false positives. No doubt there are other 
interactions that may help fine-tune results as well.

FWIW.

  /// Rob




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
