At 03:23 PM 5/29/2002, Brian May wrote:
>Thats why in spam assassin you can set the scores yourself... fit them for
>your needs..

Well, adjusting the scores won't necessarily make the tool better. I'm sure that the computationally-derived scores are excellent. In fact, I submit that you've probably gone about as far as you can go with a flat linear model.

What I'm suggesting is that there is information in the SA results themselves that is not considered by SA. What I'm talking about is the sort of post-processing that humans do after SA has done its thing: "What did I just decide, and does it make sense?"

The specific rule that I suggested was that if a message has more than <threshold> positive SPAM scores, increase those scores by some constant or multiplier. It makes sense that a SPAM is *much* more likely to have multiple characteristics of SPAM than a non-SPAM, and the odds go up multiplicatively, not linearly, so adjusting each individual linear score won't reflect that probability.

I know I'm assuming a lot here, but if:

twice as many SPAMs as non-SPAMs have Feature A, and
twice as many SPAMs as non-SPAMs have Feature B, and
twice as many SPAMs as non-SPAMs have Feature C,

then what math symbol do you use to calculate the SPAM-worthiness of a message that has A, B, and C? It's not a plus sign, is it? It's not twice as hard to pick the Daily Double as it is to pick each race. I submit that if a message wins the trifecta, it's probably SPAM. ;-)

The reverse rule (e.g., a couple of -2.0 or lower negative scores being bolstered) may help prevent false positives. No doubt there are other interactions that may help fine-tune results as well.

FWIW.

/// Rob
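
To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes the three features are statistically independent, which the post itself flags as a big assumption. The function names, the threshold of 3, and the multiplier of 1.5 are hypothetical placeholders for illustration; none of this is SpamAssassin code or configuration.

```python
from math import prod


def combined_ratio(ratios):
    """Combine per-feature SPAM:non-SPAM ratios by multiplying them.

    Only valid under the assumption that the features are independent.
    """
    return prod(ratios)


def boosted_total(scores, threshold=3, multiplier=1.5):
    """Hypothetical post-processing step for the rule suggested above.

    If more than `threshold` rules produced a positive score, scale each
    positive score by `multiplier` before summing. Otherwise return the
    plain linear sum. Both default values are arbitrary placeholders.
    """
    positive_hits = sum(1 for s in scores if s > 0)
    if positive_hits <= threshold:
        return sum(scores)
    return sum(s * multiplier if s > 0 else s for s in scores)


# Features A, B, and C each appear twice as often in SPAM as in non-SPAM.
ratios = [2, 2, 2]
print("multiplied:", combined_ratio(ratios))  # 2 * 2 * 2
print("added:", sum(ratios))                  # 2 + 2 + 2
```

Under the independence assumption, a message with all three features is eight times as likely to be SPAM as non-SPAM, where a plain sum of the ratios would suggest six. `boosted_total` is one possible reading of the "more than <threshold> positive scores" rule, and the reverse rule for negative scores could be sketched the same way.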