> -----Original Message-----
> From: Mike Leone
> Sent: Sunday, June 01, 2003 8:25 AM
>
> So bayes wouldn't learn this was spam, unless the score was 19? I
> rarely get spam with scores higher than that. Am I misunderstanding?

In my recent spam mailbox, with about 2500 messages (over the past 10 days), the scores are distributed as follows:

    Percentile    Score
        0%         5.00
       10%         7.10
       20%         8.50
       30%        10.20
       40%        12.00
       50%        13.90
       60%        15.80
       70%        18.20
       80%        20.90
       90%        25.30
      100%        62.50

I've found the "if score >= 10.0 then probably spam; if score >= 20.0 then definitely spam" rule of thumb to be a pretty good one. Mileage varies.

BTW, the scores above use some small adjustments to the out-of-the-box SA 2.55 scores, which I tuned to eliminate the marginal spams without increasing false positives.

A similar-sized sample (2200 messages over the past 1.5 months) from my archived incoming mail (ham) is:

    Percentile    Score
        0%      -107.80
       10%        -1.30
       20%        -0.50
       30%         0.00
       40%         0.50
       50%         0.80
       60%         1.10
       70%         1.90
       80%         2.60
       90%         3.40
      100%         4.90

Here, I'd say that 2.60 and below is most likely ham, not spam.

In my "false negatives" folder (about 750 items collected over the course of 9 months), the distribution is:

    Percentile    Score
        0%        -6.20
       10%         1.30
       20%         2.20
       30%         2.60
       40%         3.40
       50%         3.80
       60%         4.10
       70%         4.40
       80%         4.60
       90%         4.80
      100%         4.90

If I set the threshold at 3.4, I'd eliminate 60% of the false negatives (spam misclassified as ham), but would throw out roughly 10% of the ham. Probably a bad trade. A better approach is to find filters/scores that differentiate them.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
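
For anyone wanting to build the same kind of percentile table from their own folders, here is a minimal sketch in Python. It assumes you have already extracted the SA scores into plain lists of numbers (how you pull them out of the headers is up to you). The function names `deciles` and `threshold_tradeoff` are made up for this illustration, and "score >= threshold counts as spam" is an assumption about how the cutoff is applied.

```python
from bisect import bisect_left


def deciles(scores):
    """Return (percent, score) pairs at 0%, 10%, ..., 100%.

    Uses the nearest item in the sorted list for each percentile,
    so 0% is the minimum and 100% is the maximum.
    """
    ordered = sorted(scores)
    last = len(ordered) - 1
    # Integer arithmetic picks the nearest index without float rounding issues.
    return [(p, ordered[(p * last + 50) // 100]) for p in range(0, 101, 10)]


def threshold_tradeoff(ham_scores, missed_spam_scores, threshold):
    """Estimate what a lower spam threshold would cost and gain.

    Returns (fraction of ham that would be flagged as spam,
             fraction of previously missed spam that would be caught),
    assuming a message is flagged when its score >= threshold.
    """
    ham = sorted(ham_scores)
    missed = sorted(missed_spam_scores)
    # bisect_left gives the count of scores strictly below the threshold.
    ham_flagged = (len(ham) - bisect_left(ham, threshold)) / len(ham)
    spam_caught = (len(missed) - bisect_left(missed, threshold)) / len(missed)
    return ham_flagged, spam_caught
```

Calling `deciles(my_spam_scores)` gives the rows of a table like the ones above, and `threshold_tradeoff(my_ham_scores, my_false_negative_scores, 3.4)` gives the two fractions to weigh against each other when considering a lower threshold.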