> -----Original Message-----
> From: Mike Leone
> Sent: Sunday, June 01, 2003 8:25 AM
>
> So bayes wouldn't learn this was spam, unless the score was 19? I rarely
> get spam with scores higher than that. Am I misunderstanding?
>

In my recent spam mailbox, with about 2500 messages (over the past 10 days),
the scores are distributed as follows:

       0%      5.00
      10%      7.10
      20%      8.50
      30%     10.20
      40%     12.00
      50%     13.90
      60%     15.80
      70%     18.20
      80%     20.90
      90%     25.30
     100%     62.50
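
For anyone who wants to build the same kind of table from their own
mailbox, here is a minimal sketch in Python. It assumes you have already
extracted the scores into a list of numbers; the function names and the
nearest-rank choice of percentile are my assumptions for illustration, not
necessarily how the table above was produced.

```python
def decile_table(scores):
    """Return (percent, score) pairs at 0%, 10%, ..., 100% of the sorted scores.

    Uses a simple nearest-rank lookup; other percentile conventions
    would give slightly different values on small samples.
    """
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    n = len(ordered)
    rows = []
    for pct in range(0, 101, 10):
        # Integer arithmetic with round-half-up avoids float rounding surprises.
        idx = (pct * (n - 1) + 50) // 100
        rows.append((pct, ordered[idx]))
    return rows


def format_table(rows):
    """Render the rows in the same two-column layout used in this message."""
    return "\n".join(f"{pct:>8}%{score:>10.2f}" for pct, score in rows)


if __name__ == "__main__":
    # Hypothetical scores, only to show the call; not real mailbox data.
    sample = [5.0, 6.2, 7.1, 8.5, 9.9, 12.0, 13.9, 18.2, 20.9, 25.3, 62.5]
    print(format_table(decile_table(sample)))
```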

I've found the rule of thumb "if score >= 10.0 then probably spam; if
score >= 20.0 then definitely spam" to be a pretty good one. Mileage varies.
BTW, the scores above use some small adjustments to the out-of-the-box
SA 2.55 scores, which I tuned to eliminate the marginal spams without
increasing false positives.
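
Written out as code, the rule of thumb is just two cutoffs. This is only a
sketch: the function name and the "uncertain" label for everything below
10.0 are mine, and the cutoffs are the ones that work for my mail, not
anything built into SA.

```python
PROBABLY_SPAM = 10.0    # rule-of-thumb cutoff from this message
DEFINITELY_SPAM = 20.0  # rule-of-thumb cutoff from this message


def rule_of_thumb(score):
    """Map a SpamAssassin score to a rough confidence label."""
    if score >= DEFINITELY_SPAM:
        return "definitely spam"
    if score >= PROBABLY_SPAM:
        return "probably spam"
    # Below 10.0 the score alone does not settle it.
    return "uncertain"


if __name__ == "__main__":
    for s in (7.1, 13.9, 25.3):
        print(f"{s:6.2f}  {rule_of_thumb(s)}")
```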

A similar-sized sample (2200 messages over the past 1.5 months) from my
archived incoming mail (ham) gives:

       0%   -107.80
      10%     -1.30
      20%     -0.50
      30%      0.00
      40%      0.50
      50%      0.80
      60%      1.10
      70%      1.90
      80%      2.60
      90%      3.40
     100%      4.90

Here, I'd say that 2.60 and below is most likely ham, not spam.

In my "false negatives" folder (about 750 items collected over the course
of 9 months), the distribution is:

       0%     -6.20
      10%      1.30
      20%      2.20
      30%      2.60
      40%      3.40
      50%      3.80
      60%      4.10
      70%      4.40
      80%      4.60
      90%      4.80
     100%      4.90

If I set the threshold at 3.4, I'd catch 60% of the false negatives
(spam misclassified as ham), but would also throw out roughly 10% of the
ham. Probably a bad trade. A better approach is to find filters/scores that
differentiate them.
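
The trade-off can be read off the two tables for any candidate threshold.
Here is a rough sketch in Python that does it by linear interpolation
between the decile points. The interpolation is an assumption on my part
(the tables only give every tenth percentile), and the function name is
made up for illustration.

```python
# Decile values (0%, 10%, ..., 100%) copied from the tables in this message.
HAM = [-107.8, -1.3, -0.5, 0.0, 0.5, 0.8, 1.1, 1.9, 2.6, 3.4, 4.9]
FALSE_NEGATIVES = [-6.2, 1.3, 2.2, 2.6, 3.4, 3.8, 4.1, 4.4, 4.6, 4.8, 4.9]


def fraction_at_or_above(deciles, threshold):
    """Estimate the fraction of messages scoring at or above the threshold.

    Interpolates linearly between adjacent decile values, so the result
    is only an approximation of the real distribution.
    """
    if threshold <= deciles[0]:
        return 1.0
    if threshold > deciles[-1]:
        return 0.0
    steps = len(deciles) - 1
    for i in range(steps):
        lo, hi = deciles[i], deciles[i + 1]
        if lo <= threshold <= hi:
            # Position of the threshold inside this decile band (0..1).
            within = 0.0 if hi == lo else (threshold - lo) / (hi - lo)
            return 1.0 - (i + within) / steps
    return 0.0


if __name__ == "__main__":
    for t in (2.6, 3.4, 4.4):
        caught = fraction_at_or_above(FALSE_NEGATIVES, t)
        lost = fraction_at_or_above(HAM, t)
        print(f"threshold {t:4.1f}: catches {caught:5.1%} of missed spam, "
              f"flags {lost:5.1%} of ham")
```

At 3.4 this reproduces the figures above: about 60% of the missed spam
caught against about 10% of the ham flagged.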




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
