On Mon, Oct 14, 2002 at 07:40:38PM -0700, Daniel Quinlan wrote:
> [EMAIL PROTECTED] writes:
> 
> > I only used the first message in my spam box, one that scored highly the
> > first time around.  I'm sure I could pick a half a dozen at random and see
> > similar results.
> 
> So what?  Like I said, it's not how individual scores change, it's how
> false positives and false negatives change.

I agree: you can't judge two releases' "effectiveness" by comparing one
message's scores under each. But the rate of FPs and FNs depends on both
the score distribution and the threshold chosen by the user, and I'm not
seeing much consistency in that distribution between releases. Here's
some actual data:

SpamAssassin 2.41 - around 20k messages scanned over the course of one
day. Our threshold is 10; we're being pretty conservative here because
FPs are bad. We're not collecting stats on scores below 5.

Score of between 5 and 10 - 6.8% of the total.
Score of greater than 10 - 15.3% of the total.

SpamAssassin 2.42 - a week later, same volume, same threshold. The only
configuration difference is that AWL (the auto-whitelist) is off because
of the bug; if anything, that should in theory raise scores.

Score of between 5 and 10 - 15.9% of the total.
Score of greater than 10 - 6.0% of the total.
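The two days can be compared directly from these percentages. A minimal
Python sketch using only the figures reported above (the `summarize`
helper is hypothetical, and it assumes the "5 to 10" and "above 10"
buckets do not overlap):

```python
# Percentages of all scanned mail, copied from the figures reported above.
REPORTED = {
    "2.41": (6.8, 15.3),   # (score between 5 and 10, score above 10)
    "2.42": (15.9, 6.0),
}


def summarize(between_5_and_10, above_10):
    """Return (% of all mail scoring above 5,
    fraction of that mail which also scored above 10)."""
    above_5 = between_5_and_10 + above_10
    return above_5, above_10 / above_5


for release, (mid_band, high_band) in REPORTED.items():
    above_5, caught = summarize(mid_band, high_band)
    print(f"{release}: {above_5:.1f}% above 5, "
          f"{high_band:.1f}% above 10 ({caught:.0%} of the above-5 mail)")
```

This prints "2.41: 22.1% above 5, 15.3% above 10 (69% of the above-5
mail)" and "2.42: 21.9% above 5, 6.0% above 10 (27% of the above-5 mail)".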

As you can see, almost the same proportion of mail scores above 5.0
(22.1% versus 21.9%, which I would expect), but a much smaller share of
it is now blocked as spam (6.0% of all mail instead of 15.3%). To block
the same share on that second day under 2.42, I have to lower the
threshold from 10 to 6.6, which is not a small tweak.
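Finding a threshold like that 6.6 amounts to reading a quantile off the
score distribution: pick the block rate you want, sort the scores, and
take the score of the last message that has to be caught. A minimal
Python sketch, assuming per-message scores have been logged somewhere.
The function names and the ten scores are hypothetical, nothing here is
a SpamAssassin API, and it assumes a message is blocked when its score is
at or above the threshold:

```python
import math


def block_rate(scores, threshold):
    """Fraction of messages scoring at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def threshold_for_block_rate(scores, target_rate):
    """Highest threshold that still blocks at least target_rate of messages.

    Assumes (hypothetically) that a message is blocked when score >= threshold.
    """
    if not scores or not 0 < target_rate <= 1:
        raise ValueError("need at least one score and a rate in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Number of messages that must be blocked; round() absorbs float noise
    # in the product, and max() keeps at least one message blocked.
    k = max(1, math.ceil(round(target_rate * len(ranked), 9)))
    return ranked[k - 1]


# Hypothetical scores for ten messages, for illustration only.
scores = [12.0, 9.5, 7.1, 6.6, 4.0, 2.0, 1.5, 0.3, 11.2, 3.3]
t = threshold_for_block_rate(scores, 0.4)
print(t, block_rate(scores, t))  # 7.1 0.4
```

Given one day's 2.42 scores and the 2.41 block rate (0.153) as the
target, the same call would show how far the threshold has to drop.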

Now, my rudimentary grasp of statistics tells me that, whatever happens
to the scores of individual messages, the overall score distribution
should stay consistent between releases. Otherwise the threshold has to
be readjusted with each release, which is not desirable behaviour.

Regards,
Barnaby
-- 
Barnaby Brown                            -              Systems Engineer
Pacific Internet (Australia) Pty Ltd     -     http://www.pacific.net.au


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
