On Tue, 2006-02-21 at 06:53 -0800, Jeff Chan wrote:
> On Monday, February 20, 2006, 12:39:31 PM, Theo Dinter wrote:
> 
> > Just for some info...  I went through the set1 spam logs for 3.1 score
> > generation.
> 
> > 1112804 total messages
> >  776108 messages hit SURBL
> >  138407 1 SURBL list(s) hit (1+ = 776108)
> >  189795 2 SURBL list(s) hit (2+ = 637701)
> >  281255 3 SURBL list(s) hit (3+ = 447906)
> >  136964 4 SURBL list(s) hit (4+ = 166651)
> >   29685 5 SURBL list(s) hit (5+ = 29687)
> >       2 6 SURBL list(s) hit (6+ = 2)
> 
> > The set1 ham logs:
> 
> > 477629  total messages
> >   1023  messages hit SURBL
> >    992  1 SURBL list(s) hit (1+ = 1023)
> >     23  2 SURBL list(s) hit (2+ = 31)
> >      5  3 SURBL list(s) hit (3+ = 8)
> >      3  4 SURBL list(s) hit (4+ = 3)
> >      0  5 SURBL list(s) hit (5+ = 0)
> >      0  6 SURBL list(s) hit (6+ = 0)
> 
> 
> > So from these results, the FP rate is very low for SURBL (0.21%), and
> > while there is a ton of overlap for spam (57.3%), there's very little
> > for ham (0.01%).
> 
> 
> Thank you for data.  They seem to support what we've been saying.
> 
> At a count of 138407, messages that hit only 1 SURBL are
> significant, so lowering the scoring of a single list hit
> significantly may result in significant FNs.

But maybe we need a scoring scheme like this:
- if a message is on only one list, it gets the current score for that list
- if it is on list1 and list2, it does not get list1+list2 but rather a
basic SURBL score + a fixed value
- if it is on list1, list2 and list3, it does not get list1+list2+list3
but rather a basic SURBL score + 2*(fixed value)
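
To make the idea concrete, here is a rough sketch in Python. It is not how SpamAssassin expresses rules, only the arithmetic. The function name, the list names and the default values for basic_score and fixed_value are made up for illustration.

```python
def surbl_score(hits, basic_score=3.0, fixed_value=1.0):
    """Combine SURBL hits into one score.

    hits maps the name of each list that hit to that list's current
    score. basic_score and fixed_value are illustrative assumptions,
    not tuned values.
    """
    n = len(hits)
    if n == 0:
        return 0.0
    if n == 1:
        # Only one list hit: keep that list's current score.
        return next(iter(hits.values()))
    # Several lists hit: do not sum the per-list scores. Use a basic
    # score plus a fixed increment for every list beyond the first.
    return basic_score + (n - 1) * fixed_value


# Hypothetical per-list scores, for illustration only.
one_hit = {"list1": 2.5}
four_hits = {"list1": 2.5, "list2": 3.0, "list3": 1.0, "list4": 2.0}
print(surbl_score(one_hit))
print(surbl_score(four_hits))
```

With summing, the four hits above would add up to 8.5; with this scheme they are capped at basic_score + 3*fixed_value.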

21% of all SURBL-hitting spam hits 4 or more lists. If such a hit
were a FP (not very likely, but possible), the summed score would be too
high to compensate for. With a scoring rule like the one above, the
score of a spam message hitting 4 lists would be, e.g.:
basic SURBL score = 3
fixed value = 1
score = 3 + 3*(fixed value) = 6
And maybe for a SURBL list with a very low FP rate, the fixed value
could be set higher.
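
For reference, the 21% figure can be rechecked from Theo's set1 spam counts quoted above. A small Python sketch, with variable names that are only illustrative:

```python
# Spam messages by exact number of SURBL lists hit (set1 spam logs).
spam_by_list_count = {
    1: 138407,
    2: 189795,
    3: 281255,
    4: 136964,
    5: 29685,
    6: 2,
}

total_surbl_spam = sum(spam_by_list_count.values())
four_or_more = sum(count for n, count in spam_by_list_count.items() if n >= 4)
share_four_or_more = four_or_more / total_surbl_spam

print(total_surbl_spam, four_or_more)
print(f"{100 * share_four_or_more:.1f}% of SURBL-hitting spam hit 4+ lists")
```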

Maurice Lucas


