On Monday, February 20, 2006, 12:39:31 PM, Theo Dinter wrote:

> Just for some info...  I went through the set1 spam logs for 3.1 score
> generation.

> 1112804 total messages
>  776108 messages hit SURBL
>  138407 1 SURBL list(s) hit (1+ = 776108)
>  189795 2 SURBL list(s) hit (2+ = 637701)
>  281255 3 SURBL list(s) hit (3+ = 447906)
>  136964 4 SURBL list(s) hit (4+ = 166651)
>   29685 5 SURBL list(s) hit (5+ = 29687)
>       2 6 SURBL list(s) hit (6+ = 2)

> The set1 ham logs:

> 477629  total messages
>   1023  messages hit SURBL
>    992  1 SURBL list(s) hit (1+ = 1023)
>     23  2 SURBL list(s) hit (2+ = 31)
>      5  3 SURBL list(s) hit (3+ = 8)
>      3  4 SURBL list(s) hit (4+ = 3)
>      0  5 SURBL list(s) hit (5+ = 0)
>      0  6 SURBL list(s) hit (6+ = 0)


> So from these results, the FP rate is very low for SURBL (0.21%), and
> while there is a ton of overlap for spam (57.3%), there's very little
> for ham (0.01%).


Thank you for the data.  They seem to support what we've been saying.

At 138407 messages (about 12% of the spam corpus), spam that hits
only one SURBL list is a substantial share, so sharply lowering the
score for a single-list hit could produce many FNs.
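
For anyone who wants to double-check the percentages quoted above, they
can be recomputed from the raw counts.  The sketch below is only an
illustration in Python, not part of the mass-check or score-generation
tooling; the names at_least, pct and summary are made up for this
example, and the counts are copied from the quoted set1 logs.

```python
# Counts copied from the quoted set1 logs:
# key = number of SURBL lists hit, value = number of messages.
spam_total = 1112804
ham_total = 477629
spam_hits = {1: 138407, 2: 189795, 3: 281255, 4: 136964, 5: 29685, 6: 2}
ham_hits = {1: 992, 2: 23, 3: 5, 4: 3, 5: 0, 6: 0}


def at_least(hits, n):
    """Number of messages that hit n or more lists (the "n+" column)."""
    return sum(count for lists, count in hits.items() if lists >= n)


def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


summary = {
    # Ham that hit any list at all, i.e. the FP rate.
    "ham_fp_rate": pct(at_least(ham_hits, 1), ham_total),
    # Messages that hit two or more lists, i.e. the overlap.
    "spam_overlap": pct(at_least(spam_hits, 2), spam_total),
    "ham_overlap": pct(at_least(ham_hits, 2), ham_total),
    # Spam caught by exactly one list.
    "spam_single_list": pct(spam_hits[1], spam_total),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}%")
```

The spam_single_list figure is the share of spam that depends on a
single list hit, which is the group most exposed if that score is cut.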

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
