On Tue, 2006-02-21 at 06:53 -0800, Jeff Chan wrote:
> On Monday, February 20, 2006, 12:39:31 PM, Theo Dinter wrote:
>
> > Just for some info... I went through the set1 spam logs for 3.1 score
> > generation.
> >
> > 1112804 total messages
> >  776108 messages hit SURBL
> >  138407 1 SURBL list(s) hit (1+ = 776108)
> >  189795 2 SURBL list(s) hit (2+ = 637701)
> >  281255 3 SURBL list(s) hit (3+ = 447906)
> >  136964 4 SURBL list(s) hit (4+ = 166651)
> >   29685 5 SURBL list(s) hit (5+ = 29687)
> >       2 6 SURBL list(s) hit (6+ = 2)
> >
> > The set1 ham logs:
> >
> >  477629 total messages
> >    1023 messages hit SURBL
> >     992 1 SURBL list(s) hit (1+ = 1023)
> >      23 2 SURBL list(s) hit (2+ = 31)
> >       5 3 SURBL list(s) hit (3+ = 8)
> >       3 4 SURBL list(s) hit (4+ = 3)
> >       0 5 SURBL list(s) hit (5+ = 0)
> >       0 6 SURBL list(s) hit (6+ = 0)
> >
> > So from these results, the FP rate is very low for SURBL (0.21%), and
> > while there is a ton of overlap for spam (57.3%), there's very little
> > for ham (0.01%).
>
> Thank you for data. They seem to support what we've been saying.
>
> At a count of 138407, messages that hit only 1 SURBL are
> significant, so lowering the scoring of a single list hit
> significantly may result in significant FNs.

But maybe we need a scoring scheme like this:

- if a message is on only one list, it gets that list's current SURBL score
- if it is on list1 and list2, it does not get list1 + list2 but rather a
  basic SURBL score + a fixed value
- if it is on list1, list2 and list3, it does not get list1 + list2 + list3
  but rather a basic SURBL score + 2*(fixed value)

21% of all the spam that hits SURBL hits 4 or more lists (166651 of
776108). If such a message were an FP (not very likely, but possible),
the summed score would be too high to compensate for. With a scoring
rule like the one above, the score of a spam message hitting 4 lists
would be, e.g.:

basic SURBL score = 3
fixed value = 1
score = 3 + 3*1 = 6

And maybe for a SURBL list with a very low FP rate, the fixed value
could be raised.

Maurice Lucas
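
The scoring rule proposed in this message can be sketched in Python. This is only an illustration under assumptions: the function name, the list names and the per-list scores are hypothetical, and the "basic SURBL score" is treated as a single constant because the message does not say how it would be chosen.

```python
def surbl_score(hits, list_scores, base_score=3.0, fixed_value=1.0):
    """Combined SURBL score for one message (illustrative sketch).

    hits        -- names of the SURBL lists the message hit
    list_scores -- current per-list scores (hypothetical values)
    base_score  -- the "basic SURBL score" (assumed to be a constant)
    fixed_value -- increment added for each list hit after the first
    """
    n = len(hits)
    if n == 0:
        return 0.0
    if n == 1:
        # Only one list hit: keep that list's current score.
        return list_scores[hits[0]]
    # Several lists hit: do not sum the per-list scores; use the
    # basic score plus a fixed increment per additional list.
    return base_score + (n - 1) * fixed_value


# Hypothetical per-list scores, for illustration only.
scores = {"list1": 2.5, "list2": 3.0, "list3": 2.0, "list4": 3.5}

print(surbl_score(["list1"], scores))
print(surbl_score(["list1", "list2"], scores))
print(surbl_score(["list1", "list2", "list3", "list4"], scores))
```

With a basic score of 3 and a fixed value of 1, a message on four lists scores 3 + 3*1 = 6 instead of the sum of all four per-list scores, which caps the cost of a ham message that happens to hit several lists.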