Matt Kettler wrote:

> I'll even re-quote myself:
>> I personally would like to see some statistics, but  at this point, we
>>  don't have any test data on this so we're arguing your theory vs mine.
> And your quote that I was counter-pointing:
>> As you can see the performance of the lists are different, and the way 
>> they're created is different too.
> 
> I don't see enough of a difference to clearly rule out significant overlap.
> 
> I'll define my test of "significant overlap" as:
> >10% of total hits redundant across 3 or more lists and >1% nonspam hits
> redundant across 2 or more lists.
> 

Messages received today that are listed in two or more of SC, JP, AB, OB
and WS:
grep "SURBL_MULTI2" /var/log/maillog |grep "Feb 17" |wc -l
    292

All surbl.org hits in same timeframe (includes ph, but no matter):

grep "_SURBL" /var/log/maillog |grep "Feb 17" |wc -l
    583

So we have a double-listing rate of at least 50% (292 of 583). That in itself
isn't much of a problem, but it also doesn't rule out overlap. It's still a
whole lot higher than my first criterion of 10% overlap.
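
For anyone who wants to repeat the arithmetic on their own logs, here is a rough sketch that wraps the two grep counts above in a shell function and prints the ratio as a percentage. The function name surbl_overlap_pct is made up for illustration, and it assumes the same thing the greps above assume: one log line per scanned message, with the rule names on that line.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): prints the percentage of
# SURBL-hit log lines for a given day that also hit SURBL_MULTI2.
# Usage: surbl_overlap_pct LOGFILE "Feb 17"
surbl_overlap_pct() {
    log=$1
    day=$2
    # grep -F matches the date as a literal string; grep -c counts matching lines.
    # "|| true" keeps a zero count from looking like a failure to the shell.
    total=$(grep -F "$day" "$log" | grep -c "_SURBL" || true)
    multi=$(grep -F "$day" "$log" | grep -c "SURBL_MULTI2" || true)
    if [ "$total" -eq 0 ]; then
        echo "no SURBL hits for $day" >&2
        return 1
    fi
    # awk does the floating-point division; output has one decimal place.
    awk -v m="$multi" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}
```

Called as surbl_overlap_pct /var/log/maillog "Feb 17", today's counts (292 and 583) would come out as 50.1.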

However, right now I don't have more than 100 FPs so I can't really comment on
the nonspam hit rate of SURBL_MULTI2. That's the important one.

I also added multi3, multi4 and another rule to detect overlap between
uribl.com's black and surbl.org:

meta URIBL_BLACK_OVERLAP (URIBL_BLACK && (URIBL_AB_SURBL || URIBL_JP_SURBL || URIBL_OB_SURBL || URIBL_WS_SURBL || URIBL_SC_SURBL))
score URIBL_BLACK_OVERLAP -1.0

I'll see what kind of runtime data I can gather based on these rules over the
weekend.

