From: "Matt Kettler" <[EMAIL PROTECTED]>
Jeff Chan wrote:
On Saturday, February 18, 2006, 2:36:29 AM, Matt Kettler wrote:
While multi-listing is somewhat common in RBLs, it's the vast majority
of cases in URIBLs. Over 50% of my mail that hits any surbl.org lists
hits 3 or more of them.
And how many of those are spams versus hams?
That sample was 100% spam. I was not trying to point out FP rate
problems, merely that overlap is in fact VERY common on surbl.
And again, it's not the overlap in and of itself that's a problem. It's
when the overlap matches nonspam that problems occur. I don't have any
nonspam samples on hand with surbl overlap. Only surbl/uribl overlap.
Matt, I think your worry about overlap is faulty. If the lists all
fed off one common database it would be a worry. Then the correlation
would be a symptom of the system not working. If they all work off
more or less individual captures and submissions, their raw databases
have low correlation. If their results then correlate well, as in
"overlap" as you are using it, that is an indication of their goodness.
In the first case, a common raw database, adding up individual scores
is a bad idea. In the second case, particularly for spam-trap-based
lists, adding up individual scores strengthens the indication that the
discovered address IS spam.
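
To put rough numbers on those two cases, here is a small sketch. It is
not how SA computes its scores. It just treats each list hit as a
likelihood ratio and multiplies the ratios into the prior odds. The
prior and the per-list ratio are made-up figures, and posterior_spam is
a helper invented for this illustration.

```python
def posterior_spam(prior, likelihood_ratios):
    """Return P(spam) after multiplying the prior odds by each likelihood ratio."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Assumed values for illustration only, not measured from any real list.
PRIOR = 0.5          # no opinion about the message before looking at any list
LR_PER_LIST = 20.0   # a hit is taken to be 20x more likely for spam than for ham

# One list hit.
one_hit = posterior_spam(PRIOR, [LR_PER_LIST])

# Three hits from lists with independent raw data: each hit is new evidence,
# so all three ratios are multiplied in.
three_independent = posterior_spam(PRIOR, [LR_PER_LIST] * 3)

# Three hits from lists fed by one common database: the second and third hits
# repeat the first, so only one ratio may be counted.
three_shared = posterior_spam(PRIOR, [LR_PER_LIST])

print(f"one hit:                {one_hit:.5f}")
print(f"three independent hits: {three_independent:.5f}")
print(f"three shared-db hits:   {three_shared:.5f}")
```

With independent sources the three hits push the estimate well past a
single hit. With a shared database they are worth no more than one hit,
which is why summing their scores would overcount.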
To address your worries, what I might do is go out and look at which
databases are filled mostly or exclusively from spam traps and rate
them highly, while rating those that rely solely on submissions low.
And the existing scores seem to indicate "it's already been done." You
may be running too many of the submission-based BLs, though. (SA may
default badly in that regard.)
So stated bluntly, the lists do overlap. "So what?" They are SUPPOSED to
overlap for a spam source, if it really is a spam source. An individual
list's goodness is based on how the data is gathered and vetted. If you
do not like the way a given list gathers data, then reduce its score from
the values SA uses. That is easy enough to do, individually or globally.
Based on Jeff's comments the lists do not overlap by sharing resources.
If they overlap it is because someone submits to several lists, meaning
the person REALLY believes a site to be a spam source, or just maybe
because the site really IS a spam source and has spammed several quite
unique spam traps.
"Overlap" as a worry is utterly spurious. You must dig down below that
word to find the problem. Wallpapering over the problem is useless.
{^_^}