Over-scoring of SURBL lists...

Matt Kettler Wed, 15 Feb 2006 15:27:51 -0800

All this hubbub about not filtering the list has made me come to a realization.


The SURBL URIBLs are collectively massively over-scored in SA 3.1.0.

The problem is that the SURBL lists, over time, become largely redundant with
one another. A URI may be first listed by one or another of the SURBL lists, as
each has its own feeds, but if it's really used in a spam run it will quickly
get listed in 3 or more of them.

Take for example this ONE URI that was posted to the list:
checpri *MUNGED*.com

This is currently listed in SC, JP, and AB on SURBL.
score URIBL_AB_SURBL 0 3.306 0 3.812
score URIBL_JP_SURBL 0 3.360 0 4.087
score URIBL_SC_SURBL 0 3.600 0 4.498

In a set3 configuration (Bayes and network tests both enabled, the fourth score
column) that's 12.397 points, just for having ONE URI in the message. I don't
know about you, but it strikes me as rather excessive.

Compare this to the RBLs supported by 3.1.0. XBL is the highest-scoring RBL and
it's only 3.897 in set3. You'd have to be listed in at least 5 RBLs to break 12
points with SA 3.1.0. The four highest-scoring RBLs are:


score RCVD_IN_XBL 0 3.114 0 3.897
score RCVD_IN_NJABL_SPAM 0 1.905 0 2.775
score RCVD_IN_DSBL 0 1.801 0 2.600
score RCVD_IN_WHOIS_BOGONS 0 1.811 0 2.430
----------
11.702
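
As a quick sanity check of the two totals quoted so far, here is a minimal Python sketch that sums the set3 (fourth-column) scores. The scores are copied from the lines above; the dictionary and function names are only illustrative and are not part of SA.

```python
# Set3 scores (fourth column) copied from the SA 3.1.0 score lines quoted above.
URIBL_SET3 = {
    "URIBL_AB_SURBL": 3.812,
    "URIBL_JP_SURBL": 4.087,
    "URIBL_SC_SURBL": 4.498,
}
RBL_SET3 = {
    "RCVD_IN_XBL": 3.897,
    "RCVD_IN_NJABL_SPAM": 2.775,
    "RCVD_IN_DSBL": 2.600,
    "RCVD_IN_WHOIS_BOGONS": 2.430,
}


def total(scores):
    """Sum the scores of all rules that hit, rounded to SA's three decimals."""
    return round(sum(scores.values()), 3)


uribl_total = total(URIBL_SET3)  # three URIBL hits caused by one URI
rbl_total = total(RBL_SET3)      # the four highest-scoring RBLs combined

print("3 URIBLs:", uribl_total)
print("4 RBLs:  ", rbl_total)
```

Three URIBL hits from a single URI come out higher than the four top RBLs put together.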

Also consider that these lists have highly diverse listing criteria, and merely
sourcing spam is not enough to get listed in all four of them.

Yet a mere 3 URIBLs sail right past the 12 point mark with ease. And these 3
URIBLs (as well as OB and uribl.com's BLACK) all have highly similar listing
criteria. Their policies differ slightly, but when you remove the fine details
they all list "domains reported as spam which don't appear to be used
legitimately". The differences lie in where they collect reports from, and how
much checking they do for legitimacy.

The other problem is that I've seen a repeated pattern where FPs get reported
to more than one list. In fact, I rarely see an FP that isn't at least
double-listed. (For example, I recently had the download site for
paid-registered upgrades to a programmer's text editor get double-listed. It
will cost you $39.95 to get signed up for that "spam".)

This makes me wonder if SA wouldn't be better off having some kind of meta rules
that simply count how many URIBLs the message is listed in, or at least some
kind of score-limiting feedback on multiple hits. This would allow lists to
score high individually, but prevent overlapping FPs from being driven into
astronomical score levels just for containing a single URI that someone
mis-reported to multiple sources.

Thoughts, concepts?
