Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain

jdow Sun, 19 Feb 2006 01:29:21 -0800

From: "Matt Kettler" <[EMAIL PROTECTED]>


I'm thinking of something like:
score URIBL_SURBL    2.0
score URIBL_AB_SURBL 1.812
score URIBL_JP_SURBL 2.087
score URIBL_OB_SURBL 1.008
score URIBL_PH_SURBL 0.800
score URIBL_SC_SURBL 2.498
score URIBL_WS_SURBL 0.140


Whereas I am thinking of increasing some of the scores overall. {o.o}

Each individual list maintains its score total. But the additive
effects are more limited. You could still go over 5 by hitting the
highest scoring ones, but a double WS+OB would total 3.1 instead of 5.1.
You'd still get a 6.585 for a double SC+JP, and >5 for many other
2-list combinations, but you wouldn't be so far over the line that Bayes
couldn't fix the occasional FP.

Right now JP+SC scores 8.585, which even BAYES_00 can't bring back down
under the 5.0 line. I trust the URIBLs a lot, I think they're great. But
I don't trust them so much that two of them should be able to override
BAYES_00 without any other spam rules firing.
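
As a rough check of the arithmetic above, here is a small sketch that totals a couple of two-list combinations under both schemes. The proposed values are the ones listed earlier in this message. The "current" per-list values are an assumption: they are taken as the proposed value plus 2.0, which is consistent with the 8.585 and 5.1 totals quoted above. The function names are made up for illustration.

```python
# Proposed scheme: a shared base score counted once when any SURBL list
# hits, plus a smaller per-list increment on top of it.
BASE = 2.0
PROPOSED = {
    "AB": 1.812, "JP": 2.087, "OB": 1.008,
    "PH": 0.800, "SC": 2.498, "WS": 0.140,
}

# ASSUMPTION: each current per-list score is the proposed increment + 2.0.
CURRENT = {name: round(score + BASE, 3) for name, score in PROPOSED.items()}


def current_total(hits):
    """Sum of the per-list scores, each carrying its full weight."""
    return round(sum(CURRENT[h] for h in hits), 3)


def proposed_total(hits):
    """Base score once (if anything hit) plus the per-list increments."""
    if not hits:
        return 0.0
    return round(BASE + sum(PROPOSED[h] for h in hits), 3)


for combo in (["WS", "OB"], ["SC", "JP"]):
    print("+".join(combo), current_total(combo), proposed_total(combo))
```

Under these assumptions every multi-list combination drops by 2.0 for each list beyond the first, while a single-list hit scores the same as before.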


I'm content with the scores as is BECAUSE they override BAYES_00. The
really short, virtually URI-only spams get rather low Bayes scores as
a general rule. So having the BLs toss them over the top is a good
thing as I see it here.

So your proposed solution has its own problems. At the very least
you should explore the solution you propose to see if you can make
it work. (By the way, is it users complaining or the folks at some
of the vendor sites? I've been presuming it is user complaints that
are plaguing you.)

It's user complaints, not to mention my own mail ending up in the spam
bin. Both posted examples came from *MY* personal email, not my users.

I'm sick of having to fetch conference announcements, product
announcements, etc out of my spam bin. I shouldn't have to add whitelist
rules for every vendor I do business with because some of the URIBLs are
getting over-zealous.


I can tell that. And I'm not sure of any given solution. Although I
hope I remember to vamp on a vague idea I have below. It'd be a new
concept, "whitelist_from_rcvd_scored". That would be whitelist_from_rcvd
with a custom score per whitelist entry. Couple this with a "don't
complain to me, load it into this mailbox and it'll be fixed
automagically" script and mailbox to get something that might work OK.
The idea would need a lot of development and some level of per user
rules to make it work. For an ISP I can see a lot of problems with
the concept of "fully automatic." But if someone vets the account
info maybe it'd be OK.

I also note that with the "winterizewithscotts" site the company made
a very logical and fatal mistake.


Fair enough, I can see HOW the domain got listed by mistake. However, is
a system which is prone to mistakes worth 6.008 points (default score
for OB in SA 3.1.0 + uribl.com's suggested 3.0 score for URIBL_BLACK)?


Yes, given the fact that the mistakes seem to be very hard to make
and one presumes some actual checking on the complaints.

Sure I can customize.

But I'm creating this public discussion on the list not to bash the
URIBLs, but to get people thinking about better ways to score them to
avoid score-inflation when FPs happen, while still keeping the spam
scores reasonable. I was hoping some SOLUTIONS would come out.

So far most of what I've gotten is a bunch of defensive garbage denying
it ever happens, or trying to explain away the problem without giving it
any serious consideration.


Most of us do not see any problem, it would appear. I am SURE that any
comparison between your site and mine is utterly spurious. Yours is a
rather large ISP setting, I believe. Mine's two lonely little users
tucked away at the end of some DSL wire out in South Eastern San
Bernardino County using Earthlink as our ISP. (Too bad "ISP For Two"
doesn't scan with "Tea For Two." There's a temptation to filk. {^_-})

Now lemme see on that whitelist_from_rcvd_scored vamping.

I'll TRY to look at it from an ISP view with some per user rules
capability. (Without the per user rules capability some means of tweaking
the offending BL rules to flatten out the maximum score on a dynamic
basis is needed.)

Suppose each user can forward email through your authenticated smtp
server to a "this ain't spam" analysis tool. It takes apart the message
and looks at the scores. This may require the spam-as-attachment feature
for SA so that the "original" makes it through as well as all the
scoring. The scoring is dissected and either the BLs that hit are
downscored very slightly or the whitelist_from_rcvd_scored entry is
added with a modest score "barely" sufficient to keep the item from
being scored as spam.

The user gets an entry that is guaranteed to prevent the message being
scored as spam. AND a global entry is created that subtracts about 1/5th
of the range needed to make the message not spam. Each time a similar
complaint comes from another user, not the same one, the whitelist
level is cranked up by the initial delta amount until it no longer bugs
people. The idea is to require more than one user to flip something into
a global ham category. I don't have a feel for a good number of user
complaints for this tool.
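
To make the bookkeeping a bit more concrete, here is a minimal sketch of how such a tool might track complaints. Nothing like this exists in SA; the class name, the 0.1 safety margin, and the use of a bare sender string as the key are all assumptions for illustration. Only the 5.0 threshold and the "about 1/5th" step come from the discussion above.

```python
SPAM_THRESHOLD = 5.0  # the "5.0 line" discussed in this thread
GLOBAL_STEPS = 5      # each distinct user adds about 1/5th of the range


class ScoredWhitelist:
    """Hypothetical bookkeeping for whitelist_from_rcvd_scored entries."""

    def __init__(self):
        self.user_entries = {}  # (user, sender) -> negative score
        self.global_state = {}  # sender -> {"delta": float, "users": set}

    def report_ham(self, user, sender, message_score, margin=0.1):
        # Range needed to pull this message "barely" under the threshold.
        # ASSUMPTION: the 0.1 margin is arbitrary.
        needed = max(0.0, message_score - SPAM_THRESHOLD + margin)

        # The complaining user gets an entry that fully covers the message.
        self.user_entries[(user, sender)] = -needed

        # The global entry moves by the *initial* delta, once per distinct
        # user. Repeat complaints from the same user change nothing.
        state = self.global_state.setdefault(
            sender, {"delta": needed / GLOBAL_STEPS, "users": set()}
        )
        state["users"].add(user)
        return self.global_score(sender)

    def global_score(self, sender):
        state = self.global_state.get(sender)
        if state is None:
            return 0.0
        return -state["delta"] * len(state["users"])


wl = ScoredWhitelist()
wl.report_ham("alice", "vendor.example", 6.0)
wl.report_ham("alice", "vendor.example", 6.0)  # same user, no extra effect
print(wl.report_ham("bob", "vendor.example", 6.0))
```

Counting distinct users rather than raw complaints is the simplest guard against one account stuffing the ballot box, though it does nothing about several colluding or forged accounts.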

The problem is that it's one more piece of software and opens a set of
potential system vulnerabilities.

1) This is new code with the usual level of new code vulnerabilities
that would have to be examined. Language selection may help here quite
materially. Protection against a malformed email would also be needed.

2) Some user in cahoots with a spammer might try to "stuff the ballot
box." Some code to protect for this might be needed.

3) Forged emails designed to open up a spam hole for someone else as a
prank or as a crack.

4) My imagination is limited this evening. I am sure there are MANY more.

Another problem level is how to implement this. You CAN mimic the behavior
of whitelist_from_rcvd with meta rules, I suspect. If so then the tool
above would simply write meta rules and put them in place until enough
complaints were received. Then it would make a simple global and normal
format whitelist_from_rcvd rule.
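
A rough sketch of what the tool might write out follows. It generates ordinary header, meta, and score lines for one sender. The rule-name prefix and both function names are invented for this example. Note the big assumption: a plain regexp against Received headers is only a loose approximation of whitelist_from_rcvd, which checks the relay that handed the message to your trusted network, so treat this as an illustration of the rule shape and not a drop-in replacement.

```python
import re


def perl_quote(text):
    """Backslash-escape every non-word character, like Perl's quotemeta."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", text)


def scored_whitelist_rules(name, from_addr, relay_domain, score):
    """Return rule text for one hypothetical scored whitelist entry."""
    rule = "WLS_" + re.sub(r"[^A-Z0-9]", "_", name.upper())
    return "\n".join([
        # Sub-rules (leading double underscore) carry no score of their own.
        f"header __{rule}_FROM From:addr =~ /^{perl_quote(from_addr)}$/i",
        # ASSUMPTION: matching any Received header is weaker than the real
        # whitelist_from_rcvd relay check.
        f"header __{rule}_RCVD Received =~ /{perl_quote(relay_domain)}/i",
        f"meta   {rule} (__{rule}_FROM && __{rule}_RCVD)",
        f"score  {rule} {score:.3f}",
    ])


print(scored_whitelist_rules(
    "vendor", "news@vendor.example", "mail.vendor.example", -0.22))
```

Each new complaint would then only need to rewrite the score line until the entry is promoted to a normal whitelist_from_rcvd rule.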

Does this sound at all practical to you as a solution to take some of the
burden off you while not opening up too many holes?

(And all that said, a run on the last 6 weeks or so of mail logs suggests
that in about 59,000 hams there are some apparent false hits on the various
BLs. The levels are all down in the parts per ten thousand level, which is
normally quite good. For an ISP that passes that much gas in an hour or less
the level may not be quite so appropriate.)

RANK   RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
  1    BAYES_99                        27636     4.73   30.43   87.33    0.05
  3    RCVD_IN_XBL                     16851     2.88   18.55   53.25    0.02
  4    URIBL_JP_SURBL                  14783     2.53   16.28   46.72    0.04
  5    URIBL_SC_SURBL                  14522     2.48   15.99   45.89    0.01
  6    RCVD_IN_BL_SPAMCOP_NET          13136     2.25   14.46   41.51    2.83
  7    URIBL_WS_SURBL                  12628     2.16   13.90   39.91    0.03
  8    URIBL_OB_SURBL                  12020     2.06   13.23   37.98    0.06
  9    URIBL_SBL                       11822     2.02   13.02   37.36    0.12
 10    URIBL_AB_SURBL                  10765     1.84   11.85   34.02    0.00
 13    URIBL_BLACKB                     8842     1.51    9.73   27.94    0.03
 14    RCVD_IN_DSBL                     8369     1.43    9.21   26.45    0.00
 16    RCVD_IN_SORBS_DUL                7404     1.27    8.15   23.40    0.12

Some of these figures rather surprise me. And others do not. (SPAMCOP is down
at 0.2 for a score. It seems appropriate. It's not QUITE useless.)
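
For a feel of what those last-column figures mean in messages, here is a small sketch that converts them to approximate ham counts. It assumes the final column of the table is the percentage of ham hit by each rule, and it uses the rough 59,000 ham total mentioned above. The function names are illustrative only.

```python
HAM_TOTAL = 59_000  # "about 59,000 hams" from the log run described above

# ASSUMPTION: the last table column is the percent of ham hit by the rule.
PCT_OF_HAM = {
    "URIBL_JP_SURBL": 0.04,
    "URIBL_SC_SURBL": 0.01,
    "URIBL_WS_SURBL": 0.03,
    "URIBL_OB_SURBL": 0.06,
    "URIBL_AB_SURBL": 0.00,
    "RCVD_IN_BL_SPAMCOP_NET": 2.83,
}


def per_ten_thousand(pct):
    """Convert a percentage into hits per 10,000 messages."""
    return pct * 100


def estimated_ham_hits(pct, ham_total=HAM_TOTAL):
    """Approximate number of ham messages a rule fired on."""
    return round(pct / 100 * ham_total)


for rule, pct in PCT_OF_HAM.items():
    print(f"{rule:<24} {per_ten_thousand(pct):7.1f} per 10k, "
          f"about {estimated_ham_hits(pct)} hams")
```

The same percentages applied to an ISP's hourly volume give proportionally more false hits, which is the concern raised above.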

I can remember when several of these were down at the zero level rather than
one or two improper markups. What I find interesting is that no messages were
marked as spam based on these rules that were not actually spam unless the
total score was well over 15, which is about where I usually stop seriously
scanning for mismarked ham.

(And to be fair I have NOT suggested whitelisting some of the "messages" I
get from the likes of EDN and Reed communications. I *LIKE* getting most of
their <censored> marked as spam. I'm stuck getting it as part of the
subscription to EDN. I don't do surveys. {^_-} I also do the same thing
with the Earthlink special offers. I could probably turn them off. But it's
too much bother. I mark 'em as spam and ignore them. I do NOT send 'em off
to spamcop or anything, though.)

{^_^}
