RE: best of RBLs without the FPs

Herb Martin Sat, 24 Sep 2005 21:08:29 -0700

> Herb asked:
> >What makes you uncomfortable?  There really are only two issues:
> >1) Delay of legitimate email
> >2) Broken legitimate servers that won't resend
> 
> Herb,
> 
> Many web page event-driven e-mails may not have a retry 
> mechanism. And I think that some legit opt-in lists and 
> newsletters also might be sent in a single pass ... in 
> addition to what you've conceded.


But these web page emails are not likely to be on RBLs
if they come from legitimate email servers that are
not open relays, etc. Besides, many or most such pages actually
send their email through an SMTP server precisely so they
will not need to deal with retries themselves.

I worried about this with my OWN customer web page email
(sent to me only), but realized that I was going to have it use
a reliable SMTP server anyway (one that an outside sender
cannot use).

But whitelisting can take care of that...
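For a site that does want its web forms to survive greylisting, the pattern described above is to hand the message to a local SMTP server rather than talk to the recipient's server directly. A minimal Python sketch, assuming a relay listening on localhost port 25; the function names are hypothetical:

```python
import smtplib
from email.message import EmailMessage


def build_form_message(form_fields: dict[str, str], sender: str, recipient: str) -> EmailMessage:
    """Turn submitted web-form fields into a plain-text email (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Web form submission"
    body = "\n".join(f"{key}: {value}" for key, value in sorted(form_fields.items()))
    msg.set_content(body)
    return msg


def submit_via_relay(msg: EmailMessage, relay_host: str = "localhost", port: int = 25) -> None:
    """Hand the message to a local SMTP server (assumed to exist).

    Queueing and retrying after a greylisting 4xx deferral then become the
    relay's job, not the web page's.
    """
    with smtplib.SMTP(relay_host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Because the relay queues the message, a 451 deferral from a greylisting recipient is retried by the relay; the form handler only needs the relay to accept the message once.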

> I don't see how a "greylisting-whitelist" could keep up with 
> the multitude of scenarios, even if these are all low 
> frequency percentage-wise.

There just aren't that many in our experience: greylistd
comes with a nice whitelist (of broken servers) and you
can add more.

This whitelist maintenance is not a real burden, though: it is
important for you to know that we have only had to add to it
once, right after installing the greylist daemon.
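Greylisting itself is simple enough to sketch. The following minimal Python model keys each attempt on the (client /24, sender, recipient) triplet, defers the first attempt, accepts a retry after a delay, and skips whitelisted hosts entirely. The 300-second delay, the /24 masking and the whitelist format are illustrative assumptions, not greylistd's actual defaults or file format.

```python
import ipaddress
import time


class Greylist:
    """Hypothetical in-memory greylist keyed on (client /24, sender, recipient).

    Assumes IPv4 clients. Expiry of triplets that never return is omitted.
    """

    def __init__(self, delay_seconds=300, whitelist_hosts=()):
        self.delay = delay_seconds
        # Whitelisted networks (e.g. known broken servers) bypass greylisting.
        self.whitelist = [ipaddress.ip_network(h, strict=False) for h in whitelist_hosts]
        self.first_seen = {}  # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return 'accept' or 'defer' for one RCPT TO attempt.

        Deferring at RCPT time means the message body is never transferred.
        """
        now = time.time() if now is None else now
        addr = ipaddress.ip_address(client_ip)
        if any(addr in net for net in self.whitelist):
            return "accept"
        # Mask to /24 so a retry from a neighbouring server in the same pool still matches.
        subnet = ipaddress.ip_network(f"{client_ip}/24", strict=False)
        triplet = (str(subnet), sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(triplet, now)
        return "accept" if now - first >= self.delay else "defer"
```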

It just doesn't happen that we lose mail from web forms.

How much LEGITIMATE email do you get (or does anyone get)
from "unknown" web forms?

> However, I do see how your method is an excellent way to both:
> 
> (1) minimize the problems of greylisting by going after what 
> is almost always spam in the 1st place 
> 
> AND
> 
> (2) minimize the problems of FPs on RBLs since the legit mail 
> will almost always make it past the greylist.
> 
> Therefore, I don't knock your system... maybe I just need to 
> test the waters and get a little more comfortable with it.

Testing is good.

Depending on your email server, this is easy: with
Exim I just used a WARN on the greylist for a couple
of days, then switched it to DEFER (a temporary
reject, as opposed to a DENY, which is permanent).
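Exim's ACL verbs `warn` and `defer` are configured in Exim's own syntax, which is not reproduced here. The rollout idea can be modelled in Python as a mode switch: in warn mode a greylist hit is only logged, in defer mode it becomes a temporary 4xx reply (unlike a DENY's permanent 5xx). The function and mode names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmtpReply:
    code: int
    text: str


def greylist_reply(verdict: str, mode: str, log: list) -> SmtpReply:
    """Map a greylist verdict to an SMTP reply according to the rollout mode.

    mode == "warn":  only record what *would* have been deferred (trial period).
    mode == "defer": send a temporary 451 so legitimate servers retry later.
    """
    if verdict != "defer":
        return SmtpReply(250, "OK")
    if mode == "warn":
        log.append("would greylist")
        return SmtpReply(250, "OK")
    if mode == "defer":
        return SmtpReply(451, "4.7.1 Greylisted, please try again later")
    raise ValueError(f"unknown mode: {mode}")
```

Running in warn mode for a few days gives a log of what would have been deferred, which can be checked for legitimate senders before switching the mode.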

By the way, there is an add-on SA plugin that does
greylisting, but I don't see the point of it, since SA
is NOT really a spam filter but a content classifier:
it only gives your MTA etc. the information it needs to
decide what to do with the email.

Also, for us SpamAssassin comes TOO LATE in the chain
95% of the time. We've already greylisted most things
by the time SA runs (and thus avoid the expense of SA
processing if the mail is not from a reasonably
functional SMTP server).

> In fact, I probably will do this eventually... and I'm most 
> interested in putting it to use on only those messages which 
> just barely got caught and/or which just barely didn't get 
> caught by my current spam filtering. I'm kind of excited to 
> see if this can squeak a few tenths of a percent better 
> filtering out of my filter without generating FPs.

It will help, but the key to what I am suggesting
(and doing) is that YOU get to decide.

We still use SpamAssassin for all the hard cases (and even
to drive a small amount of email through the greylist). If
I had to give up one, it would be greylisting, since SpamAssassin
is a more comprehensive and general tool.

So all of those "excellent" RBLs that you trust can still be
used to REJECT, and the flaky ones can still be used to SCORE,
in SpamAssassin, whatever mail greylisting doesn't stop.

The latter is true for ANY test you choose. Some are reliable
enough to reject on (or even accept on), and some will still let
mail past greylisting (about 10% of what we feed through greylisting
"comes back" again, and truthfully almost none of that is
actually ham).

But of course, in our system NONE of what is rejected by such
"suspicion-driven greylisting" is ham.
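The tiering described above (trusted lists reject, less reliable lists only raise suspicion) can be sketched in Python. DNSBLs are queried by reversing the IPv4 octets and prepending them to the list's zone; a name that resolves means the address is listed. The zone names, function names and return values here are hypothetical, and the lookup is injectable so the policy can be tested without DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """DNSBL lookup name: the IPv4 octets reversed, followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dns_is_listed(name: str) -> bool:
    """Real DNS lookup: a listed address resolves; an unlisted one does not."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


def rbl_policy(client_ip: str,
               trusted_zones: Iterable[str],
               flaky_zones: Iterable[str],
               is_listed: Callable[[str], bool] = dns_is_listed) -> tuple[str, list[str]]:
    # Trusted lists: reject outright (permanent 5xx during the SMTP dialogue).
    for zone in trusted_zones:
        if is_listed(dnsbl_query_name(client_ip, zone)):
            return "reject", []
    # Flaky lists: not trusted enough to reject, but enough to trigger greylisting.
    hits = [zone for zone in flaky_zones if is_listed(dnsbl_query_name(client_ip, zone))]
    if hits:
        # Mail that later survives the greylist can be scored using these hits.
        return "greylist", hits
    return "accept", []
```

In an MTA, `reject` would correspond to a permanent 5xx at SMTP time and `greylist` to a temporary 451; the flaky-list hits could then feed SpamAssassin scoring for whatever comes back.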
 
> ONE MORE THOUGHT:
> In general, I think that the greylist-only people (NOT Herb, 
> btw) are just lazy and are willing to make do with an 
> inferior system which is brainless and easy... But this is also 
> the problem with brain-less Bayesian-only filters and, 
> consequently, the spammers found ways to beat the Bayesian 
> filters. You know, there is a similarly easy way for them to 
> beat greylisting, too...

Defense in depth is the key.  Trying to do that efficiently
is critical for systems with high mail volumes. 

One of the useful features of my greylist mechanism is that
it REDUCES the load, both in receiving mail bodies AND in
processing them through SpamAssassin.

Oddly enough, my BAYES_99 and BAYES_95 rules in SA give 0% false
positives and hit 70-80% of spam. That is pretty good for BAYES_99
+ BAYES_95, but if it seems low to others, it's important
to remember that we are knocking down (over?) 90% of spam before
SA even sees it.
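Taking the figures above at face value, and assuming the 70-80% Bayes hit rate applies to the spam that reaches SA, the two layers combine roughly as follows (a back-of-the-envelope estimate, not a measured result):

```latex
\begin{aligned}
p_{\text{total}} &= p_{\text{grey}} + (1 - p_{\text{grey}})\, p_{\text{Bayes}} \\
                 &= 0.90 + 0.10 \times [0.70,\ 0.80] \\
                 &= [0.97,\ 0.98]
\end{aligned}
```

So the seemingly modest Bayes hit rate still leaves only about 2-3% of all incoming spam for SpamAssassin's other rules to catch.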

One of the big advantages of this method is not that SA
couldn't classify this mail, but rather that no human ever needs
to review the greylist deferrals that never return.

If there WERE an FP, the sender would (almost certainly)
get a non-delivery report from his OWN email server. We
NEVER send NDRs ourselves, and so avoid adding to the collateral
spam (backscatter) created by accepting mail and later trying
to notify the supposed "sender".
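The retry behaviour that greylisting relies on, and that keeps NDRs on the sender's side, can be sketched from the sending MTA's point of view. This Python model follows the standard meaning of SMTP reply classes (2xx success, 4xx temporary failure, 5xx permanent failure); the function name and the attempt limit are assumptions, and real MTAs usually give up after a queue lifetime measured in days rather than a fixed attempt count.

```python
def handle_reply(code: int, attempts: int, max_attempts: int = 5) -> str:
    """What a well-behaved sending MTA does with the receiving server's reply.

    2xx: delivered.
    4xx (e.g. a greylisting 451): keep the message queued and retry later.
    5xx: give up; the *sender's own* server writes the non-delivery report,
         so the receiving site never has to bounce anything (no backscatter).
    """
    if 200 <= code < 300:
        return "delivered"
    if 400 <= code < 500:
        return "retry later" if attempts < max_attempts else "bounce to sender"
    if 500 <= code < 600:
        return "bounce to sender"
    raise ValueError(f"unexpected SMTP reply code: {code}")
```

The 4xx branch is also all a spammer would need to implement to get past greylisting.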

> simply track the "451 4.7.1" responses and send these again a 
> few minutes later. When this becomes common practice, those 
> who rely on greylisting will find their filters failing big time.

Yes. That is absolutely correct, and perhaps I should not
spend time explaining how to make safe use of greylisting
(as long as only a few of us do this, the spammers will just
keep slamming everyone else <grin>).

But do realize that this is a significant improvement against the
current crop of spam zombies, AND it will force them to
work harder and expend more resources to get the same
"benefit" they get now. Slowing the zombies down is NOT a
bad thing.

This week I have written a SpamAssassin plugin for CRM114,
a Markovian and Hyperspace classifier (akin to Bayesian
classification, but perhaps more comprehensive). It works,
and it will ADD to my defense in depth.

This thing is actually running in my production SA, and
adding/subtracting score based on its classification.

It's not suitable for everyone yet, since it is still
crude (idiosyncratic to my systems) and has to call an
external executable (which is unsuitable for high-volume
mail systems).
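A real SpamAssassin plugin is a Perl module, and CRM114's actual command line and output format are not shown here. The Python sketch below only illustrates the pattern described: pipe each message to an external program, parse a probability, and turn it into a score adjustment. The one-number-on-stdout protocol, the plus-or-minus 3 point scale and all names are assumptions; `DEMO_COMMAND` is a stand-in, not a real classifier.

```python
import subprocess
import sys

# Stand-in "classifier" for trying the plumbing: reads the message, prints 0.95.
DEMO_COMMAND = [sys.executable, "-c", "import sys; sys.stdin.buffer.read(); print('0.95')"]


def classify_externally(message: bytes, command: list[str], timeout: float = 30.0) -> float:
    """Pipe one message to an external program and parse the probability it prints.

    One process start per message: simple, but this per-message cost is what
    makes the approach expensive at high mail volumes.
    """
    result = subprocess.run(command, input=message, capture_output=True,
                            timeout=timeout, check=True)
    return float(result.stdout.decode().strip())


def score_adjustment(prob: float, max_points: float = 3.0) -> float:
    """Linear map: 0.0 -> -max_points (hammy), 0.5 -> 0, 1.0 -> +max_points (spammy)."""
    return round((prob - 0.5) * 2 * max_points, 2)
```

A long-running classifier process reached over a socket would be one way to avoid starting a new process for every message.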

--
Herb Martin
