Hello,
The way I understand "greylisting", it will have to be implemented at the
MTA level, so I am not sure it could be implemented in SpamAssassin.
Nonetheless, for all the initial negative reaction in this forum,
"greylisting" is a fantastic new proposal which will be implemented at
many sites. Attached is an email that was recently sent to admins on our
campus requesting discussion on the matter. I attach it here since it is
very informative on the subject.

Sincerely,

   - Henrik
There was recently a paper released on a new SPAM abatement methodology
that strictly uses the nature of the SMTP protocol itself to provide
relief.  For all of the details, please see the original paper:
    <http://projects.puremagic.com/greylisting/>

In this note, I will briefly outline the nature of the methodology, how
it works, what it would mean for TAMU, and what I have observed in my
own testing.

First, one of the main methods of SPAMming is that a SPAM site has a
database and a small machine somewhere.  This machine just reads addresses
from the database, connects to the destination host for that address, spews
the message ignoring the SMTP result codes, and disconnects.  The main
point with that scenario is that the spammer's machine never actually
performs the queuing and retrying that a "real" mail server should do; that
is one of the ways that they keep their costs down and put the burden of
the e-mail solely on the recipient machine (i.e. TAMU hosts and servers).

For our part, that means they spew these thousands of messages at our
site (particularly at smtp-relay.tamu.edu), and all of these messages
appear to be valid from our viewpoint:  the sender addresses "look" good
in that we can resolve the hostnames to the right of the "@", and the
recipients are in the tamu.edu domain (or one of several others that are
handled on-campus), so we should accept responsibility for delivering
them.

Unfortunately, the databases used by the spammers have been accreting
for over a decade and contain many completely bogus addresses.  Many of
those addresses are destined for hosts that were valid at one time but
are no longer connected to the network, even though they are still in
DNS (since neither the previous host owner [who should have], nor CIS
networking [who would have to do a lot of owner contacting/etc. to do
so], have cleaned up those ancient hostnames that were once valid for
receiving mail).  This deluge of bogus messages consumes our disk space
while it sits queued; when the message timeout arrives, we bounce each
message back to the (usually bogus) sender.  Since the sender is bogus,
the messages "double-bounce", or time out if the sender was faked as yet
another no-longer-deliverable hostname.

Several problems are caused by this:
    1:  Queuing locally: not a big problem in that we have designed the
        systems to be able to queue massive amounts of e-mail in the event
        of some catastrophe where large numbers of campus hosts are
        unavailable, but it does still needlessly utilize State-of-Texas
        resources for this unsolicited commercial traffic that will not be
        delivered anyway.
    2:  Queuing locally #2: Since there is such a long back-log of
        e-mail, the system still takes time to make delivery attempts to
        the unavailable systems, which delays real recipients whose mail
        was queued while their server was down and has since come back
        up.
    3:  Double-bounce deluge: Proper management of mail systems requires
        reading and monitoring of the automated messages produced by the
        systems.  With the amount of SPAM that attempts to pass through the
        systems each day, my team of admins must handle 6,000 to 10,000
        double-bounces each day.  There are some scripts and filters to
        handle the majority of the alerts, but the more noise there is, the
        easier it is for us to overlook a real problem.

This brings us to the new methodology presented in the paper above.


The main tenet of the proposal is to temporarily move the burden and
expense of queuing to the sending system, to make sure it is a real mail
server; after that, everything happens as it does now.

[As you read this, note that local hosts will be pre-whitelisted and
will never see any delay.]

The way the remote site is forced to queue the message temporarily is
that when a new message is offered by a remote host, our host uses a
database to note the sending IP and the sender and recipient e-mail
addresses.  Our side answers the attempt with a 400-level "TMPFAIL"
reply that says "I should be able to accept this message, but I can't do
it right now; please try again later," as defined in the SMTP protocol.
The remote site will queue the message and try to send it again later.
At least an hour later (by default), when the same host using the same
e-mail addresses tries to resend the message, the message will be
accepted just as it is now, and that triplet of information is marked in
the database so that future e-mail passes through with no delay.

Only the first time an IP/sender/recipient triplet is seen will there be a
short delay, after which there will be no delay.
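
To make the mechanism concrete, here is a minimal sketch of that triplet
check in Python.  The table layout, function names, and the 1-hour and
4-hour defaults are my own illustrative assumptions drawn from the
description above (and from the configurable time-out mentioned further
down), not the implementation from the paper; in a real deployment this
logic would be invoked from the MTA itself (e.g. via a sendmail milter
or a Postfix policy service):

    import sqlite3
    import time

    MIN_DELAY = 60 * 60              # assumed default: defer retries for 1 hour
    INITIAL_LIFETIME = 4 * 60 * 60   # assumed default: unretried records expire after 4 hours

    db = sqlite3.connect("greylist.db")
    db.execute("""CREATE TABLE IF NOT EXISTS triplets (
                      ip TEXT, sender TEXT, recipient TEXT,
                      first_seen REAL, passed INTEGER,
                      PRIMARY KEY (ip, sender, recipient))""")

    def check_triplet(ip, sender, recipient):
        """Return an SMTP-style verdict for one delivery attempt."""
        now = time.time()
        # Expire unretried records so a much later attempt starts the delay over.
        db.execute("DELETE FROM triplets WHERE passed=0 AND ? - first_seen > ?",
                   (now, INITIAL_LIFETIME))
        db.commit()
        row = db.execute("SELECT first_seen, passed FROM triplets "
                         "WHERE ip=? AND sender=? AND recipient=?",
                         (ip, sender, recipient)).fetchone()
        if row is None:
            # First sight of this IP/sender/recipient: record it and defer.
            db.execute("INSERT INTO triplets VALUES (?, ?, ?, ?, 0)",
                       (ip, sender, recipient, now))
            db.commit()
            return "450 Greylisted, please try again later"
        first_seen, passed = row
        if passed or now - first_seen >= MIN_DELAY:
            # A real mail server queued and retried: whitelist the triplet.
            db.execute("UPDATE triplets SET passed=1 "
                       "WHERE ip=? AND sender=? AND recipient=?",
                       (ip, sender, recipient))
            db.commit()
            return "250 OK"
        # Retried too soon; keep deferring until the minimum delay has passed.
        return "450 Greylisted, please try again later"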

SPAM sites, on the other hand, do not do queuing, so when we say
TMPFAIL, they don't care; they just spew the message, ignoring our
answers.  Since they don't queue, they won't re-send the message, and
the intended recipient will never get their SPAM.

Obviously there are a few ways that the senders of SPAM can respond:

    o Spammers can set up full mail servers that perform proper queuing:
      this will require much more hardware and bandwidth on their side,
      moving the cost of sending SPAM closer to the source anyway, and it
      will also stop the use of zombies and other distributed sending
      programs, so that the receiving side will have a consistent sending
      host should a manual block be warranted.

    o Spammers can modify their SPAM software to use a database to
      re-try the messages some time after the hour and before a
      configurable time-out (the default is 4 hours):
      This is almost the same as queuing for them; they will have to use
      a consistent IP rather than relying on a network of zombies to do
      the sending, or their SPAM will not be allowed to pass; they will
      also have to use consistent sender addresses rather than making up
      obvious fake ones like "[EMAIL PROTECTED]"; and they are also
      increasing their network bandwidth charges, since every message
      they send with a different sender will incur the wait/retry
      requirement.

    o SPAM sites can continue to make use of "open relays" and send all
      of their mail to those unsuspecting third parties, who will then
      relay it for them.  Since those open relays are full mail servers
      that do proper queuing, the mail would eventually make it through.
      We would either have to live with those messages and expect such
      abuse to eventually get the relay properly closed (open relays are
      still a problem 5 years after it became obvious to the world that
      they are one), or we could alter the mail server configs on our end
      so that we do not merely get a RCVD_IN_OSIRUSOFT_COM tag in the
      X-PerlMx-Spam: header, but actually use the relays.osirusoft.com
      RBL to block mail, as I was going to propose even before this new
      methodology appeared (a rough sketch of such an RBL check follows
      this list).
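
For reference, an RBL check of that kind is normally just a DNS lookup
of the reversed client IP under the list's zone, with the MTA rejecting
the connection if the name resolves.  The Python below is only a rough
sketch of the idea; the zone name is taken from the text above, while
the example address and the standalone-script form are assumptions of
mine (a real deployment would do this inside the MTA configuration):

    import socket

    def listed_in_rbl(client_ip, zone="relays.osirusoft.com"):
        """Return True if client_ip appears in the given DNSBL zone."""
        # Reverse the octets of the IP and look the result up under the zone;
        # a successful A-record resolution means "listed".
        reversed_ip = ".".join(reversed(client_ip.split(".")))
        try:
            socket.gethostbyname(f"{reversed_ip}.{zone}")
            return True      # listed: the MTA would reject with a 5xx reply
        except socket.gaierror:
            return False     # not listed (or lookup failed): accept as usual

    # Example with a placeholder address, checked at SMTP connection time.
    if listed_in_rbl("192.0.2.1"):
        print("554 Service unavailable; client host blocked by RBL")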

Regarding "real" mail, there is one obvious and one discovered issue:

    1:  The wait or cooling-off time:
        The first time someone off-campus tries to send mail to someone
        on-campus, there will be a 1-hour wait for delivery.  This is
        only an initial event; after that first message there will never
        be any further delays, even for messages sent only once a month.
        Also, campus hosts can be pre-exempted from this delay since we
        know their IPs, and, if there is abuse, those IPs are easily
        tracked down, unlike off-campus sources.

    2:  Certain mailing lists:
        We will need to white-list certain mailing lists such as Yahoo!
        Groups and SecurityFocus (AKA BUGTRAQ), since they do something
        amazing I never expected to see: each and every delivery attempt
        to a single remote address uses a unique sender address.  That's
        not each and every e-mail; that is each delivery attempt of the
        same e-mail.  Nevertheless, those few sites that do such crazy
        things will be white-listed prior to implementation (see the
        sketch below).
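
As a rough illustration of how the pre-exemptions and white-listing
above could sit in front of the greylisting check, here is a short
sketch; the network range and the domain names are placeholders of my
own, not an actual whitelist:

    import ipaddress

    # Placeholder entries; an actual deployment would list the real campus
    # networks and the mailing-list sender domains that vary per attempt.
    WHITELISTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
    WHITELISTED_SENDER_DOMAINS = {"yahoogroups.com", "securityfocus.com"}

    def is_whitelisted(client_ip, sender):
        """Skip greylisting for known-good hosts and problem mailing lists."""
        ip = ipaddress.ip_address(client_ip)
        if any(ip in net for net in WHITELISTED_NETWORKS):
            return True
        domain = sender.rsplit("@", 1)[-1].lower()
        return domain in WHITELISTED_SENDER_DOMAINS

    # This check runs before the triplet check; whitelisted mail is
    # accepted immediately, so those senders never see a delay.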

I have been running this particular implementation on my personal mail
server since this past weekend, and one of the local ISPs has been
experimenting with it as well.  So far our SPAM levels have dropped by
more than an order of magnitude (i.e., I am down from over 90 SPAM
messages each day in my personal mailbox to fewer than 5), all the while
I am receiving all of my personal and mailing-list traffic as before.

This is by far the best solution I have implemented, and I think it is
possible to configure the TAMU SMTP systems to gain all of the benefits.
I am just beside myself with how well it is working in practice.  This
is in addition to the existing procedures already in place to block
viruses and other protocol errors, and the tagging of anything that does
make it through would continue unaltered.
