Hello,

The way I understand "greylisting," it would have to be implemented at the MTA level, so I am not sure it could be done within SpamAssassin. Nonetheless, for all the initial negative reaction in this forum, "greylisting" is a fantastic new proposal that will be implemented at many sites. Attached is an e-mail that was recently sent to admins on our campus requesting discussion of the matter. I attach it here since it is very informative on the subject.
Sincerely, - Henrik
There was recently a paper released on a new SPAM-abatement methodology that uses the nature of the SMTP protocol itself to provide the relief. For all of the details, please see the original paper: <http://projects.puremagic.com/greylisting/>. In this note I will briefly outline the nature of the methodology, how it works, what it would mean for TAMU, and what my observations of it have been in my testing.

First, one of the main methods of spamming is that a SPAM site has a database and a small machine somewhere. This machine just reads addresses from the database, connects to the destination host for each address, spews the message while ignoring the SMTP result codes, and disconnects. The main point of that scenario is that the spammer's machine never performs the queuing and retrying that a "real" mail server must do; that is one of the ways they keep their costs down and put the burden of the e-mail solely on the recipient machine (i.e., TAMU hosts and servers).

For our part, that means they spew thousands of messages at our site (particularly at smtp-relay.tamu.edu), and those messages appear valid from our viewpoint: the sender addresses "look" good in that we can resolve the hostnames to the right of the "@", and the recipients are in the tamu.edu domain (or one of several others handled on-campus), so we should accept responsibility for delivering them.

Unfortunately, the databases used by the spammers have been accreting for over a decade, and there are many completely bogus addresses in them. Many of those addresses are destined for hosts that were valid at one time but are no longer connected to the network, even though they are still in DNS (since neither the previous host owner [who should have], nor CIS networking [who would have to do a lot of owner contacting, etc., to do so], has cleaned up those ancient hostnames that were once valid for receiving mail).

This deluge of bogus messages consumes our disk space while it sits in the queue; when the timeout for a message arrives, we bounce it back to the (usually bogus) sender. Since the sender is bogus, the messages "double-bounce", or time out if the sender was faked as yet another no-longer-deliverable hostname. Several problems are caused by this:

1: Queuing locally: Not a big problem in that we have designed the systems to queue massive amounts of e-mail in the event of some catastrophe where large numbers of campus hosts are unavailable; but it still needlessly uses State-of-Texas resources for unsolicited commercial traffic that will never be delivered anyway.

2: Queue congestion: Because of this long backlog of undeliverable e-mail, the system spends time retrying the unavailable hosts, which makes it take longer for real recipients to get their mail when it was queued because their server went down and came back up.

3: Double-bounce deluge: Proper management of mail systems requires reading and monitoring the automated messages the systems produce. With the amount of SPAM that attempts to pass through the systems each day, my team of admins must handle 6,000 to 10,000 double-bounces per day. There are scripts and filters to handle the majority of the alerts, but the more noise there is, the easier it is to overlook a real problem.

This brings us to the new methodology presented in the paper above.
The main tenet of the proposal is to temporarily move the burden and expense of queuing to the sending system, to make sure it is a real mail system; after that, everything happens as it does now. [As you read this, note that local hosts will be pre-white-listed so they never see a delay.]

The way the remote site is forced to queue the message temporarily is this: when a new message is offered by a remote host, our host records the sending IP and the sender and recipient e-mail addresses in a database. Our side answers the attempt with a 400-level "TMPFAIL" reply, which by the SMTP protocol means "I should be able to accept this message, but I can't do it right now; please try again later." The remote site queues the message and re-sends it later. At least an hour later (by default), when the same host tries again with the same addresses, the message is accepted just as it would be today, and that triplet of information is marked in the database so that future e-mail passes through with no delay. Only the first time an IP/sender/recipient triplet is seen is there a short delay; after that there is none. (A rough sketch of this triplet check follows the list of likely spammer responses below.)

SPAM sites, on the other hand, do not queue, so when we say TMPFAIL they don't care; they are just spewing the message and ignoring our answers. Since they don't queue, they never re-send the message, and the intended recipient never gets the SPAM. Obviously, there are a few ways the senders of SPAM can respond:

o Spammers can set up full mail servers that perform proper queuing. This requires much more hardware and bandwidth on their side, moving the cost of sending SPAM closer to the source, and it also stops the use of zombies and other distributed sending programs, so the receiving side sees a consistent sending host should a manual block be warranted.

o Spammers can modify their SPAM software to use their database to re-try messages some time after the hour and before a configurable time-out (the default is 4 hours). This is almost the same as queuing for them: they must use a consistent IP rather than relying on a network of zombies, or their SPAM will not be allowed to pass; they must use consistent sender addresses rather than making up obvious fakes like "[EMAIL PROTECTED]"; and they increase their bandwidth charges, since every message sent with a different sender incurs the wait/retry requirement.

o SPAM sites can continue to make use of "open relays" and send their mail to unsuspecting third parties who relay it for them. Since open relays are full mail servers that do proper queuing, that mail would eventually make it through. We would either have to live with those messages and expect the abuse to eventually get the relay properly closed (open relays are still a problem 5 years after it became obvious to the world that they are), or we could alter the mail-server configs on our end so that, rather than merely tagging a match with RCVD_IN_OSIRUSOFT_COM in the X-PerlMx-Spam: header, we actually use the relays.osirusoft.com RBL to block mail, as I was going to propose even before this new methodology appeared.
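To make the mechanism concrete, here is a minimal, illustrative sketch of the triplet check. This is only a sketch of the idea, not the implementation from the paper: the in-memory dictionary stands in for whatever database a real MTA hook would use, and the delay and expiry constants follow the defaults mentioned above.

    import time

    GREYLIST_DELAY = 60 * 60        # minimum wait before a retry is honored (the 1-hour default above)
    GREYLIST_EXPIRE = 4 * 60 * 60   # first-attempt retry window (the 4-hour default above)

    # (ip, sender, recipient) -> (first_seen_timestamp, passed_flag)
    # A real MTA hook would keep this in a database rather than in memory.
    _seen = {}

    def check_triplet(ip, sender, recipient, now=None):
        """Return an SMTP-style (code, text) reply for one delivery attempt."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), recipient.lower())
        record = _seen.get(key)

        if record is None:
            # New triplet: record it and ask the sender to queue and retry later.
            _seen[key] = (now, False)
            return 450, "Greylisted, please try again later"

        first_seen, passed = record
        if passed:
            # This triplet already proved itself once: no delay, now or ever.
            return 250, "OK"

        if now - first_seen < GREYLIST_DELAY:
            # Retrying too soon: keep deferring until the minimum delay has passed.
            return 450, "Greylisted, please try again later"

        if now - first_seen > GREYLIST_EXPIRE:
            # The retry window lapsed; treat this as a brand-new first attempt.
            _seen[key] = (now, False)
            return 450, "Greylisted, please try again later"

        # The sender queued and retried like a real mail server: remember that and accept.
        _seen[key] = (first_seen, True)
        return 250, "OK"

A fire-and-forget spammer never gets past the first 450, while a compliant mail server simply queues the message and delivers it on its next attempt.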
Regarding "real" mail, there is one obvious and one discovered issue:

1: The wait or cooling-off time: The first time someone off-campus sends mail to someone on-campus, there will be a 1-hour wait for delivery. This is only an initial event; after that first message there are no further delays, even for messages sent only once a month. Also, campus hosts can be pre-exempted from this delay, since we know their IPs and, if there is abuse, those IPs are easily tracked down, unlike off-campus sources.

2: Certain mailing lists: We will need to white-list certain mailing lists such as Yahoo! Groups and SecurityFocus (AKA BUGTRAQ), because they do something amazing I never expected to see: each and every delivery attempt to a single remote address uses a unique sender address. That is not each and every e-mail; that is each delivery attempt of the same e-mail. Nevertheless, the few sites that do such crazy things will be white-listed prior to implementation (a sketch of such an exemption appears at the end of this note).

I have been running this particular implementation on my personal mail server since this last weekend, and one of the local ISPs has been experimenting with it as well. So far our SPAM levels have dropped by more than an order of magnitude (I am down from over 90 SPAM messages each day in my personal mailbox to fewer than 5), while I am still receiving all of my personal and mailing-list traffic as before. This is by far the best solution I have implemented, and I think it is possible to configure the TAMU SMTP systems to gain all of the benefits. I am just beside myself with how well it is working in practice. This is in addition to the existing procedures already in place to block viruses and other protocol errors, and the tagging of anything that does make it through would continue unaltered.
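Since both the campus exemption and the mailing-list exemption amount to checks made before the triplet logic runs, a sketch of how they could sit in front of the check_triplet() function above might look like the following. The network range and sender domain are placeholders, not the actual campus values, and white-listing by sender domain is just one simple way to handle lists that change the sender address on every attempt.

    import ipaddress

    # Placeholders only: substitute the real campus ranges and list-sender domains.
    WHITELISTED_NETS = [ipaddress.ip_network("192.0.2.0/24")]      # example "campus" prefix
    WHITELISTED_SENDER_DOMAINS = {"returns.groups.yahoo.com"}      # example list-sender domain

    def smtp_reply(ip, sender, recipient):
        """Apply the white-list first, then fall back to the greylist triplet check."""
        if any(ipaddress.ip_address(ip) in net for net in WHITELISTED_NETS):
            return 250, "OK"        # local hosts are never delayed
        sender_domain = sender.rsplit("@", 1)[-1].lower()
        if sender_domain in WHITELISTED_SENDER_DOMAINS:
            return 250, "OK"        # lists that use a unique sender per delivery attempt
        # check_triplet() is the sketch from earlier in this note.
        return check_triplet(ip, sender, recipient)

    # A first attempt from an unknown off-campus host is deferred (450);
    # the same triplet retried after the one-hour delay would be accepted (250).
    print(smtp_reply("203.0.113.9", "alice@example.com", "user@tamu.edu"))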