my emailBL is live!

Adam Katz Tue, 28 Apr 2009 17:41:35 -0700

This was actually rather simple to set up.  I'll publish the code
(AGPL) that runs it in a bit (I need to clean it up to withstand the
heavy-handed criticism on this list ...).  Note, I'm using ZoneEdit's
free NS mirroring, which has limited bandwidth.  I'm willing to pay
their minimum threshold if it gets that popular, but any more than
that and I'll be looking for other options.  (NOT PRODUCTION GRADE!)

A SpamAssassin plugin will be needed to get it working, too ... I
suspect there are gurus here who can do that part as easily as I did
the scraper and BIND code.  If nobody bites, I'll get to it in time.

For now, we have a functional proof-of-concept.  I'll post the code, a
more formal announcement, and more documentation to my blog and
website in a few days ("a few" might be a large number).  The emailBL
syncs with the upstream every 4h (I'd reduce the TTL and increase the
syncing frequency, but I'd risk running out of bandwidth).

(Note, the DNS will take another 1-4 hours to propagate.)

The structure of the upstream list:

    ADDRESS,TYPE[TYPE...],DATE

ADDRESS is an email address like <test@ emailbl.khopesh.com>
TYPE is one or more letters of A B C D as follows:
    A (reply-to)
    B (from, !reply-to)
    C (msg body has ADDRESS)
    D (msg body has ADDRESS obfuscated)
DATE is the last time it was seen, formatted YYYYMMDD, in UTC(?).
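
As a rough illustration of that format, here is a minimal Python sketch
that splits one upstream line into its three fields.  The names Entry
and parse_line are mine (hypothetical, not part of the upstream
project), and the sample line is made up; I'm also assuming one entry
per line with no quoting.

```python
from datetime import date, datetime
from typing import NamedTuple


class Entry(NamedTuple):
    address: str      # e.g. an email address seen in phishing mail
    types: str        # one or more of the letters A, B, C, D
    last_seen: date   # parsed from YYYYMMDD


def parse_line(line: str) -> Entry:
    """Parse one ADDRESS,TYPE[TYPE...],DATE line (hypothetical helper)."""
    # Split from the right so a stray comma in ADDRESS cannot shift fields.
    address, types, stamp = line.strip().rsplit(",", 2)
    if not types or set(types) - set("ABCD"):
        raise ValueError(f"unexpected TYPE field: {types!r}")
    return Entry(address, types, datetime.strptime(stamp, "%Y%m%d").date())


if __name__ == "__main__":
    # Illustrative input only, not a real list entry.
    print(parse_line("test@example.com,AC,20090328"))
```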

The structure of domains in my emailBL index:

    USER.DOMAIN.emailbl.khopesh.com  TXT  <DATE>
    USER.DOMAIN.emailbl.khopesh.com  A    127.0.0.<N_TYPE>

USER is the ADDRESS's username, altered as follows:
  s/^([^@+]{1,16})[^@]*@.*/$1/;  # truncate to 16 characters
  s/^[^a-z0-9]*|[^a-z0-9]*$//g;  # fix leading/trailing chars
  s/[^-a-z.0-9]/-/g;             # fix illegal chars
DOMAIN is the ADDRESS's domain
N_TYPE is a numerical version of TYPE above (A=1, B=2, C=3, D=4)
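
For anyone who would rather see that as a function, here is a rough
Python sketch of the same mapping.  The names query_name and a_records
are hypothetical, I'm assuming the address is lowercased first (the
rules above only allow a-z), and I'm reading the first rule as "stop at
a plus sign or after 16 characters, whichever comes first".

```python
import re

ZONE = "emailbl.khopesh.com"
TYPE_CODES = {"A": 1, "B": 2, "C": 3, "D": 4}


def query_name(address: str, zone: str = ZONE) -> str:
    """Build the USER.DOMAIN.<zone> lookup name for an email address."""
    # Assumption: lowercase before applying the character rules.
    local, _, domain = address.lower().partition("@")
    # Keep at most 16 leading characters, stopping early at "+".
    user = re.match(r"[^@+]{0,16}", local).group(0)
    # Strip leading/trailing characters that are not a-z or 0-9.
    user = re.sub(r"^[^a-z0-9]*|[^a-z0-9]*$", "", user)
    # Replace anything else that is not legal in a hostname label.
    user = re.sub(r"[^-a-z.0-9]", "-", user)
    return f"{user}.{domain}.{zone}"


def a_records(types: str) -> list:
    """Map TYPE letters to the 127.0.0.N addresses they are published as."""
    return [f"127.0.0.{TYPE_CODES[t]}" for t in types]


if __name__ == "__main__":
    print(query_name("test@example.com"))
    print(a_records("AC"))
```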

Main test points (with no space after the at sign, obviously):

    test@ example.com
        -> test.example.com.emailbl.khopesh.com
    test@ emailbl.khopesh.com
        -> test.emailbl.khopesh.com.emailbl.khopesh.com

Alternate test point (mimicking DNSBLs):

    2.0.0.127.emailbl.khopesh.com

Let's pretend we're in a shell (I've spaced all emails):
################

# Look up TXT record (last-seen DATE) for <test@ example.com>
$ host -t txt test.example.com.emailbl.khopesh.com.
test.example.com.emailbl.khopesh.com descriptive text "20090328"
$

# Look up A record (inclusion TYPE[s]) for <test@ example.com>
$ host test.example.com.emailbl.khopesh.com.
test.example.com.emailbl.khopesh.com has address 127.0.0.3
test.example.com.emailbl.khopesh.com has address 127.0.0.4
test.example.com.emailbl.khopesh.com has address 127.0.0.1
test.example.com.emailbl.khopesh.com has address 127.0.0.2
$

################
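
The same A-record lookup can be sketched in Python with nothing but the
standard library.  decode_types and lookup_types are hypothetical names
of mine; this is only an illustration of reading the 127.0.0.N answers
back into TYPE letters, not the SpamAssassin plugin mentioned above.

```python
import socket

TYPE_LETTERS = {1: "A", 2: "B", 3: "C", 4: "D"}


def decode_types(addresses) -> str:
    """Turn a list of 127.0.0.N answers into sorted TYPE letters."""
    letters = set()
    for ip in addresses:
        prefix, _, last = ip.rpartition(".")
        if prefix == "127.0.0" and last.isdigit() and int(last) in TYPE_LETTERS:
            letters.add(TYPE_LETTERS[int(last)])
    return "".join(sorted(letters))


def lookup_types(name: str) -> str:
    """Resolve an emailBL name; an empty string means no listing was found."""
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:
        # NXDOMAIN and resolver failures both end up here.
        return ""
    return decode_types(addresses)


if __name__ == "__main__":
    # Needs working DNS; the result depends on the live zone.
    print(lookup_types("test.example.com.emailbl.khopesh.com."))
```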

More comments in-line:

Jesse Thompson (developer of anti-phishing-email-reply) wrote me:
> Yes, I and others have thought of it.  But I don't need it since we
> only use the list to scan log files and populate mapping tables.  I
> don't have time or money to do any of this, and I'm kept pretty
> busy just updating the list ... on top of my bazillion other
> responsibilities.
> 
> You are welcome to use the list to create your own URIBL of course.

(Jesse is BCC'd.)  And so I did.  Thanks for keeping the list updated.
Hopefully this emailBL will open your list to new horizons.  Clearly,
credit for the real work goes to you and the other APER developers.

Rob McEwen wrote:
>>> Personally, I think the obfuscation is overkill. Instead, I'd
>>> prefer to change the "@" symbol to an underscore (and any other
>>> minor change that might be needed to work with dns queries) and
>>> be done with it. This would also make the implementation easier,
>>> and research by ISPs easier.

Mike Cardwell contended:
>> It would definitely require a hashing algorithm, like MD5. IIRC
>> there is a maximum length for a hostname, and that is 255
>> characters. What if the hostname in your email address is 255
>> characters long on its own...?

When MD5sums were first proposed (in place of my wild escaping), it
seemed like a great idea.  However, a voice in the back of my head,
now spoken (typed?) by Rob, has been growing louder.  My
implementation now merely truncates email usernames to 16 characters
(plus the noted defanging, which makes it complicated again ...) and
replaces the @ with a dot (not an underscore, that's not a legal
character).

In fact, collisions here could be regarded as good, as usernames that
long can include tracking strings (e.g. the mailer for our list,
users-return-12345-joe=bob.com@ spamassassin.apache.org, becomes
users-return-123.spamassassin.apache.org), which should help.

I did fully implement my proposed latter 16 characters (of MD5's 32)
plus dot plus the domain, complete with hash lookups, but I just
removed it (which is why non-test lookups will fail for the next ~4h).
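
For the record, the removed hashed form would have looked roughly like
the Python sketch below.  hashed_query_name is a hypothetical name, and
I'm assuming here that the MD5 is taken over the lowercased username
alone; treat both as illustration rather than what the zone ever
served.

```python
import hashlib


def hashed_query_name(address: str, zone: str = "emailbl.khopesh.com") -> str:
    """Latter 16 hex characters of an MD5, plus dot, plus the domain."""
    # Assumption: hash only the lowercased username, not the whole address.
    local, _, domain = address.lower().partition("@")
    digest = hashlib.md5(local.encode("utf-8")).hexdigest()  # 32 hex chars
    return f"{digest[-16:]}.{domain}.{zone}"


if __name__ == "__main__":
    print(hashed_query_name("test@example.com"))
```

The fixed-length label sidesteps the hostname length worry, at the cost
of names nobody can read without recomputing the hash.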

>> Having access to the plain text email address would only make it
>> easier for ISPs to do anything if they had access to the zone file.
>> In which case, you could just give them access to a separate list
>> which has the email addresses in plain text.

Unless we're replacing the currently well-groomed upstream source at
http://anti-phishing-email-reply.googlecode.com/#, I see no reason to
offer such services (since they do it better).

>> So in rbldnsd, ...

Whoa, what's that?!  Interesting ... it's even in Debian.  I think I'm
happy with BIND for the moment, since my origin point is hidden from
use and the actual NS records are merely slaves run by zoneedit (so
efficiency isn't really important).  I probably need to stay on BIND
as I doubt I could use rbldnsd to host my SpamAssassin channels.

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam
