Adam Katz wrote:

Mike Cardwell contended:
It would definitely require a hashing algorithm, like MD5. IIRC
there is a maximum length for a hostname, and that is 255
characters. What if the hostname in your email address is 255
characters long on its own...?

When MD5sums were first proposed (in place of my wild escaping), it
seemed like a great idea.  However, a voice in the back of my head,
now spoken (typed?) by Rob, has been growing louder.  My
implementation now merely truncates email usernames to 16 characters
(plus the noted defanging, which makes it complicated again ...) and
replaces the @ with a dot (not an underscore, that's not a legal
character).

Hmmm. I'm still not convinced you've done it the best way. That conversion sounds a lot more complicated than a straight MD5 conversion, and it doesn't deal with the fact that there is a maximum length for an FQDN.

In fact, collisions here could be regarded as good, as usernames that
long can include tracking strings (e.g. the mailer for our list,
users-return-12345-joe=bob.com@spamassassin.apache.org, becomes
users-return-123.spamassassin.apache.org), which should help.
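For illustration, the truncate-and-dot conversion described above might look like the following Python sketch. The function name is hypothetical, and the "defanging" step is left out because its rules are not spelled out in this thread.

```python
def truncated_query_name(address: str, max_local: int = 16) -> str:
    """Truncate the local part to max_local characters and join it
    to the domain with a dot in place of the @."""
    # Split on the last @ so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # NOTE: no defanging of illegal hostname characters is done here;
    # the real implementation reportedly does some, details unknown.
    return f"{local[:max_local]}.{domain}"


print(truncated_query_name(
    "users-return-12345-joe=bob.com@spamassassin.apache.org"))
```

With the list-mailer address quoted above, this yields users-return-123.spamassassin.apache.org, matching the example in the message.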

That could be seen as an advantage, I suppose. But the particular source list being used here wasn't meant to be used that way, and some people might consider such hits false positives.

I did fully implement my proposed latter 16 characters (of MD5's 32)
plus dot plus the domain, complete with hash lookups, but I just
removed it (which is why non-test lookups will fail for the next ~4h).
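A rough Python sketch of that "latter 16 characters of the MD5, plus dot, plus the domain" form follows. It assumes the digest is taken over the lowercased username only, which this message does not actually state; the function name is made up for the example.

```python
import hashlib


def half_md5_query_name(address: str) -> str:
    """Build <last 16 hex chars of MD5(local part)>.<domain>."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # ASSUMPTION: hash only the lowercased local part, not the full address.
    digest = hashlib.md5(local.lower().encode("utf-8")).hexdigest()
    # Keep the latter half of the 32-character hex digest.
    return f"{digest[-16:]}.{domain.lower()}"
```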

Having access to the plain text email address would only make it
easier for ISPs to do anything if they had access to the zone file.
In which case, you could just give them access to a separate list
which has the email addresses in plain text.

Unless we're replacing the currently well-groomed upstream source at
http://anti-phishing-email-reply.googlecode.com/#, I see no reason to
offer such services (since they do it better).

So in rbldnsd, ...

Whoa, what's that?!  Interesting ... it's even in Debian.  I think I'm
happy with BIND for the moment, since my origin point is hidden from
use and the actual NS records are merely slaves run by zoneedit (so
efficiency isn't really important).  I probably need to stay on BIND
as I doubt I could use rbldnsd to host my SpamAssassin channels.

I implemented pretty much exactly the same thing that you did, except it uses a straight hexadecimal MD5 digest of the full address. I know this isn't strictly correct as the local part of an email address is technically case sensitive, but as email addresses in the real world are case *insensitive* I convert it to lower case before hashing.
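As a minimal sketch of that scheme in Python (the function name is hypothetical; the zone is the one used in the examples in this message):

```python
import hashlib

# Zone name taken from the lookups shown in this message.
ZONE = "phishing.email.rbl.grepular.com"


def md5_query_name(address: str, zone: str = ZONE) -> str:
    """Lowercase the full address, MD5 it, and prepend the hex digest
    to the zone as a single DNS label."""
    digest = hashlib.md5(address.lower().encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"
```

Because the digest is always 32 hex characters, the query name has a fixed length regardless of how long the address is, which sidesteps the FQDN length limit.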

Eg:

r...@haven:/var/lib/rbldns# host -t a bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com
bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com A 127.0.0.3
bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com A 127.0.0.1
r...@haven:/var/lib/rbldns# host -t txt bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com
bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com TXT "20090411"
r...@haven:/var/lib/rbldns#

That RBL won't stay public for long, so don't use it for anything other than a quick test.
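The same A-record check could be done from Python roughly as below. This is only a sketch: the function name is made up, the resolver is passed in so it can be swapped for a stub, and any resolution failure is treated as "not listed", which a production check might want to handle more carefully.

```python
import socket
from typing import Callable, List


def listed_codes(query_name: str,
                 resolve: Callable = socket.gethostbyname_ex) -> List[str]:
    """Return the A record values for query_name, or [] if it does not resolve."""
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist).
        _name, _aliases, addresses = resolve(query_name)
    except (socket.gaierror, socket.herror):
        # NXDOMAIN and other lookup failures both end up here.
        return []
    return sorted(addresses)
```

For a listed hash like the one in the session above, this would be expected to return the 127.0.0.x values, but only while the zone is still being served.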

Here's the code I use to download the data and populate an rbldnsd file:

https://secure.grepular.com/phishing_addresses.txt

You might find something you can strip out and re-use.

Here are the Exim ACLs I use to query it for the envelope sender and the From and Reply-To headers:

acl_smtp_mail:
  deny dnslists = phishing.email.rbl.grepular.com/${md5:${lc:$sender_address}}

acl_smtp_data:
  deny dnslists = phishing.email.rbl.grepular.com/${md5:${lc:${address:$h_From:}}}

  deny dnslists = phishing.email.rbl.grepular.com/${md5:${lc:${address:$h_Reply-To:}}}
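For anyone not running Exim, the three lookups those ACLs perform could be approximated in Python along these lines. This is a sketch, not a drop-in equivalent: the function name is hypothetical, email.utils.parseaddr stands in for Exim's ${address:...} operator, and missing headers are skipped here rather than hashed.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr

# Zone name taken from the ACLs above.
ZONE = "phishing.email.rbl.grepular.com"


def query_names(envelope_sender: str, raw_message: str,
                zone: str = ZONE) -> dict:
    """Map each address source to the DNS name that would be queried."""
    msg = message_from_string(raw_message)
    candidates = {
        "envelope": envelope_sender,
        # parseaddr returns (display name, address); keep the address.
        "From": parseaddr(msg.get("From", ""))[1],
        "Reply-To": parseaddr(msg.get("Reply-To", ""))[1],
    }
    names = {}
    for source, address in candidates.items():
        if not address:
            continue  # header absent or unparsable: nothing to look up
        digest = hashlib.md5(address.lower().encode("utf-8")).hexdigest()
        names[source] = f"{digest}.{zone}"
    return names
```

Each resulting name can then be resolved; any A record in the answer means the address is on the list.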

I'm not familiar enough with writing SpamAssassin rules yet to write a SpamAssassin recipe.

--
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)
