On 15/12/10 00:43, RW wrote:
> On Tue, 14 Dec 2010 15:52:28 -0800 (PST)
> John Hardin <jhar...@impsec.org> wrote:
>
>> On Tue, 14 Dec 2010, Cedric Knight wrote:
>>
>>> So a hash is best,
>>
>> Agreed.
>>
>>> and I'd suggest SHA1 over MD5.
>>
>> Just out of curiosity, why? An MD5 hash is shorter than an SHA hash
>> (an important consideration when you're making lots of DNS queries of
>> the hash), MD5 is computationally lighter than SHA, and MD5 is robust
>> enough for this purpose, even though artificial collision scenarios
>> exist.

Maybe I was being over-cautious, based on articles (which I can't find online any more) suggesting that MD5 is likely to become trivial to crack in future owing to mathematical shortcuts. It's not as if you can recover the data from a hash, or even (as I read it) that you can yet create a collision for any given hash, but in any context there may be a problem with assuming something is secure when it's only semi-secure. I am not a mathematician or security expert, so I am swayed by pronouncements from US-CERT:

"Do not use the MD5 algorithm
Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use."
http://www.kb.cert.org/vuls/id/836068

OK, so this isn't a cryptographic application. I'm just thinking of future-proofing. Some background for non-experts like me:
http://www.maa.org/devlin/devlin_02_06.html

A SHA1 hex digest is 40 characters, as against MD5's 32, which isn't such a great difference, considering an IPv6 lookup is 64 under RFC 5782.

>> Granted I wouldn't sign a legal document with it any more, but for a
>> private perfect hash of an email address, why not?
>
> I don't see that there's all that much added security anyway.
>
> I don't think spammers are likely to intercept dns as a way of
> harvesting addresses.
>
> As far as general privacy is concerned, without a shared-secret anyone
> can generate the hash and look for known addresses. And if you don't add
> salt to the hash, it's going to be fairly easy to perform an efficient
> dictionary attack, in which case the choice of hash function makes
> little difference.

I wasn't thinking of harvesting by spammers, but by (say) a government authority that does not already have a dictionary of addresses that is known to be complete.
This is information in non-spam bodies that might be looked up (well, it would be if you want to use it to block 419 scams). Also, people might want to use the same hashing standard for a DNSWL of (maybe DKIM-verified) email addresses, meaning that list would be open to abuse by spammers who are able to create a hash collision.

CK
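
For anyone who wants to check the numbers, here is a rough Python sketch of the two points argued over above: the length of the hashed lookup name under MD5 versus SHA1, and RW's point that an unsalted hash can be matched against a list of known addresses. The zone name `emailbl.example`, the function names and the lowercase normalisation are all my own assumptions for illustration, not part of any agreed scheme.

```python
import hashlib


def hashed_query_name(address, zone="emailbl.example", algorithm="md5"):
    """Build a DNS query name from the hex digest of an email address.

    The zone and the strip/lowercase normalisation are hypothetical.
    """
    normalised = address.strip().lower()
    digest = hashlib.new(algorithm, normalised.encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"


def dictionary_attack(observed_digest, candidates, algorithm="md5"):
    """Return the candidate whose unsalted hash equals observed_digest.

    Returns None when no candidate matches.
    """
    for candidate in candidates:
        normalised = candidate.strip().lower()
        digest = hashlib.new(algorithm, normalised.encode("utf-8")).hexdigest()
        if digest == observed_digest:
            return candidate
    return None


if __name__ == "__main__":
    for algorithm in ("md5", "sha1"):
        name = hashed_query_name("someone@example.com", algorithm=algorithm)
        # Print the length of the hash label and the full query name.
        print(algorithm, len(name.split(".")[0]), name)
```

Either digest fits in a single DNS label (the limit is 63 characters). The second function only finds addresses the attacker already has in the candidate list, which is why it matters whether that list is known to be complete.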