On 08/04/10 19:58, Paolo Bonzini wrote: > MD5 is used simply as a kind of "random key generator", so it doesn't > matter.
That depends on what one is using "sort -R" for. If one uses it to choose data that are relevant for cryptographic purposes, it might matter. (Admittedly this is unlikely.) I'm not that familiar with cracking MD5, but I would guess that the cracking methods are rendered ineffective by the 128-bit salt that "sort -R" uses. If so, then there's no real problem. If the fact that MD5 is crackable is a problem, it'd be trivial to substitute (say) SHA256. However, this would slow down 'sort -R' considerably: switching to SHA256 would slow down 'sort -R' by a factor of 2.5 on the little million-line benchmark that I just tried it on ("seq 1000000", x86-64, Xeon E5620, gcc 4.5.1). > 1) why bother with memxfrm as a tie-breaker? isn't memcmp good enough? If two keys K1 and K2 compare equal, their random hashes are supposed to compare equal too. So if memcoll(K1,K2)==0, the random hashes must be the same. Hence we can't just do a memcmp on K1 and K2; we need to do a memcmp on strxfrm(K1) and strxfrm(K2). > 2) maybe there's something cheaper than md5 that can be used? For > example you could compare a^x and b^x where x is the output of a fast > 32-bit random number generator? That wouldn't be sufficiently random, even for non-cryptographic purposes, since keys that are natively nearby would tend to sort near to each other after being exclusive-ORed. But I see your point: perhaps there is something faster than MD5 for this sort of thing, and which is "secure" enough. Perhaps the ISAAC / ISAAC64 code that is already in GNU coreutils would work for that?