Re: propose renaming gnulib memxfrm to amemxfrm (naming collision with coreutils)

Paul Eggert Thu, 05 Aug 2010 16:29:53 -0700

On 08/04/10 19:58, Paolo Bonzini wrote:

> MD5 is used simply as a kind of "random key generator", so it doesn't
> matter.


That depends on what one is using "sort -R" for.  If one uses it to
choose data that are relevant for cryptographic purposes, it might
matter.  (Admittedly this is unlikely.)

I'm not that familiar with cracking MD5, but I would guess that the
cracking methods are rendered ineffective by the 128-bit salt that
"sort -R" uses.  If so, then there's no real problem.

If the fact that MD5 is crackable is a problem, it'd be trivial to
substitute (say) SHA256.  However, this would slow down 'sort -R'
considerably: switching to SHA256 would slow down 'sort -R' by a factor of
2.5 on the little million-line benchmark that I just tried it on ("seq
1000000", x86-64, Xeon E5620, gcc 4.5.1).


> 1) why bother with memxfrm as a tie-breaker? isn't memcmp good enough?

If two keys K1 and K2 compare equal, their random hashes are supposed
to compare equal too.  So if memcoll(K1,K2)==0, the random hashes must
be the same.  Hence we can't just do a memcmp on K1 and K2; we need to
do a memcmp on strxfrm(K1) and strxfrm(K2).


> 2) maybe there's something cheaper than md5 that can be used?  For
> example you could compare a^x and b^x where x is the output of a fast
> 32-bit random number generator?

That wouldn't be sufficiently random, even for non-cryptographic
purposes, since keys that are natively nearby would tend to sort near
to each other after being exclusive-ORed.

But I see your point: perhaps there is something faster than MD5 for
this sort of thing, and which is "secure" enough.  Perhaps the
ISAAC / ISAAC64 code that is already in GNU coreutils would work
for that?

Re: propose renaming gnulib memxfrm to amemxfrm (naming collision with coreutils)

Reply via email to