[issue34751] Hash collisions for tuples

Tim Peters Mon, 24 Sep 2018 18:52:20 -0700


Tim Peters <t...@python.org> added the comment:


And one more:

        x = (x * mult) ^ t;

also appears to work equally well.  So, way back when, it appears we _could_ 
have wormed around the disaster du jour just by applying a shift-xor 
permutation to the raw hash results.

Note the implication:  if we believed our tuple hash worked OK before then (& 
we did), adding a shift-xor step would have changed nothing about it _except_ 
to permute the input space.  That alone was enough to repair the nested tuple 
problems.

Note that permutations are a bog standard way to improve algorithms with bad 
behaviors in some cases.  For example, at the end of the Mersenne Twister they 
do 4 permutations to improve otherwise-marginal results from "the real" 
pseudo-random generator:

    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680U;
    y ^= (y << 15) & 0xefc60000U;
    y ^= (y >> 18);
    return y;

At this point I see no "good reason" to prefer any of

    x = (x * mult) ^ t; // like Bernstein 33X & FNV-1
    x = (x * mult) + t; // like Bernstein 33A
    x = (x ^ t) * mult; // like FNV-1a

to any other.  In their _original_ contexts, they were applied to longish 
sequences of unsigned bytes.  But tuples used as keys and set elements are 
typically much shorter than strings, so results about how they worked in their 
original contexts are largely irrelevant for that reason too.

While lots of non-endcase testing is also needed, at this point I don't have 
any real fear about any of them.

As a sanity check,

        x = (x * mult) | t;

is disastrous for all the endcase tests.  So I believe I really am compiling 
and running the code ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue34751] Hash collisions for tuples

Reply via email to