Tim Peters <t...@python.org> added the comment:

[Tim]
> Perhaps worth noting that FNV-1a works great if used as
> _intended_:  on a stream of unsigned bytes.
> ...
>
>    Py_uhash_t t = (Py_uhash_t)y;
>    for (int i = 0; i < sizeof(t); ++i) {
>        x = (x ^ (t & 0xff)) * (Py_uhash_t)1099511628211ULL;
>        t >>= 8;
>    }

And that's all - no hacks for nested tuples, no hacks for mixed-sign ints, just 
100% pure FNV-1a.

So, just for fun, how about 100% pure Bernstein 33A or 33X?  Those replace the 
3rd line above with, respectively,

        x = (x * 33) + (t & 0xff); // 33A

or

        x = (x * 33) ^ (t & 0xff); // 33X


And those _also_ work great:  no collisions at all across my collection, except 
for the new tuple test, where they suffer 10 and 8 collisions respectively 
(note that the new test retains only the 32 least-significant hash bits).

Those are remarkable partly because multiplying by 33 is a weak permutation on 
its own:  it's just a left shift (by 5) and an add.  And while you can find odd 
multipliers with lower order (mod 2**N) than 33, you have to work to find one - 
it's exceptionally poor in this respect.  For example, pow(33, 8, 256) == 1, 
but half of the odd multipliers in range(256) have order 64 (mod 256).  Only 16 
of 'em have order <= 8.

Then again, we're applying that weak permutation 8 times per 64-bit input here, 
and folding in a fresh byte each time.  It seems that, overall, that's 
"stronger" than applying a stronger - but still easily spelled - permutation 
just one or two times to a 64-bit input.

The only thing I've tried using 64-bit ops that was comparably robust across 
these tests used the frozenset hash's permutation on the inputs:

    Py_uhash_t t = (Py_uhash_t)y;
    t = ((t ^ 89869747UL) ^ (t << 16)) * 3644798167UL;
    x = (x * mult) + t;

That was quite insensitive to the choice of `mult`.  For example, in this 
instead:

    Py_uhash_t t = (Py_uhash_t)y;
    t ^= t << 1;
    x = (x * mult) + t;

the original tuple test had 24 collision and the new test 6 using the current 
multiplier; changing to the FNV-1a 32-bit multiplier, those zoomed to 857 and 
44; then to the FNV-1a 64-bit multiplier, zoomed again to 477 and 2027; then to 
the "random" 1484268081, way down to 0 and 25; and finally to the "random" 
5517301966916670289, down to 0 and 4.  Which didn't inspire confidence ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to