[issue34751] Hash collisions for tuples

Tim Peters Sat, 06 Oct 2018 15:24:44 -0700


Tim Peters <t...@python.org> added the comment:


Thinking about the way xxHash works prompted me to try this initialization 
change:

    x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new

That was a pure win in my tests, slashing the number of collisions in the new 
tuple test, 32-bit build, from 51 to 17.  In the 64-bit build, cut from 11 to 
10, but when looking at the high 32 hash bits instead from 27 to 15.  There 
were also small improvements in other 32-bit build collision stats.

Full-blown xxHash (& SeaHash, and murmurhash, and ...) also fold in the length, 
but not inside their core loops.  It's useful info!

But they fold it in _after_ their core loops, not before.  They're apparently 
worried about this:  if their internal state ever reaches 0, that's a fixed 
point so long as all remaining incoming data is zeroes (in Python, e.g., 
picture adding one or more trailing integer or float zeroes or ..., which hash 
to 0).  Folding in the length at the end snaps it slightly out of zeroland.

I don't really care about that.  Folding in the length at the start quickly 
leads to large differences across iterations for tuples of different lengths 
(thanks to repeated mulitiplication and "propagate right" steps), which is 
especially helpful in things like the nested tuple tests where there's almost 
as much variation in tuple lengths as there is in the few little integers 
bottom-level tuples contain.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue34751] Hash collisions for tuples

Reply via email to