Tim Peters <t...@python.org> added the comment:
> Suppose that there is a hash collision, say hash((3, 3)) == > hash((-3, -3)) and you change the hashing algorithm to fix > this collision. There are _two_ hash functions at play in that collision: the tuple hash function, and the integer hash function. Regardless of whether the _tuple_ hash function does t ^= t << shift; or t += t >> shift; or anything else involving just `t`, that only directly affects the result of the _int_ hash function. Whose low order bits cannot be improved in general - they're already optimally spread out in the most important cases. > If that change does NOT affect the lower N bits, > then you still have a collision hash((3, 3)) % 2**N == > hash((-3, -3)) % 2**N. This is relevant because the dict > implementation starts by taking the lower bits first. So > ideally we want to avoid hash collisions in the lower > bits too. Which the int hash on its own already does a superb job of doing in the most important cases. If you feel you just have to mess with low-order bits, do it instead on the _tuple_ hash intermediate results, not on its inputs. Like x += x >> 16; after t is folded in to x. It's x you care about directly, not t. Damaging desirable properties in t to _indirectly_ gain something in x is too clever by half. Mucking with t to avoid the nested-tuple catastrophes is worth it, but I prefer to do that with as little damage to t's desirable properties as reasonably possible. > This may also be a reason to avoid the standard FNV > multipliers: the 64-bit FNV multiplier was chosen with > the full 64-bit hash range in mind. It was never meant > to work well when truncated to some lower N bits. Do you believe any other multiplier would work better toward that end? For any odd multiplier M, the last k bits of i*M are determined solely by the last k bits of i, and [(last k bits of i*M) for i in range(anything, anything + 2**k)] is a permutation of range(2**k). The high order bits of i have nothing to do with it, and any odd multiplier permutes the lower k bits over any stretch of 2**k consecutive multiplicands. Extracting low-order bits is intended to be done by "xor folding" in FNV, and I _expect_ the same would be prudent for any other multiplier: https://tools.ietf.org/html/draft-eastlake-fnv-15#page-7 The dict and set lookup functions already do something stronger than xor folding to bring high bits into play, but do so incrementally as the probe sequence unfolds. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue34751> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com