Tim Peters <t...@python.org> added the comment:
[Tim] > Perhaps worth noting that FNV-1a works great if used as > _intended_: on a stream of unsigned bytes. > ... > > Py_uhash_t t = (Py_uhash_t)y; > for (int i = 0; i < sizeof(t); ++i) { > x = (x ^ (t & 0xff)) * (Py_uhash_t)1099511628211ULL; > t >>= 8; > } And that's all - no hacks for nested tuples, no hacks for mixed-sign ints, just 100% pure FNV-1a. So, just for fun, how about 100% pure Bernstein 33A or 33X? Those replace the 3rd line above with, respectively, x = (x * 33) + (t & 0xff); // 33A or x = (x * 33) ^ (t & 0xff); // 33X And those _also_ work great: no collisions at all across my collection, except for the new tuple test, where they suffer 10 and 8 collisions respectively (note that the new test retains only the 32 least-significant hash bits). Those are remarkable partly because multiplying by 33 is a weak permutation on its own: it's just a left shift (by 5) and an add. And while you can find odd multipliers with lower order (mod 2**N) than 33, you have to work to find one - it's exceptionally poor in this respect. For example, pow(33, 8, 256) == 1, but half of the odd multipliers in range(256) have order 64 (mod 256). Only 16 of 'em have order <= 8. Then again, we're applying that weak permutation 8 times per 64-bit input here, and folding in a fresh byte each time. It seems that, overall, that's "stronger" than applying a stronger - but still easily spelled - permutation just one or two times to a 64-bit input. The only thing I've tried using 64-bit ops that was comparably robust across these tests used the frozenset hash's permutation on the inputs: Py_uhash_t t = (Py_uhash_t)y; t = ((t ^ 89869747UL) ^ (t << 16)) * 3644798167UL; x = (x * mult) + t; That was quite insensitive to the choice of `mult`. For example, in this instead: Py_uhash_t t = (Py_uhash_t)y; t ^= t << 1; x = (x * mult) + t; the original tuple test had 24 collision and the new test 6 using the current multiplier; changing to the FNV-1a 32-bit multiplier, those zoomed to 857 and 44; then to the FNV-1a 64-bit multiplier, zoomed again to 477 and 2027; then to the "random" 1484268081, way down to 0 and 25; and finally to the "random" 5517301966916670289, down to 0 and 4. Which didn't inspire confidence ;-) ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue34751> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com