On Fri, Aug 11, 2017 at 7:17 AM, Marko Rauhamaa <ma...@pacujo.net> wrote: > That's interesting, but suggests there's something weird (~ suboptimal) > going on with CPython's scrambling algorithm. Also, your performance > test might yield completely different results on other Python > implementations. > > Apart from uniqueness, there's no particular reason to prefer one > __hash__() value over another as long as the interesting bits are inside > the CPU's simple integer range.
Not true. Every time you probe a new location [1], you have to fetch more data from RAM. That delays you significantly. An ideal hashing system is going to give a high probability of giving you an empty bucket on the first try, to minimize the number of main memory accesses required. CPython's scrambling algorithm means that, even when its first try doesn't succeed, there's a good chance that its second will succeed. But that doesn't change the fact that you want the first one to succeed as often as possible. ChrisA [1] Unless it's within the same cache line, which is eight pointers wide on my CPU. Highly unlikely when working with 100,000 pointers. -- https://mail.python.org/mailman/listinfo/python-list