Dmitry Rubanovich added the comment: I am looking at the 3.6.1 code. I am not trying to simulate the index lookup the way the lookup function itself performs it.
There is no point in that. The point of the script is to detect collisions among perturbed secondary indexes, rather than between primary hashes and secondary ones. To that end, it applies the perturbed secondary hash function to **all** primary indexes and stores the results. This makes it possible to judge the quality of the secondary hash function. The "old" one was bad because it automatically created collisions by multiplying by zero-divisors of the ring of integers mod 2^k. The one used in 3.6.1 is better because it doesn't always do that, but it still multiplies by zero-divisors of the ring when h mod 2^k is equal to (h >> 5) mod 2^k. This multiplication by zero-divisors is what produces collisions; the script simply counts them. The later iterations of the loop are not very relevant: they will almost certainly produce further collisions, but that's unavoidable.

By the way, just as a test, I changed the value of PERTURB_SHIFT to 1 and, in my proposed solution, it produced lower collision counts after 3 runs of the loop in virtually all rings (i.e., mod all values of 2^k for k from 8 to 20). Intuitively this makes sense because there is less information loss, but I don't have a formal proof of why that should be.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30671>
_______________________________________
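For readers following along, here is a minimal sketch of the collision-counting idea described above. It is not the original script; the function name and structure are illustrative. It applies CPython's open-addressing probe recurrence (i = 5*i + perturb + 1 mod 2^k, with perturb shifted right by PERTURB_SHIFT each round) to every primary index in the ring of integers mod 2^k and counts how many land on an already-occupied secondary index:

```python
# Illustrative sketch (not the script attached to the issue):
# count collisions among perturbed secondary indexes when the
# probe recurrence is applied to every primary index mod 2^k.

PERTURB_SHIFT = 5  # CPython's value; the comment above experiments with 1


def count_secondary_collisions(k, rounds=1, shift=PERTURB_SHIFT):
    """Apply `rounds` probe steps to each h in [0, 2^k) and count
    how often the resulting secondary index was already produced."""
    mask = (1 << k) - 1
    seen = set()
    collisions = 0
    for h in range(1 << k):      # all primary indexes in the ring
        i = h & mask
        perturb = h
        for _ in range(rounds):  # a few iterations of the probe loop
            perturb >>= shift
            i = (5 * i + perturb + 1) & mask
        if i in seen:
            collisions += 1
        seen.add(i)
    return collisions
```

Comparing, say, count_secondary_collisions(12, rounds=3, shift=5) against shift=1 is the kind of experiment the PERTURB_SHIFT remark above refers to, though the exact counts depend on the variant of the recurrence being tested.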