Dmitry Rubanovich added the comment:

I am looking at the 3.6.1 code.  I am not trying to simulate the index lookup 
as the lookup function would do it.  

There is no point in that.  The point of the script is to detect collisions 
among perturbed secondary indexes rather than between primary hashes and 
secondary.  To that end, it applies the perturbed secondary hash function to 
**all** primary indexes and stores the results.  This allows one to judge the 
quality of the secondary hash function.  The "old" one was bad because it 
automatically created collisions by multiplying by zero-divisors of the 2^k 
ring.  The one used in 3.6.1 is better because it doesn't always do that, but 
it still multiplies by zero-divisors of the ring when h mod 2^k is equal to (h 
>> 5) mod 2^k.  This multiplication by zero-divisors of the 2^k ring is what 
produces collisions.  The script simply counts them.  
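A minimal sketch of that counting idea (my reconstruction, not the actual 
script: the function names and the use of the 2**k ring as the hash domain are 
assumptions) — apply the first perturbed probe, as dictobject.c computes it, 
to every primary hash in the ring and count coinciding secondary indexes:

```python
PERTURB_SHIFT = 5  # value used by CPython's dictobject.c

def secondary_index(h, k):
    """First probe after a collision in a table of size 2**k:
    j = (5*j0 + 1 + perturb) mod 2**k, with j0 = h mod 2**k
    and perturb = h >> PERTURB_SHIFT."""
    mask = (1 << k) - 1
    return (5 * (h & mask) + 1 + (h >> PERTURB_SHIFT)) & mask

def count_collisions(k):
    """Apply the perturbed secondary hash to *all* primary hashes in
    the 2**k ring; every hash whose secondary index was already taken
    by a different hash counts as one collision."""
    seen = set()
    for h in range(1 << k):
        seen.add(secondary_index(h, k))
    return (1 << k) - len(seen)
```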

The later iterations of the loop are not very relevant.  They will almost 
certainly produce further collisions, but that's unavoidable.  

By the way, just as a test, I just changed the value of PERTURB_SHIFT to 1 and, 
in my proposed solution, it produced lower collision counts, after 3 runs of 
the loop, in virtually all rings (i.e., mod all values of 2^k for k from 8 to 
20).  Intuitively it makes sense because there is less information loss.  But I 
don't have a formal proof as to why that should be.
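One hedged way to make that comparison concrete (assuming "3 runs of the loop" 
means three iterations of the probe recurrence from dictobject.c; the helper 
name and the ring sizes tried are my own choices, and this does not reproduce 
my proposed replacement hash, only the stock recurrence):

```python
def distinct_after_rounds(k, shift, rounds=3):
    """Run the CPython probe recurrence for a fixed number of rounds on
    every primary hash in the 2**k ring and count how many distinct
    indexes survive; more distinct indexes means less information loss."""
    mask = (1 << k) - 1
    seen = set()
    for h in range(1 << k):
        j = h & mask
        perturb = h
        for _ in range(rounds):
            perturb >>= shift
            j = (5 * j + 1 + perturb) & mask
        seen.add(j)
    return len(seen)

# e.g. compare PERTURB_SHIFT = 1 against the current 5 in a small ring:
# distinct_after_rounds(10, 1) vs. distinct_after_rounds(10, 5)
```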

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30671>
_______________________________________