Tim Peters added the comment:

Dmitry, I suggest you spend more of the time you give to thinking to writing 
code instead ;-)  But, really, that's the easiest & surest way to discover for 
yourself where your analysis is going off in the weeds.

For example, issue 28201 was both simpler and worse than your thoughts about 
it:  x and y collide initially if and only if x = y mod 2**k.  If they do 
collide, then _regardless_ of the value of N it's necessarily the case that N*x 
+ x + 1 = N*y + y + 1 mod 2**k too.  That is, the second probes _always_ 
collided (not just half the time) in the case of a first-probe collision, and 
this would be so regardless of whether we were multiplying by 5, 4, 1, 0, or 
31459265358.  That was an unfortunate "off by 1" kind of mistake in the way the 
code was written.  It had nothing to do with, e.g., zero divisors.

After the obvious fix was applied, there's very little that can be said in 
reality.  Because, after the fix, higher-order bits of the hash codes - which 
had nothing at all to do with the initial x = y mod 2**k collision - are added 
in to the second probe values.  There's no good reason I can see to calculate 
what happens if those never-before-considered bits _happen_ to be all zeroes.  
They might be - but so what?  They might be any pair of values in the cross 
product of range(2**5) with itself.  There's nothing especially interesting 
about the (0, 0) pair.

That's why you're getting pushback:  your analysis hasn't made sense to me, and 
the things you're calculating still don't appear to have much of anything to do 
with how collisions are actually resolved.  To the contrary, so long as you're 
ignoring the higher-order hash code bits (about which we can infer _nothing_ 
from that the first probe collided), you're ignoring the _heart_ of the 
collision resolution strategy.

Some other things writing code would make immediately obvious:

- Hashes of strings have nothing to do with pointers or memory addresses.  They 
have solely to do with the characters the string contains, and all values in 
range(256) show up as the last byte of string hashes.

- While hashes of pointers do, the internal `_Py_HashPointer()` rotates 
addresses right by 4 bits so that the predictable trailing zero bits have no 
effect on the first probe.  For example,

>>> a = object()
>>> id(a)    # the address
1583819629360
>>> hex(_)
'0x170c301a330'
>>> hash(a)  # the address is rotated so `0x3` is the last nibble
98988726835
>>> hex(_)
'0x170c301a33'

Because of all that, I'm going to close this.  But if you have an idea that 
actually speeds things, please post a patch and reopen it!  While it's true 
that I don't expect to see an actual improvement, clocks usually beat arguments 
;-)

----------
resolution:  -> rejected
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30671>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to