New submission from Kevin Norris:

The documentation for __hash__ contains this text:

"The only required property is that objects which compare equal have the same 
hash value; it is advised to somehow mix together (e.g. using exclusive or) the 
hash values for the components of the object that also play a part in 
comparison of objects."

The recommendation of "using exclusive or" is likely to result in programmers 
naively doing this:

def __hash__(self):
    return hash(self.thing1) ^ hash(self.thing2) ^ hash(self.thing3)

In the event that (say) self.thing1 and self.thing2 have almost or exactly the 
same hash (with "almost" referring to bitwise differences rather than integral 
distance), this wipes out most or all of the entropy from both values and 
greatly increases the likelihood of hash collisions.  Indeed, Python's own 
tuple type does not do this (while it does use XOR, it also does some other 
math to ensure the bits are as mixed up as is practical).[1]

Because the correct algorithm is both nontrivial to implement and already 
exists in the tuple type's __hash__, I propose that the documentation be 
updated to recommend something like the following:

def __hash__(self):
    return hash((self.thing1, self.thing2, self.thing3))

One possible wording:

"The only required property is that objects which compare equal have the same 
hash value; it is advised to mix together the hash values of the components of 
the object that also play a part in comparison of objects by packing them into 
a tuple and hashing the tuple: [code example]"

[1]: https://hg.python.org/cpython/file/fca5c4a63251/Objects/tupleobject.c#l348

----------
assignee: docs@python
components: Documentation
messages: 278229
nosy: Kevin.Norris, docs@python
priority: normal
severity: normal
status: open
title: __hash__ documentation recommends naive XOR to combine but this is 
suboptimal
type: performance
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28383>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to