On 12/27/2016 07:13 PM, [email protected] wrote:
According to the docs [6]:""" it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example: def __hash__(self): return hash((self.name, self.nick, self.color)) """ Applying this advice to the use cases above would require creating an arbitrarily large tuple in memory before passing it to hash(), which is then just thrown away. It would be preferable if there were a way to pass multiple values to hash() in a streaming fashion, such that the overall hash were computed incrementally, without building up a large object in memory first.
Part of the reason for creating __hash__ like above is that: - it's simple - it's reliable However, it's not the only way to have a hash algorithm that works; in fact, the beginning of the sentence you quoted says:
The only required property is that objects which compare equal have the same hash value;
In other words, objects that do not compare equal can also have the same hash value (although too much of that will reduce the efficiency of Python's containers).
(ii) priming the overall hash value with some class-specific initial value, so that if an instance of a different type of collection, which comprised the same items but which compared unequal, were to compute its hash value out of the same constituent items, we make sure our hash value differs.
This is unnecessary: hashes are compared first as a way to weed out impossible matches, but when the hashes are the same an actual __eq__ test is still done [7]. -- ~Ethan~ [6] https://docs.python.org/3/reference/datamodel.html#object.__hash__ [7] some test code to prove above points: --- 8< ------------------------------------------------------------ from unittest import main, TestCase class Eggs(object): def __init__(self, value): self.value = value def __hash__(self): return hash(self.value) def __eq__(self, other): if isinstance(other, self.__class__): return self.value == other.value return NotImplemented class Spam(object): def __init__(self, value): self.value = value def __hash__(self): return hash(self.value) def __eq__(self, other): if isinstance(other, self.__class__): return self.value == other.value return NotImplemented e10 = Eggs(1) e20 = Eggs(2) e11 = Eggs(1) e21 = Eggs(2) s10 = Spam(1) s20 = Spam(2) s11 = Spam(1) s21 = Spam(2) bag = {} bag[e10] = 1 bag[s10] = 2 bag[e20] = 3 bag[s20] = 4 class TestEqualityAndHashing(TestCase): def test_equal(self): # same class, same value --> equal self.assertEqual(e10, e11) self.assertEqual(e20, e21) self.assertEqual(s10, s11) self.assertEqual(s20, s21) def test_not_equal(self): # different class, same value --> not equal self.assertEqual(e10.value, s10.value) self.assertNotEqual(e10, s10) self.assertEqual(e20.value, s20.value) self.assertNotEqual(e20, s20) def test_same_hash(self): # same class, same value, same hash self.assertEqual(hash(e10), hash(e11)) self.assertEqual(hash(e20), hash(e21)) self.assertEqual(hash(s10), hash(s11)) self.assertEqual(hash(s20), hash(s21)) # different class, same value, same hash self.assertEqual(hash(e10), hash(s10)) self.assertEqual(hash(e11), hash(s11)) self.assertEqual(hash(e20), hash(s20)) self.assertEqual(hash(e21), hash(s21)) def test_as_key(self): # different objects from different classes with same hash should still be distinct self.assertEqual(len(bag), 4) self.assertEqual(bag[e10], 1) self.assertEqual(bag[s10], 2) self.assertEqual(bag[e20], 3) self.assertEqual(bag[s20], 4) # different objects from same classes with same hash should not be distinct self.assertEqual(bag[e11], 1) self.assertEqual(bag[s11], 2) self.assertEqual(bag[e21], 3) self.assertEqual(bag[s21], 4) main() --- 8< ------------------------------------------------------------ _______________________________________________ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
