Hi, I have an application that would benefit from object instances distinguished by identity being used in dict's and set's. To do this, the __hash__ method must be overridden, the obvious return value being the instance's id.
This works flawlessly in extensive tests on several platforms, and on a couple of different Python versions and implementations. The documentation seems to preclude a second requirement, however. I also want to use the == operator on these objects to mean a natural comparison of values, different from identity, so that two instances comparing equivalent does not imply that they are identical. But the documentation for __hash__ has: "The only required property is that objects which compare equal have the same hash value" Yet it seems something more is going on, because as near as I can tell, it just works, perfectly, every time, despite this requirement. If the the __hash__ method returns the object's id, the __eq__ method of the *never* called when the object is added to a dict. (See examples below.) It would appear that if __hash__ returns the id, then that id is used internally as the key, and since the id is by definition unique, no key collision ever occurs -- at least in every Python implementation I've tried. It also seems that, for a class instance obj, hash( hash( obj ) ) == hash( obj ) hash( id( obj ) ) == id( obj ) These are very strong and useful properties. Where are they documented? It looks like the existing Python built-in containers do exactly what I need, but the documentation suggests that they may not. Taking the documentation at face value, I'm in the position of choosing between abandoning the natural use of operator '==', or of introducing an unfamiliar container implementation. As my package is intended for the use by other people, neither option is attractive. Questions: Is all this apparent behaviour documented somewhere that I have missed? Is there some fatal reason that this approach must never be used, besides the lack of documentary support for it? Is it in fact safe to assume that __eq__ is never called under these circumstances? =============================================================== from __future__ import print_function from sys import stdout class A( object ): # minimal hashable object def __init__( self, v1, v2 ): self.v = ( v1, v2 ) def __eq__( self, b ): #return self.v[0] == b.v[0] and self.v[1] == b.v[1] raise Exception( "CALLED __eq__" ) def __hash__( self ): # instances distinguished by identity return id( self ) # It was suggested that set and dict internally use some representation # of keys that contains less information than the id() value returned by # __hash__, thus causing key collisions, which are resolved by calls to # __eq__. # In that case, one would expect __eq__ to be called eventually if enough # objects were added to a set. I don't see that, though. NINSTANCES = 3000000 # play with this number -- carefully! STATUS_INTERVAL = 100000 def test(): """ hammer the set algorithms """ s = set() instances = [] for i in range( 0, NINSTANCES ): p = A( 1, 0 ) s.add( p ) instances.append( p ) if not i % STATUS_INTERVAL: stdout.write( str( i // STATUS_INTERVAL ) + " " ) stdout.flush() stdout.write( "\n" ) print( "length of set", len( s ) ) print( "number of instances", len( instances ) ) for i in instances: if not i in s: print( "INSTANCE DROPPED OUT!" ) test() -- https://mail.python.org/mailman/listinfo/python-list