Steven D'Aprano wrote: > On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote: > >> Good point! A very good __hash__() implementation is: >> >> def __hash__(self): >> return id(self) >> >> In fact, I didn't know Python (kinda) did this by default already. I >> can't find that information in the definition of object.__hash__(): > > > Hmmm... using id() as the hash would be a terrible hash function. Objects
It's actually id(self) >> 4 (almost, see C code below), to account for memory alignment. >>> obj = object() >>> hex(id(obj)) '0x7f1f058070b0' >>> hex(hash(obj)) '0x7f1f058070b' >>> sample = (object() for _ in range(10)) >>> all(id(obj) >> 4 == hash(obj) for obj in sample) True > would fall into similar buckets if they were created at similar times, > regardless of their value, rather than being well distributed. If that were the problem it wouldn't be solved by the current approach: >>> sample = [object() for _ in range(10)] >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])] [1, 1, 1, 1, 1, 1, 1, 1, 1] Py_hash_t _Py_HashPointer(void *p) { Py_hash_t x; size_t y = (size_t)p; /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid excessive hash collisions for dicts and sets */ y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4)); x = (Py_hash_t)y; if (x == -1) x = -2; return x; } -- https://mail.python.org/mailman/listinfo/python-list