On Fri, Aug 11, 2017 at 2:41 AM, Steve D'Aprano <steve+pyt...@pearwood.info> wrote: > On Fri, 11 Aug 2017 12:58 am, Chris Angelico wrote: > >> On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano >> <steve+pyt...@pearwood.info> wrote: >> >>> The C code says: >>> >>>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid >>>> excessive hash collisions for dicts and sets */ >>> >>> which I think agrees with my comment: using the id() itself would put too >>> many objects in the same bucket (i.e. too many collisions). >>> >>> >>>> If that were the problem it wouldn't be solved by the current approach: >>>> >>>>>>> sample = [object() for _ in range(10)] >>>>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])] >>>> [1, 1, 1, 1, 1, 1, 1, 1, 1] >> >> A difference of 1 in a hash is usually going to mean dropping >> something into the next bucket. A difference of 4, 8, or 16 would mean >> that a tiny dictionary (which has 8 slots and thus uses modulo-8) >> would have everything on the same slot. > > Um... yes? And how does that relate to the comment given in the source code? > > "bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid excessive hash > collisions for dicts and sets" > Are we in agreement so far?
Yes, we're in agreement. It may have been unclear from my quoting style, but the main point I was disagreeing with was this: >>>> If that were the problem it wouldn't be solved by the current approach: >>>> >>>>>>> sample = [object() for _ in range(10)] >>>>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])] >>>> [1, 1, 1, 1, 1, 1, 1, 1, 1] Incrementing hashes by 1 usually will put things into successive buckets. Incrementing by 8 or 16 will usually put things into the same bucket. Sorry for the confusion. ChrisA -- https://mail.python.org/mailman/listinfo/python-list