On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano <steve+pyt...@pearwood.info> wrote:
> The C code says: > >> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid >> excessive hash collisions for dicts and sets */ > > which I think agrees with my comment: using the id() itself would put too many > objects in the same bucket (i.e. too many collisions). > > >> If that were the problem it wouldn't be solved by the current approach: >> >>>>> sample = [object() for _ in range(10)] >>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])] >> [1, 1, 1, 1, 1, 1, 1, 1, 1] A difference of 1 in a hash is usually going to mean dropping something into the next bucket. A difference of 4, 8, or 16 would mean that a tiny dictionary (which has 8 slots and thus uses modulo-8) would have everything on the same slot. > > So my money is on object() being anomalous: because it is so small, the hashes > end up so similar. For "typical" classes, the hash function does a much better > job of mixing the hash values up. > And this is also possible, but most likely the difference would simply widen; small dictionaries would still have all objects landing in the zeroth bucket. Hence rotating away the low bits. ChrisA -- https://mail.python.org/mailman/listinfo/python-list