On Wed, 16 Mar 2022 at 14:00, Cameron Simpson <c...@cskk.id.au> wrote: > > On 16Mar2022 10:57, Chris Angelico <ros...@gmail.com> wrote: > >> Is it sensible to compute the hash only from the immutable parts? > >> Bearing in mind that usually you need an equality function as well and > >> it may have the same stability issues. > > > >My understanding - and I'm sure Marco will correct me if I'm wrong > >here - is that this behaves like a tuple: if it contains nothing but > >hashable objects, it is itself hashable, but if it contains anything > >unhashable, the entire tuple isn't hashable. > > A significant difference is that tuples have no keys, unlike a dict. > > A hash does not have to hash all the internal state, ony the relevant > state, and not even all of that. The objective of the hash is twofold to > my mind: > - that "equal" objects (in the `==` sense) have the same hash, so that > they hash to the same backet in dicts and can therefore be found > - that hash values are widely distributed, to maximise the evenness of > the object distribution in buckets > > For dicts to work, the former has to hold.
Python only demands the first one. And Python's own integers hash to themselves, which isn't what I'd call "widely distributed", but which works well in practice. > The latter has more flexibility. A dict has keys. If the dicts are quite > varied (sets of tags for example), it may be effective to just hash the > keys. But if the dict keys are similar (labelled CSV-rows-as-dicts, or > likewise with database rows) this will go badly because the hashes will > all (or maybe mostly) collide. The general recommendation is to use all the same data for hashing as you use for equality checking. What's the point of just hashing the keys, if the values also contribute to equality? I'd just keep it simple, and hash both. ChrisA -- https://mail.python.org/mailman/listinfo/python-list