On Mon, Jul 27, 2020 at 5:42 PM Ethan Furman <[email protected]> wrote:
> Chris Barker wrote:
> > Is this because it's possible, if very
> > unlikely, for ANY hash algorithm to create the same hash for two
> > different inputs? So equality always has to be checked anyway?
>
> snip
For example, if a hash algorithm decided to use short names, then a
> group of people might be sorted like this:
>
> Bob: Bob, Robert
> Chris: Christopher, Christine, Christian, Christina
> Ed: Edmund, Edward, Edwin, Edwina
>
> So if somebody draws a name from a hat:
>
> Christina
>
> You apply the hash to it:
>
> Chris
>
> Ignore the Bob and Ed buckets, then use equality checks on the Chris
> names to find the right one.
>
sure, but know (or assume anyway) that python dicts and sets don't use such
a simple, naive hash algorithm, so in fact, non-equal strings are very
unlikely to have the same hash:
In [42]: hash("Christina")
Out[42]: -8424898463413304204
In [43]: hash("Christopher")
Out[43]: 4404166401429815751
In [44]: hash("Christian")
Out[44]: 1032502133450913307
But a dict always has a LOT fewer buckets than possible hash values, so
clashes within a bucket are not so rare, so equality needs to be checked
always -- which is what I was missing.
And while it wouldn't break anything, having a bunch of non-equal objects
produce the same hash wouldn't break anything, it would break the O(1)
performance of dicts.
Have I got that right?
-CHB
> >> From a practical standpoint, think of dictionaries:
> >
> > (that's the trick here -- you can't "get" this without knowing something
> > about the implementation details of dicts.)
>
> Depends on the person -- I always do better with a concrete application.
>
> >> adding
> >> ------
> >> - objects are sorted into buckets based on their hash
> >> - any one bucket can have several items with equal hashes
> >
> > is this mostly because there are many more possible hashes than buckets?
>
> Yes.
>
> >> - those several items (obviously) will not compare equal
> >
> > So the hash is a fast way to put stuff in buckets, so you only need to
> > compare with the others that end up in the same bucket?
>
> Yes.
>
> >> retrieving
> >> ----------
> >> - get the hash of the object
> >> - find the bucket that would hold that hash
> >> - find the already stored objects with the same hash
> >> - use __eq__ on each one to find the match
> >
> > So here's my question: if there is only one object in that bucket, is
> > __eq__ checked anyway?
>
> Yes -- just because it has the same hash does not mean it's equal.
>
> > So what happens when there is no __eq__?The object can still be hashable
> > -- I guess that's because there IS an __eq__ -- it defaults to an id
> > check, yes?
>
> Yes.
>
> The default hash, I believe, also defaults to the object id -- so, by
> default, objects are hashable and compare equal only to themselves.
>
> --
> ~Ethan~
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/XPUXOSK7WHXV7LRB7H3I4S42JQ2WXQU3/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
Christopher Barker, PhD
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/YJNU4QPGWJUFYIL33PSI2NPYAC2BRLI5/
Code of Conduct: http://python.org/psf/codeofconduct/