Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Sunday, 15 January 2012 at 05:49, Steven D'Aprano wrote:
> > I don't think anyone doubts that this will break lots of code (at least,
> > the arguments I've heard have been "their code is broken", not "nobody does
> > that").
>
> I don't know about "lots" of code, but it will break at least one library (or
> so I'm told):
>
> http://mail.python.org/pipermail/python-list/2012-January/1286535.html

Sadly, suds is also Python's _only_ usable SOAP library at this moment. :(
(on top of that, the development is in limbo ATM)

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
I don't think that it would be hard to patch this library to use another
hash function. It can implement its own hash function, use MD5, SHA1, or
anything else. hash() is not stable across Python versions and 32/64 bit
systems.

Victor

2012/1/15 Hynek Schlawack:
> On Sunday, 15 January 2012 at 05:49, Steven D'Aprano wrote:
>> > I don't think anyone doubts that this will break lots of code (at least,
>> > the arguments I've heard have been "their code is broken", not "nobody does
>> > that").
>>
>> I don't know about "lots" of code, but it will break at least one library (or
>> so I'm told):
>>
>> http://mail.python.org/pipermail/python-list/2012-January/1286535.html
>
> Sadly, suds is also Python's _only_ usable SOAP library at this moment. :(
> (on top of that, the development is in limbo ATM)
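Victor's first option is for the library to ship its own run-independent hash. The sketch below illustrates it with 64-bit FNV-1a over the UTF-8 bytes of a string. FNV-1a is an illustrative choice, not anything suds or CPython uses. Its output depends only on the input bytes, so it is the same in every run, on every platform and in every Python version. It is not collision-resistant, though, so it is unsuitable where an attacker controls the inputs.

```python
# A minimal sketch: 64-bit FNV-1a, a simple non-cryptographic hash whose
# output depends only on the input bytes, never on the interpreter run.
# Assumption: chosen purely for illustration; not part of suds or CPython.

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(text: str) -> int:
    """Return the 64-bit FNV-1a hash of the UTF-8 encoding of text."""
    h = FNV64_OFFSET_BASIS
    for byte in text.encode("utf-8"):
        h ^= byte                       # mix in one byte
        h = (h * FNV64_PRIME) & MASK64  # multiply, keep the low 64 bits
    return h


if __name__ == "__main__":
    # Prints the same value on every run, unlike a randomized hash().
    print(hex(fnv1a_64("a")))  # 0xaf63dc4c8601ec8c
```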
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Terry Reedy, 14.01.2012 06:43:
> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>
>> It is perfectly okay to break existing users who had anything depending
>> on ordering of internal hash tables. Their code was already broken.
>
> Given that the doc says "Return the hash value of the object", I do not
> think we should be so hard-nosed. The above clearly implies that there is
> such a thing as *the* Python hash value for an object. And indeed, that has
> been true across many versions. If we had written "Return a hash value for
> the object, which can vary from run to run", the case would be different.

Just a side note, but I don't think hash() is the right place to document
this. Hashing is a protocol in Python, just like indexing or iteration.
Nothing keeps an object from changing its hash value due to modification,
and that would even be valid in the face of the usual dict lookup
invariants if changes are only applied while the object is not referenced
by any dict. So the guarantees do not depend on the function hash() and may
be even weaker than your above statement.

Stefan
[Python-Dev] Dinsdale is no more
Gentlemen, www.python.org is down at the moment.

--
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.
Re: [Python-Dev] Dinsdale is no more
2012/1/15 Łukasz Langa
> Gentlemen, www.python.org is down at the moment.

Well, it's back now: http://www.downforeveryoneorjustme.com/python.org

Eli
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> Terry Reedy, 14.01.2012 06:43:
> > On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >
> >> It is perfectly okay to break existing users who had anything depending
> >> on ordering of internal hash tables. Their code was already broken.
> >
> > Given that the doc says "Return the hash value of the object", I do not
> > think we should be so hard-nosed. The above clearly implies that there is
> > such a thing as *the* Python hash value for an object. And indeed, that has
> > been true across many versions. If we had written "Return a hash value for
> > the object, which can vary from run to run", the case would be different.
>
> Just a side note, but I don't think hash() is the right place to document
> this.

You mean we shouldn't document that the hash() of a string will vary per
run?

> Hashing is a protocol in Python, just like indexing or iteration.
> Nothing keeps an object from changing its hash value due to modification,

Eh? There's a huge body of cultural awareness that only immutable objects
should define a hash, implying that the hash remains constant during the
object's lifetime.

> and that would even be valid in the face of the usual dict lookup
> invariants if changes are only applied while the object is not referenced
> by any dict.

And how would you know it isn't?

> So the guarantees do not depend on the function hash() and may
> be even weaker than your above statement.

There are no actual guarantees for hash(), but lots of rules for
well-behaved hashes.

--
--Guido van Rossum (python.org/~guido)
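Guido's question ("how would you know it isn't?") can be made concrete with a small sketch. The hypothetical class below derives its hash from a mutable attribute. If an instance is mutated while it is a dict key, the entry can no longer be found by the old value, the new value, or even the key object itself. The dict filed the entry under the old hash, and equality with the old value no longer holds.

```python
class MutableKey:
    """Hypothetical class whose hash depends on mutable state, which the
    usual rules for well-behaved hashes say it should not."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def demo():
    key = MutableKey(1)
    table = {key: "payload"}
    key.value = 2  # mutate while the dict still references the key

    found_by_new_value = MutableKey(2) in table  # wrong hash bucket
    found_by_old_value = MutableKey(1) in table  # right hash, equality fails
    found_by_identity = key in table             # key now hashes differently
    return found_by_new_value, found_by_old_value, found_by_identity


if __name__ == "__main__":
    print(demo())  # (False, False, False)
```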
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Guido van Rossum, 15.01.2012 17:10:
> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> Terry Reedy, 14.01.2012 06:43:
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>>>> It is perfectly okay to break existing users who had anything depending
>>>> on ordering of internal hash tables. Their code was already broken.
>>>
>>> Given that the doc says "Return the hash value of the object", I do not
>>> think we should be so hard-nosed. The above clearly implies that there is
>>> such a thing as *the* Python hash value for an object. And indeed, that has
>>> been true across many versions. If we had written "Return a hash value for
>>> the object, which can vary from run to run", the case would be different.
>>
>> Just a side note, but I don't think hash() is the right place to document
>> this.
>
> You mean we shouldn't document that the hash() of a string will vary per
> run?

No, I mean that the hash() builtin function is not the right place to
document the behaviour of a string hash. That should go into the string
object documentation.

Although, arguably, it may be worth mentioning in the docs of hash() that,
in general, hash values of builtin types are bound to the lifetime of the
interpreter instance (or entire runtime?) and may change after restarts. I
think that's a reasonable restriction to document that prominently, even if
it will only apply to str for the time being.

>> Hashing is a protocol in Python, just like indexing or iteration.
>> Nothing keeps an object from changing its hash value due to modification,
>
> Eh? There's a huge body of cultural awareness that only immutable objects
> should define a hash, implying that the hash remains constant during the
> object's lifetime.
>
>> and that would even be valid in the face of the usual dict lookup
>> invariants if changes are only applied while the object is not referenced
>> by any dict.
>
> And how would you know it isn't?
Well, if it's an object with a mutable hash then it's up to the application
defining that object to make sure it's used in a sensible way. Immutability
just makes your life easier. I can imagine that an object gets removed from
a dict (say, a cache), modified and then reinserted, and I think it's valid
to allow the modification to have an impact on the hash in this case, in
order to accommodate for any changes to equality comparisons due to the
modification.

That being said, it seems that the Python docs actually consider constant
hashes a requirement rather than a virtue.

http://docs.python.org/glossary.html#term-hashable

"""
An object is hashable if it has a hash value which never changes during its
lifetime (it needs a __hash__() method), and can be compared to other
objects (it needs an __eq__() or __cmp__() method). Hashable objects which
compare equal must have the same hash value.
"""

It also seems to me that the wording "has a hash value which never changes
during its lifetime" makes it pretty clear that the lifetime of the hash
value is not guaranteed to supersede the lifetime of the object (although
that's a rather muddy definition - memory lifetime? or pickle-unpickle as
well?).

However, this entry in the glossary only seems to have appeared with Py2.6,
likely as a result of the abc changes. So it won't help in defending a
change to the hash function.

>> So the guarantees do not depend on the function hash() and may
>> be even weaker than your above statement.
>
> There are no actual guarantees for hash(), but lots of rules for
> well-behaved hashes.

Absolutely.

Stefan
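As an aside, modern Python 3 offers one way to meet the glossary definition quoted above: a frozen dataclass. Dataclasses arrived in Python 3.7, long after this thread. Equality and hashing are both generated from the same fields, and assignment to a field raises an error, so the hash cannot drift while the instance is alive. `Point` and `try_mutate` are hypothetical names used only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical value type: eq=True (the default) plus frozen=True
    makes dataclasses generate __hash__ from the fields."""
    x: int
    y: int


def try_mutate(p: Point) -> bool:
    """Return True if assigning to a field was rejected."""
    try:
        p.x = 99  # frozen dataclasses forbid attribute assignment
    except FrozenInstanceError:
        return True
    return False


if __name__ == "__main__":
    a, b = Point(1, 2), Point(1, 2)
    print(a == b, hash(a) == hash(b), try_mutate(a))  # True True True
```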
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel wrote:
>
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).

To me, lifetime means the lifetime of that specific instance of the object.
I would not expect that to survive pickle-unpickle.

> However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.

Ugh, I really hope there is no code out there depending on the hash function
being the same across a pickle and unpickle boundary. Unfortunately the hash
function was last changed in 1996 in
http://hg.python.org/cpython/rev/839f72610ae1 so it is possible someone
somewhere has written code blindly assuming that non-guarantee is true.

-gps
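Pickling carries values rather than identities or hash values. The sketch below illustrates this by round-tripping a tuple and a dict through `pickle` in one process. The copies compare equal but are new objects. The dict is rebuilt from its key/value pairs on load, so its keys are re-hashed by whichever interpreter unpickles it. What would break is code that persists the *results* of hash() themselves. The sample values are hypothetical.

```python
import pickle


def round_trip(obj):
    """Serialize and deserialize obj with the default pickle protocol."""
    return pickle.loads(pickle.dumps(obj))


original_tuple = ("xml.dtd", 42)
copied_tuple = round_trip(original_tuple)

original_dict = {"xml.dtd": "cached schema"}
copied_dict = round_trip(original_dict)

if __name__ == "__main__":
    print(copied_tuple == original_tuple)  # True: same value
    print(copied_tuple is original_tuple)  # False: a new object
    print(copied_dict["xml.dtd"])          # lookup works on the rebuilt dict
```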
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Sun, 15 Jan 2012 17:46:36 +0100
Stefan Behnel wrote:
> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>> It is perfectly okay to break existing users who had anything depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there is
> >>> such a thing as *the* Python hash value for an object. And indeed, that has
> >>> been true across many versions. If we had written "Return a hash value for
> >>> the object, which can vary from run to run", the case would be different.
> >>
> >> Just a side note, but I don't think hash() is the right place to document
> >> this.
> >
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
>
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.

No, but we can document that *any* hash() value can vary between runs
without being specific about which builtin types randomize their hashes
right now.

Regards

Antoine.
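Antoine's point that hash() values can vary between runs can be checked directly. This sketch assumes CPython and Python 3.7+ (for `subprocess.run(..., capture_output=True, text=True)`). It uses the `PYTHONHASHSEED` environment variable to compute `hash()` of the same string in fresh interpreters started with different seeds. String hash randomization was opt-in in the 2.7.3 and 3.2.3 releases and has been on by default since Python 3.3. `PYTHONHASHSEED=0` disables it.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(value: str, seed: int) -> int:
    """Return hash(value) as computed by a new interpreter whose
    PYTHONHASHSEED is fixed to seed (0 disables randomization)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        # With -c, sys.argv[1] is the first argument after the command.
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    for seed in (1, 1, 2):
        print(seed, hash_in_fresh_interpreter("xml.dtd", seed))
```

The same seed reproduces the same value, while a different seed almost certainly yields a different one.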
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel wrote:
> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>> It is perfectly okay to break existing users who had anything depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there is
> >>> such a thing as *the* Python hash value for an object. And indeed, that has
> >>> been true across many versions. If we had written "Return a hash value for
> >>> the object, which can vary from run to run", the case would be different.
> >>
> >> Just a side note, but I don't think hash() is the right place to document
> >> this.
> >
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
>
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.
>
> Although, arguably, it may be worth mentioning in the docs of hash() that,
> in general, hash values of builtin types are bound to the lifetime of the
> interpreter instance (or entire runtime?) and may change after restarts. I
> think that's a reasonable restriction to document that prominently, even if
> it will only apply to str for the time being.

Actually it will apply to a lot more than str, because the hash of
(immutable) compound objects is often derived from the hash of the
constituents, e.g. hash of a tuple.

> >> Hashing is a protocol in Python, just like indexing or iteration.
> >> Nothing keeps an object from changing its hash value due to modification,
> >
> > Eh?
> > There's a huge body of cultural awareness that only immutable objects
> > should define a hash, implying that the hash remains constant during the
> > object's lifetime.
> >
> >> and that would even be valid in the face of the usual dict lookup
> >> invariants if changes are only applied while the object is not referenced
> >> by any dict.
> >
> > And how would you know it isn't?
>
> Well, if it's an object with a mutable hash then it's up to the application
> defining that object to make sure it's used in a sensible way. Immutability
> just makes your life easier. I can imagine that an object gets removed from
> a dict (say, a cache), modified and then reinserted, and I think it's valid
> to allow the modification to have an impact on the hash in this case, in
> order to accommodate for any changes to equality comparisons due to the
> modification.

That could be considered valid only in a very abstract, theoretical,
non-constructive way, since there is no protocol to detect removal from a
dict (and you cannot assume an object is used in only one dict at a time).

> That being said, it seems that the Python docs actually consider constant
> hashes a requirement rather than a virtue.
>
> http://docs.python.org/glossary.html#term-hashable
>
> """
> An object is hashable if it has a hash value which never changes during its
> lifetime (it needs a __hash__() method), and can be compared to other
> objects (it needs an __eq__() or __cmp__() method). Hashable objects which
> compare equal must have the same hash value.
> """
>
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).

Across pickle-unpickle it's not considered the same object. Pickling at
best preserves values.
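Guido's objection is easy to reproduce: no protocol tells an object that it has been removed from a dict, and the object may live in several dicts at once. In this sketch, a hypothetical `Tag` class hashes on a mutable `name`. The "remove, modify, reinsert" pattern from the quoted text works for the one dict it is applied to. A second dict that still holds the same object silently loses track of it.

```python
class Tag:
    """Hypothetical class whose hash follows a mutable attribute."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Tag) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


def demo():
    tag = Tag("draft")
    cache = {tag: "rendered page"}
    index = {tag: "search entry"}  # the same object is a key here too

    # Remove, modify, reinsert: correct as far as `cache` is concerned.
    value = cache.pop(tag)
    tag.name = "final"
    cache[tag] = value

    in_cache = tag in cache                    # found under the new hash
    in_index = tag in index                    # index stored the old hash
    old_name_in_index = Tag("draft") in index  # hash matches, equality fails
    return in_cache, in_index, old_name_in_index


if __name__ == "__main__":
    print(demo())  # (True, False, False)
```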
> However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.
>
> >> So the guarantees do not depend on the function hash() and may
> >> be even weaker than your above statement.
> >
> > There are no actual guarantees for hash(), but lots of rules for
> > well-behaved hashes.
>
> Absolutely.

--
--Guido van Rossum (python.org/~guido)
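Guido notes that randomization reaches well beyond str, because compound hashes are built from their parts. Tuples show this. The sketch assumes CPython 3.7+ and starts interpreters with two different `PYTHONHASHSEED` values. A tuple containing a string changes its hash with the seed. A tuple of small integers does not, since CPython does not randomize int hashes.

```python
import os
import subprocess
import sys

# Prints the hash of a tuple containing a str, then of a tuple of ints.
SNIPPET = "print(hash(('xml.dtd', 1)), hash((1, 2)))"


def tuple_hashes(seed: int) -> tuple:
    """Return (hash of the str-containing tuple, hash of the int-only tuple)
    as computed in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    str_tuple_hash, int_tuple_hash = (int(field) for field in out.split())
    return str_tuple_hash, int_tuple_hash


if __name__ == "__main__":
    print(tuple_hashes(1))
    print(tuple_hashes(2))
```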
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On 15.01.2012 15:27, Victor Stinner wrote:
> I don't think that it would be hard to patch this library to use another
> hash function. It can implement its own hash function, use MD5, SHA1, or
> anything else. hash() is not stable across Python versions and 32/64 bit
> systems.

As I wrote in a reply further down: no, it isn't hard to change this
behaviour (and I find the current caching system, which uses hash() on a
URL to choose the cache index, braindead to begin with). But, as with all
other considerations, the current version of the library, with the default
options, depends on hash() being stable for the cache to make any sense at
all. Especially with "generic" schemas such as the referenced xml.dtd,
caching makes a lot of sense, and not being able to cache _breaks_
applications, as it did mine. This is just something to bear in mind.

--
--- Heiko.
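Here is Heiko's scenario with the dependency on hash() removed. The cache file name is derived from a cryptographic digest of the URL, which is identical in every run and on every platform. `cache_path`, the `.xml` suffix and the directory layout are hypothetical and are not suds' actual cache API. SHA-1 serves here only as a stable key, not for security.

```python
import hashlib
import os


def cache_path(cache_dir: str, url: str) -> str:
    """Map a URL to a cache file name that is stable across runs,
    interpreter versions and 32/64-bit builds (unlike hash(url))."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".xml")


if __name__ == "__main__":
    # Hypothetical URL; any string maps to the same file name every run.
    print(cache_path("cache", "http://example.com/schemas/xml.dtd"))
```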
