Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Hynek Schlawack
On Sunday, 15 January 2012 at 05:49, Steven D'Aprano wrote:
> > I don't think anyone doubts that this will break lots of code (at least,
> > the arguments I've heard have been "their code is broken", not "nobody does
> > that").
> 
> I don't know about "lots" of code, but it will break at least one library (or 
> so I'm told):
> 
> http://mail.python.org/pipermail/python-list/2012-January/1286535.html
Sadly, suds is also Python's _only_ usable SOAP library at this moment. :( (on 
top of that, the development is in limbo ATM)

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Victor Stinner
I don't think that it would be hard to patch this library to use
another hash function. It can implement its own hash function, use
MD5, SHA1, or anything else. hash() is not stable across Python
versions and 32/64-bit systems.

Victor
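
A minimal sketch of the kind of replacement Victor describes, assuming the
library only needs a key that is identical on every run, Python version, and
platform. The function name `stable_digest` and the example URL are
hypothetical and are not taken from suds.

```python
import hashlib


def stable_digest(text, algorithm="sha1"):
    """Return a hex digest that does not depend on the interpreter's hash()."""
    # hashlib digests are defined by the algorithm alone, so the result is
    # the same across runs, Python versions, and 32/64-bit builds.
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


# Hypothetical URL, used only for illustration.
key = stable_digest("http://example.com/schema.xsd")
print(key)
```

The digest is used here purely as a stable identifier, not for security, so
the choice between MD5 and SHA1 matters little for this purpose.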

2012/1/15 Hynek Schlawack :
> On Sunday, 15 January 2012 at 05:49, Steven D'Aprano wrote:
>> > I don't think anyone doubts that this will break lots of code (at least,
>> > the arguments I've heard have been "their code is broken", not "nobody does
>> > that").
>>
>> I don't know about "lots" of code, but it will break at least one library (or
>> so I'm told):
>>
>> http://mail.python.org/pipermail/python-list/2012-January/1286535.html
> Sadly, suds is also Python's _only_ usable SOAP library at this moment. :( 
> (on top of that, the development is in limbo ATM)
>


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Stefan Behnel
Terry Reedy, 14.01.2012 06:43:
> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> 
>> It is perfectly okay to break existing users who had anything depending
>> on ordering of internal hash tables. Their code was already broken.
> 
> Given that the doc says "Return the hash value of the object", I do not
> think we should be so hard-nosed. The above clearly implies that there is
> such a thing as *the* Python hash value for an object. And indeed, that has
> been true across many versions. If we had written "Return a hash value for
> the object, which can vary from run to run", the case would be different.

Just a side note, but I don't think hash() is the right place to document
this. Hashing is a protocol in Python, just like indexing or iteration.
Nothing keeps an object from changing its hash value due to modification,
and that would even be valid in the face of the usual dict lookup
invariants if changes are only applied while the object is not referenced
by any dict. So the guarantees do not depend on the function hash() and may
be even weaker than your above statement.

Stefan



[Python-Dev] Dinsdale is no more

2012-01-15 Thread Łukasz Langa
Gentlemen, www.python.org is down at the moment.

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Re: [Python-Dev] Dinsdale is no more

2012-01-15 Thread Eli Bendersky
2012/1/15 Łukasz Langa 

> Gentlemen, www.python.org is down at the moment.
>
>
Well, it's back now: http://www.downforeveryoneorjustme.com/python.org
Eli


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Guido van Rossum
On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel  wrote:

> Terry Reedy, 14.01.2012 06:43:
> > On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >
> >> It is perfectly okay to break existing users who had anything depending
> >> on ordering of internal hash tables. Their code was already broken.
> >
> > Given that the doc says "Return the hash value of the object", I do not
> > think we should be so hard-nosed. The above clearly implies that there is
> > such a thing as *the* Python hash value for an object. And indeed, that has
> > been true across many versions. If we had written "Return a hash value for
> > the object, which can vary from run to run", the case would be different.
>
> Just a side note, but I don't think hash() is the right place to document
> this.


You mean we shouldn't document that the hash() of a string will vary per
run?


> Hashing is a protocol in Python, just like indexing or iteration.
> Nothing keeps an object from changing its hash value due to modification,
>

Eh? There's a huge body of cultural awareness that only immutable objects
should define a hash, implying that the hash remains constant during the
object's lifetime.


> and that would even be valid in the face of the usual dict lookup
> invariants if changes are only applied while the object is not referenced
> by any dict.


And how would you know it isn't?


> So the guarantees do not depend on the function hash() and may
> be even weaker than your above statement.
>

There are no actual guarantees for hash(), but lots of rules for
well-behaved hashes.
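
One of those rules can be illustrated with a short sketch: a key whose hash
changes while it sits in a dict becomes unreachable. The `Account` class is
hypothetical, and the exact lookup results below assume CPython, where small
integers hash to themselves.

```python
class Account:
    """Hypothetical class whose hash follows a mutable attribute."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return isinstance(other, Account) and self.number == other.number

    def __hash__(self):
        return hash(self.number)


acct = Account(1)
registry = {acct: "active"}

# Mutate the key while the dict still references it.
acct.number = 2

# The dict stored the entry under the old hash, so a lookup with the
# new hash probes a different slot and finds nothing.
found_after_mutation = acct in registry

# A fresh object with the old value hashes to the right slot, but the
# equality check against the mutated key fails.
found_by_old_value = Account(1) in registry

# The entry is still physically present in the dict.
still_stored = acct in list(registry)

print(found_after_mutation, found_by_old_value, still_stored)
```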

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Stefan Behnel
Guido van Rossum, 15.01.2012 17:10:
> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> Terry Reedy, 14.01.2012 06:43:
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>>>
>>>> It is perfectly okay to break existing users who had anything depending
>>>> on ordering of internal hash tables. Their code was already broken.
>>>
>>> Given that the doc says "Return the hash value of the object", I do not
>>> think we should be so hard-nosed. The above clearly implies that there is
>>> such a thing as *the* Python hash value for an object. And indeed, that has
>>> been true across many versions. If we had written "Return a hash value for
>>> the object, which can vary from run to run", the case would be different.
>>
>> Just a side note, but I don't think hash() is the right place to document
>> this.
> 
> You mean we shouldn't document that the hash() of a string will vary per
> run?

No, I mean that the hash() builtin function is not the right place to
document the behaviour of a string hash. That should go into the string
object documentation.

Although, arguably, it may be worth mentioning in the docs of hash() that,
in general, hash values of builtin types are bound to the lifetime of the
interpreter instance (or entire runtime?) and may change after restarts. I
think that's a reasonable restriction to document prominently, even if
it will only apply to str for the time being.


>> Hashing is a protocol in Python, just like indexing or iteration.
>> Nothing keeps an object from changing its hash value due to modification,
> 
> Eh? There's a huge body of cultural awareness that only immutable objects
> should define a hash, implying that the hash remains constant during the
> object's lifetime.
> 
>> and that would even be valid in the face of the usual dict lookup
>> invariants if changes are only applied while the object is not referenced
>> by any dict.
> 
> And how would you know it isn't?

Well, if it's an object with a mutable hash then it's up to the application
defining that object to make sure it's used in a sensible way. Immutability
just makes your life easier. I can imagine that an object gets removed from
a dict (say, a cache), modified and then reinserted, and I think it's valid
to allow the modification to have an impact on the hash in this case, in
order to accommodate any changes to equality comparisons due to the
modification.
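
The remove, modify, reinsert pattern could look roughly like the sketch
below. `CachedDoc` and `update_in_cache` are hypothetical names, and the
sketch assumes that the one dict passed in is the only container holding the
object, which nothing in the language can verify.

```python
class CachedDoc:
    """Hypothetical cache key whose hash depends on a mutable revision."""

    def __init__(self, doc_id, revision):
        self.doc_id = doc_id
        self.revision = revision

    def __eq__(self, other):
        if not isinstance(other, CachedDoc):
            return NotImplemented
        return (self.doc_id, self.revision) == (other.doc_id, other.revision)

    def __hash__(self):
        return hash((self.doc_id, self.revision))


def update_in_cache(cache, doc, new_revision):
    """Change doc.revision only while the cache does not reference doc."""
    value = cache.pop(doc)        # remove under the old hash
    doc.revision = new_revision   # hash and equality change here
    cache[doc] = value            # reinsert under the new hash


doc = CachedDoc("readme", 1)
cache = {doc: "payload"}
update_in_cache(cache, doc, 2)
print(doc in cache, len(cache))
```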

That being said, it seems that the Python docs actually consider constant
hashes a requirement rather than a virtue.

http://docs.python.org/glossary.html#term-hashable

"""
An object is hashable if it has a hash value which never changes during its
lifetime (it needs a __hash__() method), and can be compared to other
objects (it needs an __eq__() or __cmp__() method). Hashable objects which
compare equal must have the same hash value.
"""

It also seems to me that the wording "has a hash value which never changes
during its lifetime" makes it pretty clear that the lifetime of the hash
value is not guaranteed to supersede the lifetime of the object (although
that's a rather muddy definition - memory lifetime? or pickle-unpickle as
well?).
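
As an illustration of the glossary definition quoted above, here is a minimal
sketch of a type that satisfies it: the hash and the equality test both
derive from one tuple that is never reassigned after construction. The
`Version` class is hypothetical.

```python
class Version:
    """Hypothetical value type with a hash that is constant for its lifetime."""

    def __init__(self, major, minor):
        # Set once and never modified afterwards (by convention).
        self._key = (major, minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Equal objects share the same _key, hence the same hash.
        return hash(self._key)


a = Version(3, 2)
b = Version(3, 2)
print(a == b, hash(a) == hash(b), len({a, b}))
```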

However, this entry in the glossary only seems to have appeared with Py2.6,
likely as a result of the abc changes. So it won't help in defending a
change to the hash function.


>> So the guarantees do not depend on the function hash() and may
>> be even weaker than your above statement.
> 
> There are no actual guarantees for hash(), but lots of rules for
> well-behaved hashes.

Absolutely.

Stefan



Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Gregory P. Smith
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel  wrote:
>
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).
>

Lifetime to me means of that specific instance of the object. I would not
expect that to survive pickle-unpickle.


> However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.
>

Ugh, I really hope there is no code out there depending on the hash
function being the same across a pickle and unpickle boundary.
Unfortunately the hash function was last changed in 1996 in
http://hg.python.org/cpython/rev/839f72610ae1 so it is possible someone
somewhere has written code blindly assuming that non-guarantee is true.

-gps


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Antoine Pitrou
On Sun, 15 Jan 2012 17:46:36 +0100
Stefan Behnel  wrote:
> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>
> >>>> It is perfectly okay to break existing users who had anything depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there is
> >>> such a thing as *the* Python hash value for an object. And indeed, that has
> >>> been true across many versions. If we had written "Return a hash value for
> >>> the object, which can vary from run to run", the case would be different.
> >>
> >> Just a side note, but I don't think hash() is the right place to document
> >> this.
> > 
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
> 
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.

No, but we can document that *any* hash() value can vary between runs
without being specific about which builtin types randomize their
hashes right now.

Regards

Antoine.




Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Guido van Rossum
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel  wrote:

> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>
> >>>> It is perfectly okay to break existing users who had anything depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there is
> >>> such a thing as *the* Python hash value for an object. And indeed, that has
> >>> been true across many versions. If we had written "Return a hash value for
> >>> the object, which can vary from run to run", the case would be different.
> >>
> >> Just a side note, but I don't think hash() is the right place to document
> >> this.
> >
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
>
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.
>
> Although, arguably, it may be worth mentioning in the docs of hash() that,
> in general, hash values of builtin types are bound to the lifetime of the
> interpreter instance (or entire runtime?) and may change after restarts. I
> think that's a reasonable restriction to document prominently, even if
> it will only apply to str for the time being.
>

Actually it will apply to a lot more than str, because the hash of
(immutable) compound objects is often derived from the hash of the
constituents, e.g. hash of a tuple.
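
A rough way to observe this on a current CPython (3.3 or later), assuming the
`PYTHONHASHSEED` environment variable that later shipped with hash
randomization: a tuple containing a string picks up the string's per-run
hash, while a tuple of small integers does not. The helper name
`hash_in_fresh_interpreter` is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expression, seed):
    """Evaluate hash(<expression>) in a new interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    output = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expression], env=env
    )
    return int(output)


# The tuple hash is built from the element hashes, so the randomized
# str hash propagates into it.
str_tuple_hashes = {hash_in_fresh_interpreter("('spam', 1)", s) for s in (1, 2)}

# Integer hashes are not randomized in CPython, so this tuple is stable.
int_tuple_hashes = {hash_in_fresh_interpreter("(1, 2)", s) for s in (1, 2)}

print(len(str_tuple_hashes), len(int_tuple_hashes))
```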


> >> Hashing is a protocol in Python, just like indexing or iteration.
> >> Nothing keeps an object from changing its hash value due to modification,
> >
> > Eh? There's a huge body of cultural awareness that only immutable objects
> > should define a hash, implying that the hash remains constant during the
> > object's lifetime.
> >
> >> and that would even be valid in the face of the usual dict lookup
> >> invariants if changes are only applied while the object is not referenced
> >> by any dict.
> >
> > And how would you know it isn't?
>
> Well, if it's an object with a mutable hash then it's up to the application
> defining that object to make sure it's used in a sensible way. Immutability
> just makes your life easier. I can imagine that an object gets removed from
> a dict (say, a cache), modified and then reinserted, and I think it's valid
> to allow the modification to have an impact on the hash in this case, in
> order to accommodate any changes to equality comparisons due to the
> modification.
>

That could be considered valid only in a very abstract, theoretical,
non-constructive way, since there is no protocol to detect removal from a
dict (and you cannot assume an object is used in only one dict at a time).


> That being said, it seems that the Python docs actually consider constant
> hashes a requirement rather than a virtue.
>
> http://docs.python.org/glossary.html#term-hashable
>
> """
> An object is hashable if it has a hash value which never changes during its
> lifetime (it needs a __hash__() method), and can be compared to other
> objects (it needs an __eq__() or __cmp__() method). Hashable objects which
> compare equal must have the same hash value.
> """
>
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).
>

Across pickle-unpickle it's not considered the same object. Pickling at
best preserves values.
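
A small sketch of that distinction, using only the standard `pickle` module.
Within a single process the round-tripped copy is a different object with an
equal value, and therefore an equal hash. Nothing here implies that the hash
would match in a different process.

```python
import pickle

original = ("spam", 1, 3.5)
clone = pickle.loads(pickle.dumps(original))

# A new object is created on unpickling...
same_object = clone is original

# ...but its value is preserved, and equal values must hash equally
# inside one interpreter run.
same_value = clone == original
same_hash_in_this_run = hash(clone) == hash(original)

print(same_object, same_value, same_hash_in_this_run)
```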

> However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.
>
>
> >> So the guarantees do not depend on the function hash() and may
> >> be even weaker than your above statement.
> >
> > There are no actual guarantees for hash(), but lots of rules for
> > well-behaved hashes.
>
> Absolutely.
>

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Heiko Wundram

On 15.01.2012 15:27, Victor Stinner wrote:

> I don't think that it would be hard to patch this library to use
> another hash function. It can implement its own hash function, use
> MD5, SHA1, or anything else. hash() is not stable across Python
> versions and 32/64-bit systems.


As I wrote in a reply further down: no, it isn't hard to change this
behaviour (and I find the current caching system, which uses hash() on
a URL to choose the cache index, braindead to begin with). But, as with
all other considerations: the current version of the library, with the
default options, depends on hash() being stable for the cache to make
any sense at all. Especially with "generic" schemas such as the
referenced xml.dtd, caching makes a lot of sense, and not being able to
cache _breaks_ applications, as it did mine. This is just something to
bear in mind.
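
To make the dependency concrete, here is a sketch of an on-disk cache whose
file names come from a digest of the URL, so that a later run finds what an
earlier run stored. This is not the suds implementation. `FileCache`, the
`.cache` suffix, and the example URLs are all hypothetical.

```python
import hashlib
import os
import tempfile


class FileCache:
    """Hypothetical on-disk cache keyed by URL."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, url):
        # A digest of the URL gives the same file name on every run,
        # unlike hash(url) once string hashes are randomized.
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def put(self, url, data):
        with open(self._path(url), "wb") as handle:
            handle.write(data)

    def get(self, url):
        try:
            with open(self._path(url), "rb") as handle:
                return handle.read()
        except FileNotFoundError:
            return None


with tempfile.TemporaryDirectory() as tmp:
    FileCache(tmp).put("http://example.com/schema.xsd", b"<schema/>")
    # A second instance stands in for a later run of the application.
    hit = FileCache(tmp).get("http://example.com/schema.xsd")
    miss = FileCache(tmp).get("http://example.com/other.xsd")

print(hit, miss)
```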


--
--- Heiko.