[Python-Dev] Hash randomization for which types?

2016-02-16 Thread Christoph Groth

Hello,

Recent Python versions randomize the hashes of str, bytes and 
datetime objects.  I suppose that the choice of these three types 
is the result of a compromise.  Has this been discussed somewhere 
publicly?


I'm not a web programmer, but don't web applications also use 
dictionaries that are indexed by, say, tuples of integers?


Just curious...

Thanks,
Christoph


___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash randomization for which types?

2016-02-16 Thread Glenn Linderman

On 2/16/2016 1:48 AM, Christoph Groth wrote:

> Hello,
>
> Recent Python versions randomize the hashes of str, bytes and datetime
> objects.  I suppose that the choice of these three types is the result
> of a compromise.  Has this been discussed somewhere publicly?


Search archives of this list... it was discussed at length.

> I'm not a web programmer, but don't web applications also use
> dictionaries that are indexed by, say, tuples of integers?


Sure, and that is the biggest part of the reason they were randomized.  
I think hashes of all types have been randomized, not _just_ the list 
you mentioned.


Re: [Python-Dev] Hash randomization for which types?

2016-02-16 Thread Steven D'Aprano
On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
> On 2/16/2016 1:48 AM, Christoph Groth wrote:
> >Hello,
> >
> >Recent Python versions randomize the hashes of str, bytes and datetime 
> >objects.  I suppose that the choice of these three types is the result 
> >of a compromise.  Has this been discussed somewhere publicly?
> 
> Search archives of this list... it was discussed at length.

There's a lot of discussion on the mailing list. I think that this is 
the very start of it, in Dec 2011:

https://mail.python.org/pipermail/python-dev/2011-December/115116.html

and continuing into 2012, for example:

https://mail.python.org/pipermail/python-dev/2012-January/115577.html
https://mail.python.org/pipermail/python-dev/2012-January/115690.html

and a LOT more, spread over many different threads and subject lines.

You should also read the issue on the bug tracker:

http://bugs.python.org/issue13703


My recollection is that it was decided that only strings and bytes need 
to have their hashes randomized, because only strings and bytes can be 
used directly from user-input without first having a conversion step 
with likely input range validation. In addition, changing the hash for 
ints would break too much code for too little benefit: unlike strings, 
where hash collision attacks on web apps are proven and easy, hash 
collision attacks based on ints are more difficult and rare.

See also the comment here:

http://bugs.python.org/issue13703#msg151847



> >I'm not a web programmer, but don't web applications also use 
> >dictionaries that are indexed by, say, tuples of integers?
> 
> Sure, and that is the biggest part of the reason they were randomized.  

But they aren't, as far as I can see:

[steve@ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
1071302475
[steve@ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
1071302475

Web apps can use dicts indexed by anything that they like, but unless 
there is an actual attack, what does it matter? Guido makes a good point 
about security here:

https://mail.python.org/pipermail/python-dev/2013-October/129181.html



> I think hashes of all types have been randomized, not _just_ the list 
> you mentioned.

I'm pretty sure that's not actually the case. Using 3.6 from the repo 
(admittedly not fully up to date though), I can see hash randomization 
working for strings:

[steve@ando 3.6]$ ./python -c "print(hash('abc'))"
11601873
[steve@ando 3.6]$ ./python -c "print(hash('abc'))"
-2009889747

but not for ints:

[steve@ando 3.6]$ ./python -c "print(hash(42))"
42
[steve@ando 3.6]$ ./python -c "print(hash(42))"
42


which agrees with my recollection that only strings and bytes would be 
randomized.
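
The same comparison can be scripted rather than run by hand. What follows is only a minimal sketch, assuming Python 3.7 or later; it pins the seed of each child interpreter through the PYTHONHASHSEED environment variable, and the helper name hash_under_seed is made up for illustration:

```python
import os
import subprocess
import sys


def hash_under_seed(expr, seed):
    """Return hash(<expr>) as computed by a fresh interpreter with the given seed.

    `expr` is Python source for a literal, e.g. "'abc'" or "(23, 42)".
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    for expr in ("'abc'", "b'abc'", "42", "(23, 42, 99, 100)"):
        a = hash_under_seed(expr, 1)
        b = hash_under_seed(expr, 2)
        verdict = "seed-dependent" if a != b else "same under both seeds"
        print(f"{expr:>20}: {verdict}")
```

With two different seeds, the str and bytes hashes should differ while the int and tuple-of-ints hashes stay the same, which is the behaviour shown in the shell sessions above.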



-- 
Steve


Re: [Python-Dev] Hash randomization for which types?

2016-02-16 Thread Stephen J. Turnbull
Glenn Linderman writes:

 > I think hashes of all types have been randomized, not _just_ the list 
 > you mentioned.

Yes.  There's only one hash function used, which operates on byte
streams IIRC.  That function now has a random offset.  The details of
hashing each type are in the serializations to byte streams.





Re: [Python-Dev] Hash randomization for which types?

2016-02-16 Thread Shell Xu
I think you are right. Here is the source code from Python 2.7.11:

long
PyObject_Hash(PyObject *v)
{
    PyTypeObject *tp = v->ob_type;
    if (tp->tp_hash != NULL)
        return (*tp->tp_hash)(v);
    /* To keep to the general practice that inheriting
     * solely from object in C code should work without
     * an explicit call to PyType_Ready, we implicitly call
     * PyType_Ready here and then check the tp_hash slot again
     */
    if (tp->tp_dict == NULL) {
        if (PyType_Ready(tp) < 0)
            return -1;
        if (tp->tp_hash != NULL)
            return (*tp->tp_hash)(v);
    }
    if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) {
        return _Py_HashPointer(v); /* Use address as hash value */
    }
    /* If there's a cmp but no hash defined, the object can't be hashed */
    return PyObject_HashNotImplemented(v);
}

If the object's type has a hash function (tp_hash), it is used. If not,
_Py_HashPointer is used, and that does not use _Py_HashSecret.
I also checked the references to _Py_HashSecret: only bufferobject,
unicodeobject and stringobject use it.
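
The three outcomes of that dispatch can also be observed from Python code. This is only a rough analogue written for Python 3 (where a class that defines __eq__ without __hash__ becomes unhashable), not a translation of the 2.7 C function; the class and function names are invented for the example:

```python
class Plain:
    """Defines neither __eq__ nor __hash__: falls back to the identity-based hash."""


class EqOnly:
    """Defines __eq__ but no __hash__: Python 3 makes such a class unhashable."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


class Custom:
    """Supplies its own hash, so that is what hash() calls."""

    def __hash__(self):
        return 7


def try_hash(obj):
    """Return hash(obj), or None if the object cannot be hashed."""
    try:
        return hash(obj)
    except TypeError:
        return None


if __name__ == "__main__":
    for obj in (Plain(), EqOnly(), Custom()):
        print(type(obj).__name__, try_hash(obj))
```

None of these paths involves a per-process secret: the fallback hash is derived from the object's address, and a type-specific hash is whatever the type implements.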

On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano wrote:

> [...]



-- 
彼節者有間,而刀刃者無厚;以無厚入有間,恢恢乎其於游刃必有餘地矣。
blog: http://shell909090.org/blog/
twitter: @shell909090 
about.me: http://about.me/shell909090


[Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)

2016-02-16 Thread Mike Kaplinskiy
Hey folks,

I hope this is the right list for this sort of thing (python-ideas seemed
more far-fetched).

For some context: there is currently an issue with pex that causes
sys.modules lookups to stop working for __main__. In turn, this makes
unittest.run() & pkg_resources.resource_* fail. The root cause is that pex
uses runpy.run_module with alter_sys=False. The fix should be to just pass
alter_sys=True, but that changes sys.argv[0] and various existing pex files
depend on that being the pex file. You can read more at
https://github.com/pantsbuild/pex/pull/211 .

Conservatively, I'd like to propose adding an argument to disable this
behavior. The current behavior breaks a somewhat reasonable invariant that
you can restart your program via
`os.execv(sys.executable, [sys.executable] + sys.argv)`.
Moreover it might be user-friendly to add an `argv=sys.argv[1:]` argument to
set & restore the full arguments to the module, where `argv=None` disables
argv[0] switching.
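
To make the current behavior concrete, here is a small sketch of what a module sees when it is run with alter_sys=True. It writes a throwaway module to a temporary directory; the module name argv0_probe_demo and the function observe_argv0 are hypothetical, chosen only for this illustration:

```python
import importlib
import os
import runpy
import sys
import tempfile

PROBE_NAME = "argv0_probe_demo"  # hypothetical throwaway module name


def observe_argv0():
    """Run a tiny module via runpy and report sys.argv[0] before, during and after."""
    with tempfile.TemporaryDirectory() as tmp:
        # The probe module just records the sys.argv[0] it sees while running.
        with open(os.path.join(tmp, PROBE_NAME + ".py"), "w") as f:
            f.write("import sys\nseen_argv0 = sys.argv[0]\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            before = sys.argv[0]
            module_globals = runpy.run_module(PROBE_NAME, alter_sys=True)
            after = sys.argv[0]
        finally:
            sys.path.remove(tmp)
    return before, module_globals["seen_argv0"], after


if __name__ == "__main__":
    before, during, after = observe_argv0()
    print("before:", before)
    print("during:", during)  # the probe module's file, not the original argv[0]
    print("after: ", after)   # restored once run_module returns
```

While the module runs, sys.argv[0] points at the module's own file; the original value only comes back after run_module returns, which is the substitution the proposed argument would let callers turn off.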

What do you think?

Mike.


Re: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)

2016-02-16 Thread Gregory P. Smith
On Tue, Feb 16, 2016 at 9:00 PM Mike Kaplinskiy 
wrote:

> Hey folks,
>
> I hope this is the right list for this sort of thing (python-ideas seemed
> more far-fetched).
>
> For some context: there is currently a issue with pex that causes
> sys.modules lookups to stop working for __main__. In turns this makes
> unittest.run() & pkg_resources.resource_* fail. The root cause is that pex
> uses runpy.run_module with alter_sys=False. The fix should be to just pass
> alter_sys=True, but that changes sys.argv[0] and various existing pex files
> depend on that being the pex file. You can read more at
> https://github.com/pantsbuild/pex/pull/211 .
>
> Conservatively, I'd like to propose adding an argument to disable this
> behavior. The current behavior breaks a somewhat reasonable invariant that
> you can restart your program via `os.execv([sys.executable] + sys.argv)`.
>

I don't know enough about pex to really dig into what it is trying to do so
this is tangential to answering your question but:

sys.executable may be None. For example, if you're an embedded Python
interpreter there is no Python executable. It cannot be blindly used to
re-execute the current process.

sys.argv represents the C main() argv array. Your inclination (in the
linked to bug above) to leave sys.argv[0] alone is a good one.
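
A guarded version of the restart idea might look like the sketch below. It only builds the command and leaves the actual exec to the caller; the function name restart_command is invented for illustration:

```python
import sys


def restart_command(argv=None):
    """Return the argument list for re-running this program, or None.

    None is returned when sys.executable is empty or None, e.g. in an
    embedded interpreter where there is no Python executable to re-exec.
    """
    if not sys.executable:
        return None
    args = sys.argv if argv is None else argv
    return [sys.executable] + list(args)


if __name__ == "__main__":
    cmd = restart_command()
    if cmd is None:
        print("no interpreter path available; cannot restart this way")
    else:
        # A caller could now do: os.execv(cmd[0], cmd)
        print("would re-exec:", cmd)
```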

-gps

> Moreover it might be user-friendly to add a `argv=sys.argv[1:]` argument to
> set & restore the full arguments to the module, where `argv=None` disables
> argv[0] switching.
>
> What do you think?
>
> Mike.


Re: [Python-Dev] Hash randomization for which types?

2016-02-16 Thread Maciej Fijalkowski
Note that hashing in Python 2.7 and in Python 3 versions prior to 3.4 is
simply broken, and the randomization does not do nearly enough; see
https://bugs.python.org/issue14621

On Wed, Feb 17, 2016 at 4:45 AM, Shell Xu wrote:

> [...]