Paul McMillan <p...@mcmillan.ws> added the comment:

A couple of things here:

First, my proposed change is not cryptographically secure. There simply aren't 
any cryptographic hashing algorithms available that are in the performance 
class we need. My proposal does make the collision attack quite difficult to 
carry out, even if the raw output values from the hash are available to the 
attacker (they should not usually be).

I favor using the same algorithm between 32 and 64 bit builds for consistency 
of behavior. Developers would be startled to find that ordering stays 
consistent on a 64 bit build but varies on 32 bit builds. Additionally, the 
impracticality of attacking of 64 bit builds rests on the fact that these 
particular researchers didn't devise a way to do it. I'd hate to make this 
change and then have a clever mathematician publish some elegant point 
requiring us to go fix the problem all over again. 

I could be convinced either way on small strings. I like that they can't be 
used to attack the secret. At the same time, I worry that combining 2 different 
hashing routines into the same output space may introduce unexpected collisions 
and other difficult to debug edge-case conditions. It also means that the order 
of the hashes of long strings will vary while the order of short strings will 
not - another inconsistency which will encourage bugs.

Thank you Victor for the improvements to the python demonstration code. As 
Antoine said, it's only demo code, but better demo code is always good.

Antoine: That section of the manpage is referring to the overall security of a 
key generated using urandom. 256 bits is overkill for this application. We 
could take 256 bits and use them to generate a key using a cryptographically 
appropriate algorithm, but it's simpler to read more bits and use them directly 
as the key.

Additionally, that verbiage has been in the man page for urandom for quite some 
time (probably since the earliest version in the mid 90's). The PRNG has been 
improved since then.

Minimum length of r is a hard question. The shorter it is, the higher the 
correlation of the output. In my tests, 16kb was the amount necessary to 
generally do reasonably well on my test suite for randomness even with 
problematic input. Obviously our existing output isn't random, so it doesn't 
pass those tests at all. Using a fairly small value (4k) should not make the 
results much worse from a security perspective, but might be problematic from a 
collision/distribution standpoint. It's clear that we don't need 
cryptographically good randomness here, but passing the test suite is not a bad 
thing when considering the distribution.

When we settle on a C implementation, I'd like to run it through the smhasher 
set of tests to make sure we aren't making distribution worse, especially for 
very small values of r.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to