[issue14621] Hash function is not randomized properly

2016-04-22 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : -- Removed message: http://bugs.python.org/msg263978 ___ Python tracker ___ ___ Python-bugs-list mailing

[issue14621] Hash function is not randomized properly

2016-04-22 Thread SilentGhost
Changes by SilentGhost : -- nosy: +Arfrever, Bob.Ziuchkovski, Giovanni.Bajo, PaulMcMillan, ReneSac, Vlado.Boza, alex, arigo, benjamin.peterson, bkabrda, camara, christian.heimes, cvrebert, dmalcolm, dstufft, gregory.p.smith, haypo, iElectric, isoschiz, konk, lemburg, mark.dickinso

[issue14621] Hash function is not randomized properly

2016-04-22 Thread lissacoffey
lissacoffey added the comment: How about Python grows a additional btree implementation in its collections module? I know that it's not going to fix existing code. However in the long run it's the best safeguard against hash collision attacks. I'm thinking about a simple, self balancing btree

[issue14621] Hash function is not randomized properly

2013-12-09 Thread Benjamin Peterson
Benjamin Peterson added the comment: I think that's just WONTFIX at this point. -- resolution: -> fixed status: open -> closed ___ Python tracker ___ ___

[issue14621] Hash function is not randomized properly

2013-12-09 Thread Nick Coghlan
Nick Coghlan added the comment: This issue has belatedly had a CVE assigned: CVE-2013-7040 ("CPython hash secret can be recovered remotely") 3.4 is not affected (due to PEP 456), but 3.3 and 2.7 are still affected. -- nosy: +bkabrda status: pending -> open versions: +Python 2.7, Python

[issue14621] Hash function is not randomized properly

2013-11-20 Thread Christian Heimes
Christian Heimes added the comment: The issue has been solved for Python 3.4 with the integration of PEP 456. -- stage: -> committed/rejected status: open -> pending ___ Python tracker

[issue14621] Hash function is not randomized properly

2013-06-01 Thread Donald Stufft
Changes by Donald Stufft : -- nosy: +dstufft ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python

[issue14621] Hash function is not randomized properly

2013-04-20 Thread Martin Morrison
Changes by Martin Morrison : -- nosy: +isoschiz, pconnell ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http:

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See microbenchmark results in issue16427. Really hashing of ASCII/UCS1 strings more than 2x slower than bytes hashing. -- ___ Python tracker

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 19:51, Christian Heimes ha scritto: > > Christian Heimes added the comment: > > Giovanni, why do you think that hashing of unicode strings is slower than > byte strings? > > First of all ASCII only unicode strings are packed

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Christian Heimes
Christian Heimes added the comment: Giovanni, why do you think that hashing of unicode strings is slower than byte strings? First of all ASCII only unicode strings are packed and use just one byte per char. CPython's FNV implementation processes one element in each cycle, that is one byte fo

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Benjamin Peterson
Benjamin Peterson added the comment: 2013/1/2 Giovanni Bajo : > > Giovanni Bajo added the comment: > > Il giorno 02/gen/2013, alle ore 06:52, Christian Heimes > ha scritto: > >> >> Christian Heimes added the comment: >> >> Thanks for the information! I'm working on a PEP for the issue at hand.

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 06:52, Christian Heimes ha scritto: > > Christian Heimes added the comment: > > Thanks for the information! I'm working on a PEP for the issue at hand. Since you're collecting ideas on this, I would like to stress that, in the

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 00:20, Domen Kožar ha scritto: > > Domen Kožar added the comment: > > According to talk at 29c3: > http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html > > Quote: We also describe a vulnerability of Python's new ra

[issue14621] Hash function is not randomized properly

2013-01-01 Thread Nick Coghlan
Nick Coghlan added the comment: Bob, the hash invariant isn't a mere implementation detail, it is critical to making hash based data structures work properly - if two equal objects (say the integer zero and the float zero) ever end up in different hash bins, then the uniqueness property of dic

[issue14621] Hash function is not randomized properly

2013-01-01 Thread Christian Heimes
Christian Heimes added the comment: Thanks for the information! I'm working on a PEP for the issue at hand. -- assignee: -> christian.heimes ___ Python tracker ___ _

[issue14621] Hash function is not randomized properly

2013-01-01 Thread Domen Kožar
Domen Kožar added the comment: According to talk at 29c3: http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html Quote: We also describe a vulnerability of Python's new randomized hash, allowing an attacker to easily recover the 128-bit secret seed. As a reliable fix to hash-flooding

[issue14621] Hash function is not randomized properly

2012-12-02 Thread Bob Ziuchkovski
Bob Ziuchkovski added the comment: Why not redefine -R to mean "use secure hashing algorithms for built-in types"? When specified, use hashing algorithms that are secure against denial-of-service and other known attacks, at the possible expense of performance. When not specified, use whatever

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 30.11.2012 22:27, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > >> try: >> mapping = {} >> mapping.max_collisions = 100 >> mapping.update(source) >> except CollisionLimitError: >> return 'no thank you' > > May be use

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > try: > mapping = {} > mapping.max_collisions = 100 > mapping.update(source) > except CollisionLimitError: > return 'no thank you' May be use a more general solution? try: with run_with_timeout(timeout=100, timer=collisions_count):

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Serhiy Storchaka: Yes, but it is still O(log n) worst case. Even in the > worst case rebalancing, you only need to walk up/down rotating/spliting > every node in your path. As the tree height is guaranteed to be x * log n > (x from 1 to 2, depending on the al

[issue14621] Hash function is not randomized properly

2012-11-30 Thread René
René added the comment: Serhiy Storchaka: Yes, but it is still O(log n) worst case. Even in the worst case rebalancing, you only need to walk up/down rotating/spliting every node in your path. As the tree height is guaranteed to be x * log n (x from 1 to 2, depending on the algorithm), the reb

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 30.11.2012 21:06, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > René, a balanced tree requires rebalancing on every (or almost every) item > for some special (sorted) data sequences. Sure, but that's still O(N*log N) for an attack

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Michal Petrucha
Michal Petrucha added the comment: On Fri, Nov 30, 2012 at 08:06:32PM +, Serhiy Storchaka wrote: > René, a balanced tree requires rebalancing on every (or almost > every) item for some special (sorted) data sequences. That's perfectly true and it holds for most unsorted sequences as well --

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: René, a balanced tree requires rebalancing on every (or almost every) item for some special (sorted) data sequences. -- ___ Python tracker __

[issue14621] Hash function is not randomized properly

2012-11-30 Thread René
René added the comment: Serhiy Storchaka: I said a O(log n) data structure, so I was referring to balanced trees, like AVL, RBT or B+-tree. They are not vulnerable to sorted data. The downside is that they need the keys to provide robust comparison methods (like, if z < y < x, then z < x). C

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: However a tree data structure is vulnerable to sorted data. May be worth it to have the two hash functions (switchable by interpreter option or environment variable), strong for applications which can be attacked, and fast for applications which run in safe

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Christian Heimes
Christian Heimes added the comment: No, Murmur3 *is* busted. Some clever people have found a way to perform a universal multicollision attack, that's a key independent attack. An attacker doesn't need to know the seed for an attack. Collision counting as not a solution for the issue, just a wo

[issue14621] Hash function is not randomized properly

2012-11-30 Thread René
René added the comment: Christian Heimes: It has always been trivial to artificially generate collisions for fast hashes designed for hash tables, like MurmurHash. I wouldn't call Murmurhash3 "busted" because of that, as this was never a design goal. It is a known propriety of this type of has

[issue14621] Hash function is not randomized properly

2012-11-29 Thread Łukasz Rekucki
Łukasz Rekucki added the comment: Note that a few weeks ago, Ruby has switched from Murmur to SipHash for this exact reason: http://www.ruby-lang.org/en/news/2012/11/09/ruby19-hashdos-cve-2012-5371/ -- nosy: +Łukasz.Rekucki ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-11-11 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 11/nov/2012, alle ore 05:56, Chris Rebert ha scritto: > > Chris Rebert added the comment: > > What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port: > http://code.google.com/p/cityhash-c/ ) > It's good enough for Google... I

[issue14621] Hash function is not randomized properly

2012-11-11 Thread STINNER Victor
STINNER Victor added the comment: Did qomeone start to write a PEP? Le 11 nov. 2012 05:56, "Chris Rebert" a écrit : > > Chris Rebert added the comment: > > What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C > port: http://code.google.com/p/cityhash-c/ ) > It's good enough f

[issue14621] Hash function is not randomized properly

2012-11-10 Thread Chris Rebert
Chris Rebert added the comment: What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port: http://code.google.com/p/cityhash-c/ ) It's good enough for Google... -- nosy: +cvrebert ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-11-10 Thread Gregory P. Smith
Gregory P. Smith added the comment: People have been posting micro-benchmarks (often run wrong) rather than actual useful benchmarks. Running our real world benchmarks would be more interesting. -- nosy: +gregory.p.smith ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-11-08 Thread Christian Heimes
Christian Heimes added the comment: >From the header of murmurcollisions.cc: * multicollisions for MurmurHash3 * * MurmurHash3 C++ implementation is available at * http://code.google.com/p/smhasher/wiki/MurmurHash3 * * the function Murmur3Multicollisions finds many different inputs * has

[issue14621] Hash function is not randomized properly

2012-11-08 Thread Christian Heimes
Christian Heimes added the comment: I considered MurMur a year ago, too. Nowadays I don't think it's an option anymore. JPA and DJB have released a C++ program that is able to generate lots of collisions: https://www.131002.net/siphash/ C++ program to find universal (key-independent) multicoll

[issue14621] Hash function is not randomized properly

2012-11-08 Thread Sasha B
Sasha B added the comment: Ruby uses the Murmur hash for some types (string & integer at least): http://murmurhash.googlepages.com/ src: http://stackoverflow.com/a/3270836/1332819 The Perl hash implementation: http://cpansearch.perl.org/src/NWCLARK/perl-5.8.8/hv.c PHP5 hash implementation: ht

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 07/nov/2012, alle ore 12:59, Marc-Andre Lemburg ha scritto: > > Marc-Andre Lemburg added the comment: > > On 07.11.2012 12:55, Mark Dickinson wrote: >> >> Mark Dickinson added the comment: >> >> [MAL] >>> I don't understand why we are only trying

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Mark Dickinson
Mark Dickinson added the comment: [Armin] > You can build a sequence of (long) integers that have all exactly the > same hash, but doing that is not as easy as "2**k". Sure it is. The hash for integers is (by design) repeated modulo a number of the form 2**n - 1: we use 2**61 - 1 on 64-bit sy

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Armin Rigo
Armin Rigo added the comment: Wrong, sorry. On a 32-bit Python 2.7, "(2**32-1)*n" has the same hash -2, for any value of n. Of course if you build a dict containing thousands of such integers as keys, then right now you get unexpectedly bad performance. I wonder if I should open another bug

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Serhiy, the performance of hash() for long strings isn't very relevant for > the general performance of a Python program. It exposes the raw speed of hashing algorithm. It is good as a first estimate, because more real cases require more sophisticated mea

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Armin Rigo
Armin Rigo added the comment: I won't try to influence the outcome of this discussion, but I'd like to correct myself: in the measures I posted, "true randomness" is not needed at all. The exact criterion might be hard to pin down, but as a first approximation, we get the same answers as long

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See issue16427. -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://m

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 13:06, Mark Dickinson wrote: > > Mark Dickinson added the comment: > > And I'm probably repeating myself too, but: the predictability of (and > difficulty of changing of) hashing for numeric types is why I'm strongly > opposed to hash collis

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Mark Dickinson
Mark Dickinson added the comment: And I'm probably repeating myself too, but: the predictability of (and difficulty of changing of) hashing for numeric types is why I'm strongly opposed to hash collision / slot collision limits: they'd end up disallowing reasonably natural looking Python nume

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 12:55, Mark Dickinson wrote: > > Mark Dickinson added the comment: > > [MAL] >> I don't understand why we are only trying to fix the string problem >> and completely ignore other key types. > > [Armin] >> estimating the risks of giving up on

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Mark Dickinson
Mark Dickinson added the comment: [MAL] > I don't understand why we are only trying to fix the string problem > and completely ignore other key types. [Armin] > estimating the risks of giving up on a valid query for a truly random > hash, at an overestimated one billion queries per second ... T

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Christian Heimes
Christian Heimes added the comment: Serhiy, the performance of hash() for long strings isn't very relevant for the general performance of a Python program. Short strings dominate. I've modified the timeit to create a new string object every time. for I in 5 10 15 20 30 40 50 60; do echo -ne "$

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > can you please attach the generated assembly code for the siphash function > with your compiler and your optimization flags (that is, the one that > produces the above results)? GCC (Ubuntu 4.4.3-4ubuntu5.1) options: -pthread -c -Wno-unused-result -DNDEBU

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 07/nov/2012, alle ore 08:40, Serhiy Storchaka ha scritto: > Serhiy Storchaka added the comment: > > I tested different kind of strings. > > $ ./python -m timeit -n 1 -s "t = b'a' * 10**8" "hash(t)" > $ ./python -m timeit -n 1 -s "t = 'a' * 10**8"

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 12:06, Armin Rigo wrote: > > Armin Rigo added the comment: > > Marc-André: estimating the risks of giving up on a valid query for a truly > random hash, at an overestimated one billion queries per second, in a 2/3 > full dictionary: > > * f

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Armin Rigo
Armin Rigo added the comment: Marc-André: estimating the risks of giving up on a valid query for a truly random hash, at an overestimated one billion queries per second, in a 2/3 full dictionary: * for 1000: 4E159 years between mistakes * for 100: 12.9 years between mistakes * for 150: 8E9 y

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo
Giovanni Bajo added the comment: Until it's broken with a yet-unknown attack, SipHash is a pseudo-random function and as such it does uniformly distribute values across the output space, and never leak any information on the key (the randomized seed). Being designed by cryptographers, it is li

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 09:34, Marc-Andre Lemburg wrote: > > Here's a demo patch (against Python 2.7) which counts hash value collisions > and slot collisions. I had posted that in the original ticket where we > discussed the hash problem (http://bugs.python.org/iss

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Here's a demo patch (against Python 2.7) which counts hash value collisions and slot collisions. I had posted that in the original ticket where we discussed the hash problem (http://bugs.python.org/issue14621). This avoids issues like attack 1 mentioned in

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I tested different kind of strings. $ ./python -m timeit -n 1 -s "t = b'a' * 10**8" "hash(t)" $ ./python -m timeit -n 1 -s "t = 'a' * 10**8" "hash(t)" $ ./python -m timeit -n 1 -s "t = '\u0100' * 10**8" "hash(t)" $ ./python -m timeit -n 1 -s "t = '\U0001000

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: % ./python -m timeit -r20 -n100 -s "h = hash; x = 'a' * 10**7" -- "h(x)" Here is only one hash calculation and 99 cached calls. -- ___ Python tracker ___

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Serhiy's trick #define U8TO64_LE(p) ((u64)((u32 *)(p))[0] | ((u64)((u32 *)(p))[1] << 32)) gives a nice speedup. Now UCS2 is down to 0.133 usec (12% slower than the current algorithm) and ASCII down to 0.105 usec (3% faster). --

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Thanks to Snakebit I was able to tests the code on a 32bit BSD installation with GCC 4.2. The ASCII unicode and bytes performance is about 8% slower, UCS2 unicode is about 37% slower. There might be room for improvements, though. % ./python -m timeit -r20 -n

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: $ ./python -m timeit -s "x = b'a' * int(1E7)" "hash(x)" Note that hash calculated only once. Add -n 1 option and use a larger data. > If somebody likes to play with the algorithm: $ ./python -m timeit -n 1 -s "t = 'abcdefgh' * 10**8" "hash(t)" 1 loops, bes

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: We can explore the various optimization options later. Also unaligned memory address is not allowed on some architectures like SPARC. If somebody likes to play with the algorithm: http://hg.python.org/sandbox/cheimes/shortlog/2cb7e97ca8d0 -- hgrepos:

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Giovanni Bajo
Giovanni Bajo added the comment: For short strings, you might want to have a look at the way you fetch the final partial word from memory. If the string is >= 8 bytes, you can fetch the last partial word as an unaligned memory fetch followed by a shift, instead of using a switch like in the r

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: I modified crypto_auth() a bit: Py_uhash_t crypto_auth(const unsigned char *in, unsigned long long inlen) ... u64 k0 = _Py_HashSecret.prefix; u64 k1 = _Py_HashSecret.suffix; ... return (Py_uhash_t)b; and replaced the loop in _Py_HashBytes() with a c

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Thanks! SipHash looks interesting. It's using a XOR + ROT approach with a seed. And it's written by DJB which is usually a good sign. He writes secure code with good quality. Just his coding style tends to be ... unique. :) I'm going to try the algorithm.

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Giovanni Bajo
Giovanni Bajo added the comment: Christian, there are good semi-crypto hash functions that don't leak as bad as Python's own modified FNV hash, without going all the way to HMAC. SipHash has very good collision resistance and doesn't leak anything: https://www.131002.net/siphash/ (notice: they

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: I deem hash randomization and collision counting as a poor man's workaround for the actual issue. Perhaps we shouldn't try too hard to fix an unsuitable data type. Hash maps have a known worst case complexity of O(n). A O(log n) algorithm should be used to p

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Our hash randomization will always leak some information about the randomization keys. The only way to properly secure our secrets is a cryptographic secure algorithms, for example a crypto hashing function in combination with a message authentication code l

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Alex Gaynor
Changes by Alex Gaynor : -- nosy: +alex ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/

[issue14621] Hash function is not randomized properly

2012-11-06 Thread John M Camara
John M Camara added the comment: How about using a secure hash algorithm that's implemented in HW when available. It doesn't eliminate the issue on systems that lack this support but at least it limits the scope of the problem. Of course some testing would need to be done to make sure the har

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Benjamin Peterson
Benjamin Peterson added the comment: Here's the message that helped convince us to go against collision counting originally: http://mail.python.org/pipermail/python-dev/2012-January/115726.html -- ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Armin Rigo
Armin Rigo added the comment: Benjamin: oups, sorry. I don't remember setting the "easy" keyword, my mistake. Fwiw I'm +1 on Marc-Andre's solution. Make it a tunable setting, e.g. with sys.setcollisionlimit(). Defaults to sys.maxint on existing Pythons and some smaller value (70?) on new Py

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 21.10.2012 23:42, STINNER Victor wrote: > > STINNER Victor added the comment: > >> It's interesting to note how this whole -R discussion made very long > threads on python-dev, and python-dev has subsequently ignored (for the > past 6 months!) the fact t

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > $ ./python -m timeit -s "t = 'abcdefgh' * int(1E8)" "hash(t)" I got another numbers (32-bit Linux, AMD Athlon 64 X2 4600+). Python's current hash algorithm: 10 loops, best of 3: 343 msec per loop V8's algorithm: 10 loops, best of 3: 244 msec per loop

[issue14621] Hash function is not randomized properly

2012-10-21 Thread STINNER Victor
STINNER Victor added the comment: > It's interesting to note how this whole -R discussion made very long threads on python-dev, and python-dev has subsequently ignored (for the past 6 months!) the fact that their "fix" can be worked around in a matter of minutes. No, this issue has no been ignor

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Christian Heimes
Christian Heimes added the comment: As far as my understanding goes the issue can't be solved with our current hash algorithm. We'd have to use a crypto hash function or at least a hash algorithm that has an increased avalanche effect on the outcome. The current hash algorithm is designed and

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Benjamin Peterson
Benjamin Peterson added the comment: That doesn't make it any easy CPython issue. :) -- keywords: -easy ___ Python tracker ___ ___ Py

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Armin Rigo
Armin Rigo added the comment: For reference, the above means that we can implement -R support for PyPy as a dummy ignored flag, and get "security" that is very close to CPython's. :-) -- keywords: +easy ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Armin Rigo
Armin Rigo added the comment: Just to make it extra clear: Vlado showed that the "-R" switch of Python can easily be made fully pointless, with only a bit of extra work. Here is how: * Assume you have an algo that gives you as many strings with colliding hashes as you want, provided you know

[issue14621] Hash function is not randomized properly

2012-10-17 Thread Christian Heimes
Christian Heimes added the comment: I've modified unicodeobject's unicode_hash() function. V8's algorithm is about 55% slower for a 800 MB ASCII string on my box. Python's current hash algorithm for bytes and unicode: while (--len >= 0) x = (_PyHASH_MULTIPLIER * x) ^ (Py_uhash_t) *P

[issue14621] Hash function is not randomized properly

2012-10-11 Thread Christian Heimes
Changes by Christian Heimes : -- nosy: +christian.heimes ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http:/

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
STINNER Victor added the comment: > Problem with current randomization of hash function > is following: Suffix does not influence whether two keys > have some hash or not (it is xor-ed after everything). Yes, the suffix is used to "protect" the secret. Without the suffix, it would be too simpl

[issue14621] Hash function is not randomized properly

2012-04-26 Thread Benjamin Peterson
Benjamin Peterson added the comment: We should see if more security can be gained without sacrificing speed. -- nosy: +benjamin.peterson ___ Python tracker ___ _

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
STINNER Victor added the comment: Oops, I attached the wrong "hash.py" file. -- Added file: http://bugs.python.org/file25375/hash.py ___ Python tracker ___ _

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
Changes by STINNER Victor : Removed file: http://bugs.python.org/file25282/hash.py ___ Python tracker ___ ___ Python-bugs-list mailing list Un

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
STINNER Victor added the comment: > One possible fix: ... > Main loop looks like this: .. Well, it was decided to not impact performances to workaround one specific class of attack, whereas there are many other ways to DoS Python. So we chose to keep the main loop unchanged. Randomizing the h

[issue14621] Hash function is not randomized properly

2012-04-20 Thread Vlado Boza
Vlado Boza added the comment: One possible fix: Look for StringHasher in google v8 code (http://code.google.com/p/v8/source/search?q=stringhasher&origq=stringhasher&btnG=Search+Trunk). Main loop looks like this: raw_running_hash_ += c;

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Arfrever Frehtes Taifersar Arahesis
Changes by Arfrever Frehtes Taifersar Arahesis : -- nosy: +Arfrever ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscri

[issue14621] Hash function is not randomized properly

2012-04-19 Thread STINNER Victor
STINNER Victor added the comment: hash.py: Python implementation of the 32-bit version of hash(bytes). Ok, I now see that only the lower 8 bits are really mixed with the input string. -- Added file: http://bugs.python.org/file25282/hash.py ___ Pytho

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza added the comment: >I tried this script on Linux 32 bits and Linux 64 bits: I didn't see any >>collision. What is your operating system and the version of your >operating >system please? uname -a Linux 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64 x86_64 x86_6

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Michal Petrucha
Michal Petrucha added the comment: @dmalcolm: As for the gdb example, you need to add --eval-command="set _Py_HashSecret_Initialized=1", otherwise _Py_HashSecret will get overwritten immediately after it is set by gdb, either to 0 if run without the -R switch, or to a random value. With the

[issue14621] Hash function is not randomized properly

2012-04-19 Thread STINNER Victor
STINNER Victor added the comment: > For example take this script (on 32bit): (...) > It gives collison too many times (around 9 out of 2500). I tried this script on Linux 32 bits and Linux 64 bits: I didn't see any collision. What is your operating system and the version of your operating sys

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Dave Malcolm
Dave Malcolm added the comment: $ gdb --eval-command="break _PyRandom_Init" --eval-command="run" --eval-command="print _Py_HashSecret" --eval-command="set _Py_HashSecret.prefix=0xcdcdcd00" --eval-command="print _Py_HashSecret" --eval-command="continue" -eval-command="continue" --args python -

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza added the comment: For example take this script (on 32bit): ha = hash("\x00\xcf\x0b\x96\x19") hb = hash("\x00\x6d\x29\x45\x18") if ha == hb: print "collision" And run following: for i in `seq 0 25`; do echo $i; for j in `seq 0 100`; do ./python -R x.py; done; done; It gives colli

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza added the comment: My bad (I checked only function in C++, not result in python). This should work on 32bit: Prefix: anything ending on 0x00 Suffix: anything Strings: "\x00\xcf\x0b\x96\x19", "\x00\x6d\x29\x45\x18" -- ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-04-19 Thread STINNER Victor
STINNER Victor added the comment: I don't understand this issue: can you write a short script to test a collision? "E.g this strings collide for every prefix ending on 0xcd" Do you mean that prefix & 0xff == 0xcd? "0x27fd5a18, 0x26fe78fa" Is it a byte string or an Unicode string? b'\x27\xf

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- nosy: +PaulMcMillan ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Dave Malcolm
Dave Malcolm added the comment: Thanks for filing this bug report. I'm not seeing the equal hashes you describe. I'm using this recipe to hardcode a specific prefix and print the hashes using it: $ gdb --eval-command="break _PyRandom_Init" --eval-command="run" --eval-command="print _Py_HashS

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- nosy: +haypo ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Michal Petrucha
Changes by Michal Petrucha : -- nosy: +konk ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.pyt

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Dave Malcolm
Changes by Dave Malcolm : -- nosy: +dmalcolm ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python

  1   2   >