Marc-Andre Lemburg <m...@egenix.com> added the comment:
On 07.10.2021 15:29, Christian Heimes wrote: > > Christian Heimes <li...@cheimes.de> added the comment: > > JP got back to me > > On 07/10/2021 14.34, Jean-Philippe Aumasson wrote: >> xxHash is much faster indeed, but collisions seem trivial to find, which >> might allow hash-flood DoS again (see for example >> https://github.com/Cyan4973/xxHash/issues/180 >> <https://github.com/Cyan4973/xxHash/issues/180>). It's however unclear >> whether exploitable multicollisions can also be trivially found. >> >> If collisions don't matter and if the ~10x speed-up makes a difference, >> then probably a good option, but guess you'll need to keep SipHash (or >> some other safe hash) when DoS resistance is needed? > > This information disqualifies xxHash for our use case. The quoted issue was for an early version of XXH3. Please see https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison as reference for collision analysis of the current xxHash versions. The numbers are close to the expected case, meaning that collisions are not more frequent than statistically to be expected given the hash and the test sample size. Looking at this older comparison, it would also make sense to revisit the hybrid approach, e.g. use FNV for strings up to 16 bytes, XXH128 for longer strings: https://cglab.ca/~abeinges/blah/hash-rs/ Given that dictionaries often use relatively short string keys, this should have a significant effect on applications where the dictionaries don't just use interned strings. It would also have an effect on Python startup time, since all those interned strings need to have their hash calculated during startup. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue29410> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com