Marc-Andre Lemburg added the comment: On 01.02.2017 10:50, INADA Naoki wrote: > >> it seems as if it would make sense to not use a fixed >> hash algorithm for all strings lengths, but instead a >> hybrid one to increase performance for short strings >> (which are used a lot in Python). >> >> Is there a good hash algorithm with provides better >> performance for short strings than siphash ? > > There is undocumented option "Py_HASH_CUTOFF" to use DJBX33A for short string. > > See https://github.com/python/cpython/blob/master/Include/pyhash.h#L98
Thanks. I found a reference in the PEP 456: """ ... However a fast function like DJBX33A is not as secure as SipHash24. A cutoff at about 5 to 7 bytes should provide a decent safety margin and speed up at the same time. The PEP's reference implementation provides such a cutoff with Py_HASH_CUTOFF . The optimization is disabled by default for several reasons. For one the security implications are unclear yet and should be thoroughly studied before the optimization is enabled by default. Secondly the performance benefits vary. On 64 bit Linux system with Intel Core i7 multiple runs of Python's benchmark suite [pybench] show an average speedups between 3% and 5% for benchmarks such as django_v2, mako and etree with a cutoff of 7. Benchmarks with X86 binaries and Windows X86_64 builds on the same machine are a bit slower with small string optimization. The state of small string optimization will be assessed during the beta phase of Python 3.4. The feature will either be enabled with appropriate values or the code will be removed before beta 2 is released. """ The mentioned speedup sounds like enabling this by default would make a lot of sense (keeping the compile time option of setting Py_HASH_CUTOFF to 0). Aside: I haven't been following this since the days I proposed the collision counting method as solution to the vulnerability, which was rejected at the time. It's interesting how much effort went into trying to fix the string hashing in dicts over the years. Yet the much easier to use integer key hash collisions have still not been addressed. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29410> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com