Lukas Lueg added the comment:

I was investigating a callgrind dump of my code, which showed how badly 
unicode_hash() was affecting my performance. Using Google's CityHash instead 
of the builtin algorithm to hash unicode objects improves overall performance 
by about 15 to 20 percent for my case - that is quite a thing.
Valgrind shows that the share of instructions spent in unicode_hash() drops 
from ~20% to ~11%. Amdahl's law crunches the two-fold speedup of the hash 
function to the mentioned 15 percent overall.
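
As a rough check of the Amdahl arithmetic, here is a minimal sketch. It 
assumes the hash function accounts for 20% of all instructions and becomes 
exactly twice as fast; the function name amdahl() and both inputs are 
illustrative, not taken from the attached log. Instruction counts are not 
wall-clock time, so the result need not match the measured 15 to 20 percent.

```python
def amdahl(hash_share, hash_speedup):
    """Return (overall_speedup, new_hash_share) under Amdahl's law.

    hash_share   -- fraction of total work spent hashing before the change
    hash_speedup -- factor by which the hashing alone gets faster
    Both values are assumptions for illustration.
    """
    # Total work after the change, relative to 1.0 before the change.
    remaining = (1.0 - hash_share) + hash_share / hash_speedup
    overall_speedup = 1.0 / remaining
    # Fraction of the new total that is still spent hashing.
    new_hash_share = (hash_share / hash_speedup) / remaining
    return overall_speedup, new_hash_share


speedup, share = amdahl(0.20, 2.0)
print(f"overall speedup: {speedup:.3f}x, hash share afterwards: {share:.1%}")
```

Under these assumptions the hash share afterwards comes out near 11%, in 
line with the Valgrind figure above, while the reduction in total instruction 
count is about 10%.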

CityHash was chosen because of its MIT license and its advertised 
performance on short strings.

I've now found this bug and attached a log for haypo's benchmark which compares 
native vs. CityHash. Caching was disabled during the test. CityHash was 
compiled using -O3 -msse4.2 (CityHash uses CPU-native CRC instructions). 
CPython's unit tests fail due to known_hash and gdb output; besides that, 
everything else seems to work fine.

CityHash is advertised for its performance with short strings, which does not 
seem to show in the benchmark. However, longer strings perform *much* better.

If people are interested, I can repeat the test on an armv7l machine.

----------
nosy: +ebfe
Added file: http://bugs.python.org/file30446/cityhash.txt

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16427>
_______________________________________
