Christian Heimes added the comment: Your benchmark is a bit unrealistic because it times the hash cache most of the time. Here is a better benchmark (but bytes-only):
$ ./python -m timeit -s "words=[w.encode('utf-8') for line in open('../LICENSE') for w in line.split()]; import collections" -- "c = collections.Counter(memoryview(w) for w in words); c.most_common(10)" 1000 loops, best of 3: 1.63 msec per loop This increases the number of hash calculations from about 28k to over 8.4 mio. I also added a little statistic function to see how large typical string are. The artificial benchmark: hash 1: 115185 hash 2: 1440956 hash 3: 1679976 hash 4: 873769 hash 5: 948124 hash 6: 651799 hash 7: 676707 hash 8: 545459 hash 9: 523615 hash 10: 421232 hash 11: 161641 hash 12: 140797 hash 13: 86826 hash 14: 41702 hash 15: 41570 hash 16: 332 hash 17: 211 hash 18: 4275 hash 19: 205 hash 20: 131 hash 21: 4197 hash 22: 70 hash 23: 35 hash 24: 44 hash 25: 4145 hash 26: 4137 hash 27: 4137 hash 28: 21 hash 29: 4124 hash 30: 8 hash 31: 5 hash 32: 1 hash other: 28866 hash total: 8404302 And here is the statistic of a full test run. hash 1: 18935 hash 2: 596761 hash 3: 643973 hash 4: 645399 hash 5: 576231 hash 6: 742531 hash 7: 497214 hash 8: 330890 hash 9: 291301 hash 10: 93206 hash 11: 1417900 hash 12: 160802 hash 13: 58675 hash 14: 49324 hash 15: 48068 hash 16: 90634 hash 17: 24163 hash 18: 66079 hash 19: 23408 hash 20: 20695 hash 21: 16424 hash 22: 17236 hash 23: 59135 hash 24: 10368 hash 25: 6047 hash 26: 6784 hash 27: 5565 hash 28: 5931 hash 29: 3469 hash 30: 4220 hash 31: 2652 hash 32: 2911 hash other: 72042 hash total: 6608973 About 50% of the hashed elements are between 1 and 5 characters long. A realistic hash collision attack on 32bit needs at least 7, 8 chars. I see the chance of a micro-optimization! :) ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue19183> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com