New submission from Antoine Pitrou <pit...@free.fr>: This is a patch experiment which does two things: - make dicts denser by making the resize factor 2 instead of 4 for small dicts - improve cache locality on collisions by using linear probing
It should be noted that these two changes are not independent. Improving cache locality on collisions makes probing cheaper, and in turn should allow to make small dicts denser. Linear probing is motivated by the fact that collisions can happen frequently. The comments in dictobject.c seem a bit mistaken: “If we *usually* find the key we're looking for on the first try (and, it turns out, we usually do -- the table load factor is kept under 2/3, so the odds are solidly in our favor), then it makes best sense to keep the initial index computation dirt cheap.” According to http://www.cse.ust.hk/~yike/pods10-hashing.pdf, however, things are not so rosy. The average number of probes for successful lookups, depending on the load factor "alpha", is given by: >>> c = lambda alpha: 0.5 * (1 + 1/(1-alpha)) while the average number of probes for unsuccessful lookups is: >>> cp = lambda alpha: 0.5 * (1 + 1/(1-alpha)**2) (note: this is with a perfectly random hash function; I intuitively assume an imperfect random function gives higher figures) For a load factor of 2/3, we then get: >>> c(2/3) 1.9999999999999998 >>> cp(2/3) 4.999999999999999 Since the current non-linear probing schemes guarantees that each probing will access a different cache line, the cache locality of a lookup becomes very poor. The problem with linear probing, as noted in the comments, is that it degrades performance quite a lot when the hashing function clusters results. The solution I'm proposing is to apply an *initial* perturbation, by multiplying the hash() with a prime number. Multiplication is very fast on modern CPUs, so this doesn't adversely affect performance. ---------- components: Interpreter Core files: dictopts.patch keywords: patch messages: 121139 nosy: mark.dickinson, pitrou, rhettinger, tim_one priority: normal severity: normal status: open title: Denser dicts and linear probing type: performance versions: Python 3.2 Added file: http://bugs.python.org/file19595/dictopts.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue10408> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com