New submission from Antoine Pitrou <pit...@free.fr>:

This is a patch experiment which does two things:
- make dicts denser by making the resize factor 2 instead of 4 for small dicts
- improve cache locality on collisions by using linear probing

It should be noted that these two changes are not independent. Improving cache 
locality on collisions makes probing cheaper, and in turn should allow to make 
small dicts denser.

Linear probing is motivated by the fact that collisions can happen frequently.  
The comments in dictobject.c seem a bit mistaken:

“If we *usually* find the key we're looking for on the first try (and, it turns 
out, we usually do -- the table load factor is kept under 2/3, so the odds are 
solidly in our favor), then it makes best sense to keep the initial index 
computation dirt cheap.”

According to http://www.cse.ust.hk/~yike/pods10-hashing.pdf, however, things 
are not so rosy. The average number of probes for successful lookups, depending 
on the load factor "alpha", is given by:

>>> c = lambda alpha: 0.5 * (1 + 1/(1-alpha))

while the average number of probes for unsuccessful lookups is:

>>> cp = lambda alpha: 0.5 * (1 + 1/(1-alpha)**2)

(note: this is with a perfectly random hash function; I intuitively assume an 
imperfect random function gives higher figures)

For a load factor of 2/3, we then get:

>>> c(2/3)
1.9999999999999998
>>> cp(2/3)
4.999999999999999

Since the current non-linear probing schemes guarantees that each probing will 
access a different cache line, the cache locality of a lookup becomes very poor.

The problem with linear probing, as noted in the comments, is that it degrades 
performance quite a lot when the hashing function clusters results. The 
solution I'm proposing is to apply an *initial* perturbation, by multiplying 
the hash() with a prime number. Multiplication is very fast on modern CPUs, so 
this doesn't adversely affect performance.

----------
components: Interpreter Core
files: dictopts.patch
keywords: patch
messages: 121139
nosy: mark.dickinson, pitrou, rhettinger, tim_one
priority: normal
severity: normal
status: open
title: Denser dicts and linear probing
type: performance
versions: Python 3.2
Added file: http://bugs.python.org/file19595/dictopts.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10408>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to