[issue10408] Denser dicts and linear probing

Charles-François Natali Sat, 10 Dec 2011 03:29:09 -0800

Charles-François Natali <neolo...@free.fr> added the comment:

This paper might be of interest:
http://www.siam.org/meetings/alenex05/papers/13gheileman.pdf


Basically, it concludes that most of the time, there's no speedup to be gained 
from the increased cached locality incurred by linear probing when the input 
sample differ from a uniform distribution. However, figure 1 shows a clear win 
with a uniform distribution, and figure 2 shows that even with a non-uniform 
distribution, there's still a gain as the load factor increases.
So if experiments show measurable improvements, this might be an interesting 
idea.
However, the problem with linear probing is that the performance will be much 
more dependent on the input distribution that with quadratic probing/double 
hashing, so this will definitely degrade with some workloads.

> The problem with linear probing, as noted in the comments, is that it 
> degrades performance quite a lot when the hashing function clusters 
> results. The solution I'm proposing is to apply an *initial* 
> perturbation, by multiplying the hash() with a prime number. 

I fail to see how this would avoid primary clustering: IIUC, you replace the 
hash function h(k) by h'(k), but you still use linear probing: an empty slot 
preceded by i filled slots has a probablility (i+1)/N of getting filled next (N 
being the total number of slots).

----------
nosy: +neologix

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10408>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue10408] Denser dicts and linear probing

Reply via email to