On Sun, 22 Mar 2009 23:10:21 -0400, Terry Reedy wrote:

> Searching for a key in, say, 10 dicts will be slower than searching
> for it in just one. The only reason I would do this would be if the
> dict had to be split, say over several machines. But then, you could
> query them in parallel.
That surely depends on the number of collisions? With millions of keys,
the number of collisions could be high (although it almost certainly
won't be).

As I understand it, searching a dict is O(1) on average, but if there
are collisions it can degrade to O(m), where m is the number of items
sharing the colliding hash. In principle, m could be large.

Hypothetically speaking, if you could predict where collisions occur,
you could split your data into multiple smaller dicts with fewer
collisions and delegate each lookup to the appropriate smaller dict,
turning one O(m) lookup into two O(1) lookups (one to pick the dict,
one to search it), which of course is still just O(1).

Of course, this entirely assumes that collisions in Python's hash
function are both predictable and frequent in the OP's data. If they
aren't both, this suggestion is unlikely to be of much practical use.

Could be an interesting experiment though :) (a rough sketch follows
below the sig)

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
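
A minimal sketch of that delegation idea. The ShardedDict name and the
hash() % nshards split are just illustrative assumptions, not anything
from the original thread, and since each shard is itself an ordinary
dict this only helps if the colliding keys happen to differ in that
remainder:

# Hypothetical sketch: spread keys across several smaller dicts,
# choosing the shard from the key's hash, so that any cluster of
# colliding keys is confined to a single shard.

class ShardedDict:
    """Toy mapping that delegates to one of several ordinary dicts."""

    def __init__(self, nshards=10):
        self._shards = [dict() for _ in range(nshards)]

    def _shard_for(self, key):
        # Same hash() the built-in dict uses; the split only separates
        # keys whose hashes differ modulo the number of shards.
        return self._shards[hash(key) % len(self._shards)]

    def __setitem__(self, key, value):
        self._shard_for(key)[key] = value

    def __getitem__(self, key):
        return self._shard_for(key)[key]

    def __contains__(self, key):
        return key in self._shard_for(key)


if __name__ == "__main__":
    d = ShardedDict()
    for i in range(1000000):
        d[i] = i * i
    print(d[123456])    # -> 15241383936

Whether this ever beats a single dict is exactly the experiment: the
extra indirection costs a constant factor on every lookup, so it can
only pay off if the split genuinely breaks up a collision-heavy bucket.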