On Jul 12, 10:13 am, Raymond Hettinger <[EMAIL PROTECTED]> wrote: > On Jul 11, 3:01 pm, [EMAIL PROTECTED] wrote: > > > I have found this perfect hash (minimal too) > > implementation:http://burtleburtle.net/bob/hash/perfect.html > > > I have already translated part of it to D, and it seems to work well > > enough. As discussed in the PyConDue, I think this may be used in > > frozenset (and frozendict) to build a (minimal too?) perfect hash on > > the fly, to allow (hopefully) faster retrieval of items that don't > > change.
I played around with the idea and have a rough implementation sketch: class PerfectSet(collections.Set): __slots__ = ['len', 'hashvalue', 'hashfunc', 'hashtable'] DUMMY = object() def __init__(self, keys, sparseness=0.5): keys = frozenset(keys) self.len = len(keys) self.hashvalue = hash(keys) n = (len(keys) / sparseness) + 1 assert n > self.len self.hashfunc = make_perfect_hash_function(keys, n) self.hashtable = [self.DUMMY] * n for k in keys: self.hashtable[self.hashfunc(k)] = k del __len__(self, k): return self.len def __iter__(self, k) return (k for k in self.hashtable if k is not self.DUMMY) def __contains__(self, k): return self.table[self.hashfunc(k)] is not self.DUMMY The perfect hash function will need to use the regular hash(obj) as a starting point. This helps with non-string types and it preserves existing relationships between __hash__ and __eq__. The construction is more expensive than with regular frozensets so it is only useful when lookups are very common. Playing around with the idea suggests that a sparseness variable for frozensets would largely accomplish the same goal without the overhead of creating and applying a perfect hash function. Sparse hash tables rarely collide. Raymond -- http://mail.python.org/mailman/listinfo/python-list