Carl Banks wrote: [...] > Random element from a set is such a natural idea. > > Even if we were to modify the set type at the C level to support it, I > can't think of an easy way to get a random element without selection > bias. For instance, if you start from a random hash code, some > elements are more likely to be selected than others, depending on > distance between entries in the hash table. You'd probably have to > choose an number in range(len(s)) and count off that many, but then > might as well have just converted it to a list or used an iterator. A > little disappointing, actually. > > Probably the "right" way is to use a binary-tree-based set.
There's a reasonably straightforward solution supporting O(1) time random choice that also preserves the O(1) average-case time of set's add(), remove(), and 'in'. The trick is to keep each element in both a list and a hash table. Implementing Python's entire set interface is a bit of project, so the code below just supports enough for a demo. -Bryan Olson # -------- from random import choice class SetWithRandom: def __init__(self, *args): self.lst = list(*args) self.dct = dict((elem, i) for (i, elem) in enumerate(self.lst)) def audit(self): # Test consistency of dict and list. assert len(self.lst) == len(self.dct) for elem in self.dct: assert self.lst[self.dct[elem]] == elem def random(self): return choice(self.lst) def add(self, elem): if elem not in self.dct: self.dct[elem] = len(self.lst) self.lst.append(elem) def remove(self, elem): index = self.dct[elem] # Move the last item into vacated index. mover = self.lst[-1] self.lst[index] = mover self.dct[mover] = index del self.dct[elem] self.lst.pop() def __len__(self): return len(self.lst) def __contains__(self, elem): return elem in self.dct def __iter__(self): return self.dct.__iter__() # --------- # Test against Python's built-in set. for i in range(0, 100, 20): trange = list(range(1000, 1000 + i)) pyset = set(trange) testset = SetWithRandom(pyset) testset.audit() assert pyset == set(testset) rands = set(testset.random() for _ in trange) assert len(rands) >= i / 4 while pyset: randx = testset.random() pyset.remove(randx) testset.remove(randx) testset.audit() for _ in range(choice([0, 0, 1, 2])): randx = choice(trange) pyset.add(randx) testset.add(randx) assert pyset == set(testset) assert not testset -- http://mail.python.org/mailman/listinfo/python-list