On 2007-11-17, Bruza <[EMAIL PROTECTED]> wrote: > OOPS. I pressed the Send too fast. > > The problem w/ Boris's solution is that after repeated calling > of randomPick(3,items), 'Jane' is not the most "frequent > appearing" member in all the out list of 3 member lists...
How does this solution fair against your spec? def sample_bp(seq, probabilities, k): """ Return k distinct random items from seq, with probabilities specified as weighted integers in a list. >>> random.seed(0) >>> sample_bp(['a', 'b'], [1, 5], 2) ['b', 'a'] >>> sample_bp(['a', 'b', 'c'], [1, 5, 2], 3) ['b', 'a', 'c'] """ if k > len(seq): raise ValueError('sample size greater than population') probs = build_probs(probabilities) rv = [] while k > 0: j = random_prob(probs) rv.append(probs[j][2]) remove_prob(probs, j) k -= 1 return [seq[i] for i in rv] def build_probs(probabilities): """ Receives a list of integers, and returns list of ranges and original indices. >>> build_probs([8, 10, 7]) [(0, 8, 0), (8, 18, 1), (18, 25, 2)] >>> build_probs([1, 5, 8, 2, 3, 7]) [(0, 1, 0), (1, 6, 1), (6, 14, 2), (14, 16, 3), (16, 19, 4), (19, 26, 5)] """ k = 0 probs = [] for i, p in enumerate(probabilities): if p < 0: raise ValueError('negative probability') probs.append((k, k+p, i)) k = k+p return probs def random_prob(probs): """ Return the index of a weighted random element of prob. >>> random.seed(0) >>> for x in xrange(20): ... print random_prob(build_probs([1, 5, 8, 2, 3, 7])), 5 5 2 2 2 2 5 2 2 3 5 2 2 5 4 2 5 5 5 5 >>> random_prob([(0, 15, 0)]) 0 """ i = random.randrange(probs[-1][1]) # Binary search for the element whose range contains i hi = len(probs) lo = 0 while lo < hi: mid = (lo+hi)//2 begin, end, _ = probs[mid] if i >= begin and i < end: return mid elif i >= end: lo = mid+1 else: hi = mid def remove_prob(probs, i): """ Remove element j from the probability list, adjusting ranges as needed. >>> prob = [(0, 12, 0), (12, 15, 1), (15, 25, 2)] >>> remove_prob(prob, 1) >>> prob [(0, 12, 0), (12, 22, 2)] """ begin, end, _ = probs[i] diff = end - begin j = i+1 while j < len(probs): begin, end, index = probs[j] probs[j] = (begin-diff, end-diff, index) j += 1 del probs[i] This was the most simple-minded approach I could think of, so it might serve as a reference against a more efficient approach. Although thorough testing of such a bizarre function boggles my mind. I toyed with sorting the probability list from greatest to lowest, which would make a linear search fast, but unfortunately it would slow down removing probabilities. -- Neil Cerutti I make love to pressure. --Stephen Jackson -- http://mail.python.org/mailman/listinfo/python-list