On Nov 16, 4:47 pm, Bruza <[EMAIL PROTECTED]> wrote: > On Nov 16, 6:58 am, duncan smith <[EMAIL PROTECTED]> > wrote: > > > > > Bruza wrote: > > > I need to implement a "random selection" algorithm which takes a list > > > of [(obj, prob),...] as input. Each of the (obj, prob) represents how > > > likely an object, "obj", should be selected based on its probability > > > of > > > "prob".To simplify the problem, assuming "prob" are integers, and the > > > sum of all "prob" equals 100. For example, > > > > items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] > > > > The algorithm will take a number "N", and a [(obj, prob),...] list as > > > inputs, and randomly pick "N" objects based on the probabilities of > > > the > > > objects in the list. > > > > For N=1 this is pretty simply; the following code is sufficient to do > > > the job. > > > > def foo(items): > > > index = random.randint(0, 99) > > > currentP = 0 > > > for (obj, p) in items: > > > currentP += w > > > if currentP > index: > > > return obj > > > > But how about the general case, for N > 1 and N < len(items)? Is there > > > some clever algorithm using Python standard "random" package to do > > > the trick? > > > I think you need to clarify what you want to do. The "probs" are > > clearly not probabilities. Are they counts of items? Are you then > > sampling without replacement? When you say N < len(items) do you mean N > > <= sum of the "probs"? > > > Duncabn > > I think I need to explain on the probability part: the "prob" is a > relative likelihood that the object will be included in the output > list. So, in my example input of > > items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] > > So, for any size of N, 'Tom' (with prob of 45) will be more likely to > be included in the output list of N distinct member than 'Mary' (prob > of 30) and much more likely than that of 'John' (with prob of 10). > > I know "prob" is not exactly the "probability" in the context of > returning a multiple member list. But what I want is a way to "favor" > some member in a selection process. > > So far, only Boris's solution is closest (but not quite) to what I > need, which returns a list of N distinct object from the input > "items". However, I tried with input of > > items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)] > > and have a repeated calling of > > Ben
OOPS. I pressed the Send too fast. The problem w/ Boris's solution is that after repeated calling of randomPick(3,items), 'Jane' is not the most "frequent appearing" member in all the out list of 3 member lists... -- http://mail.python.org/mailman/listinfo/python-list