Knuth says to pick N distinct records from a collection where the probability is equal you should:
first fill up N records by chosing the first seen. if less than N were in the collection, quit. otherwise, t = (the number of items looked at) or N to start. while your not at the end of the collection: increment t generate a random number between 0 and t => K if K < N: add it to your collection at position K (replacing the previous item). Now, to incorporate probability is another question . . . Find the largest probability. Assume it has 100% probability. Calculate the probability of each item accordingly. Take the randomly generated number and multiply it by the probability. Take the random number minus the (random number times to probability). If it falls in range, then you replace like usual. You should find that the max probability will always appear in the zeroth position if it is not replaced by another value. Now, this does not guarantee that the highest probability value will show up first in the list, since that is the same as sorting by the probability. It is just a way of increasing the probability of making the value fall in the range as the probability varies. I am not guaranteeing this even works. I am seeing that there is some collision among the numbers, but it will work for the most part. -- http://mail.python.org/mailman/listinfo/python-list