On 2013-01-11 14:15, Roy Smith wrote:
> I have a list of items.  I need to generate n samples of k unique items
> each.  I not only want each sample set to have no repeats, but I also
> want to make sure the sets are disjoint (i.e. no item repeated between
> sets).
>
> random.sample(items, k) will satisfy the first constraint, but not the
> second.  Should I just do random.sample(items, k*n), and then split the
> resulting big list into n pieces?  Or is there some more efficient way?
>
> Typical values:
>
> len(items) = 5,000,000
> n = 10
> k = 100,000
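
For reference, the "sample k*n, then split" idea can be sketched like this. It is a minimal sketch, not a benchmarked answer: the helper name disjoint_samples is hypothetical, and it assumes the items in the list are themselves distinct (random.sample picks distinct positions, so duplicate values in the list could still show up in two samples).

```python
import random

def disjoint_samples(items, n, k):
    """Hypothetical helper: return n disjoint lists of k items each."""
    if n * k > len(items):
        raise ValueError("need n*k <= len(items)")
    # A single call draws n*k distinct positions, so no position can be
    # picked twice anywhere in the combined result.
    pool = random.sample(items, n * k)
    # Consecutive, non-overlapping slices of that pool form the n samples.
    return [pool[i * k:(i + 1) * k] for i in range(n)]

# Usage with small numbers instead of the 5,000,000-item case.
samples = disjoint_samples(list(range(1000)), 10, 50)
```

Because the slices never overlap, disjointness comes for free; the only random work is the one sample() call of size n*k.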

I don't know how efficient it would be, but couldn't you shuffle the
list and then use slicing to get the samples?
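
A minimal sketch of the shuffle-and-slice suggestion, assuming the same hypothetical helper shape as above (the name shuffled_samples is made up for illustration). It copies the list first so the caller's order is left alone; with the typical values that means copying and shuffling all 5,000,000 items even though only 1,000,000 are used, so if the original order doesn't matter you could shuffle in place and skip the copy. I haven't timed it against random.sample(items, k*n).

```python
import random

def shuffled_samples(items, n, k):
    """Hypothetical helper: shuffle a copy, then slice off n disjoint samples."""
    if n * k > len(items):
        raise ValueError("need n*k <= len(items)")
    pool = list(items)    # copy so the caller's list is not reordered
    random.shuffle(pool)  # random permutation of the copy, in place
    # Slices of a single permutation cannot share a position.
    return [pool[i * k:(i + 1) * k] for i in range(n)]

samples = shuffled_samples(list(range(1000)), 10, 50)
```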
--
http://mail.python.org/mailman/listinfo/python-list