I'm not coming up with the right keywords to find what I'm hunting
for.  I'd like to randomly sample a modestly compact list of
weighted items, so I might have

  data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
    )

and I'd like to random.sample() it as if it were a 100-element list.
Ideally, though, this could be done in O(size-of-data) storage
rather than requiring the build-out of the entire expanded list just
for sampling purposes, as the actual data can get a bit large.  For
this small toy data set, I can use

  import random

  sample_me = sum(([s] * n for s, n in data), [])  # the full 100-element list
  random.sample(sample_me, k)

but for large counts, the list returned from sum() grinds my system
to a halt because I start swapping.  What am I missing?  (links to
relevant keywords/searches/algorithms welcome in lieu of actually
answering in-line)

Thanks,

-tkc
