Sumitava Mukherjee wrote:
On May 27, 8:08 pm, Scott David Daniels <scott.dani...@acm.org> wrote:
Sumitava Mukherjee wrote:
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
I am not sure why everybody is normalizing by dividing the weights.
This isa possibility (I had fun writing it).
def sample_without_replacement(choices, weights):
... combined = [(w, v) for w, v in zip(weights, choices) if w > 0]
... total = sum(w for w, v in combined) # sum(weights) also works
while combined:
...
yield choice
total -= weight
if weight > total * 256: # arbitrary choice for recalculation
# precision affected, rebuild
total = sum(w for w, v in combined)
del combined[n]
raise ValueError('Samplng more than %d without replacement?' % (
sum(1 for w in weights if w > 0)))
...
Should be:
total -= weight
yield choice
del combined[n]
if weight > total * 256: # arbitrary choice for recalculation
# precision affected, rebuild
total = sum(w for w, v in combined)
raise ValueError('Samplng more than %d without replacement?' % (
sum(1 for w in weights)))
There were two mistakes, both related to the definition of combined:
(1) Combined only holds positive values, so testing in sum is silly.
(2) Total is meant to track the sum of weights in combined, and the
update was misplaced with respect to the delete.
Among all the help (all of which I really appreciate), I found your
solution closest to what I was expecting. Thanks a lot Scott.
You are welcome (presuming I didn't simply do your homework). I only
noticed the problems above while scanning your reply. The failures I
noticed should have been caught with good tests or a pair programming
partner. Of course, in this environment, I didn't have either high
quality tests or a partner (I did simple hand testing and eyeball
analysis).
--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list