Re: Monte Carlo probability calculation in Python

Paul Moore Tue, 10 Feb 2015 06:48:12 -0800

On Monday, 9 February 2015 21:57:51 UTC, Paul  Moore  wrote:
> On Friday, 6 February 2015 23:49:51 UTC, Steven D'Aprano  wrote:
> > Very nice! Care to share the code?
> 
> Will do.


Here's the code I used for the Monopoly calculations.

import numpy as np

def monopoly(samples):
    # 2d6 x 3
    num = 2
    sides = 6
    rolls = 3
    dice = np.random.randint(1, sides + 1, size=(samples, rolls, num))

    # Doubles are if the two dice are the same
    doubles = dice[...,0] == dice[...,1]
    # Reroll if this (and all previous rolls) are doubles
    reroll = np.logical_and.accumulate(doubles, axis=1)
    # Keep the first entry, and any valid rerolls
    keep = np.insert(reroll, 0, True, axis=1)
    # The last column (all 3 are doubles) is "go to gaol"
    keep, gaol = np.split(keep, [-1], axis=-1)
    gaol = gaol[...,0]

    # Add up all the rolls and zero out any we don't want to keep
    totals = dice.sum(axis=-1) * keep
    # Remove any "go to gaol" cases
    totals = totals[~gaol,...]
    # Add up the distance moved
    totals = totals.sum(axis=-1)

    gaol_count = np.sum(gaol)

    return totals, gaol_count

samples = 1000000
totals, gaol_count = monopoly(samples)

print("Average distance moved =", totals.mean())
print("Percentage of times you go to gaol: {:%}".format(gaol_count/samples))
for i, count in enumerate(np.bincount(totals)):
    print("{:>2} {}".format(i, count))

The actual calculations are fairly simple, once you get your head round the 
tricks you need to do everything array-based. The part I find the hardest by 
far, is keeping track of the dimensions of the arrays involved, and what I need 
to do to match up axes, etc. For example, it took me a long time to work out 
what was going wrong with stripping out the "go to gaol" cases. The problem 
turned out to be that I needed the line "gaol = gaol[...,0]" to strip off a 
dimension - but it was really hard working out what was going on, particularly 
as numpy broadcasts the values you have, so you don't get errors, just weirdly 
wrong answers. (This isn't unique to numpy, I remember having the same issues 
when I used to work with J).

One thing it'd be nice to do would be to abstract out the array dimension for 
the number of samples, so that the "application" code could be written as if 
working on just one sample, and the driver code extended that to multiple 
samples. But I don't have the first idea how I'd do that without ending up 
looping in Python, which is what I need to avoid if I want performance to 
remain good.

Lots of scope for more things to learn here, then :-)

Thanks again to everyone that helped.
Paul
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Monte Carlo probability calculation in Python

Reply via email to