Based on my analysis, estimating a move's probability of winning by
taking the number of winning simulations (w) and dividing it by the
total number of simulations (n) is actually biased. I tried to break
this e-mail up into sections for easy digestion by the various people
who might read this. The sections are: overview, scope, assumptions,
and derivation. I hope the first two are enough to give a flavor of
what the e-mail is about. The third (assumptions) I expect to spark
some debate. The derivation will probably only be skimmed by most.
OVERVIEW
The generalized probability of winning is (w+alpha)/(n+alpha+beta).
Under the simplest assumption, alpha = beta = 1, the result becomes
(w+1)/(n+2). Notice how, with no simulations, the estimate is a 50/50
chance of the move being good or bad, and how a single simulation
(won or lost) doesn't force the estimate to either extreme.
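For the code-minded, here is a minimal Python sketch of the estimator
(the function name and the example counts are mine, purely for
illustration):

    def estimate_win_prob(w, n, alpha=1.0, beta=1.0):
        """Posterior mean win probability for a move with w winning
        simulations out of n, under a Beta(alpha, beta) prior."""
        return (w + alpha) / (n + alpha + beta)

    print(estimate_win_prob(0, 0))  # 0.5   -- no data, prior mean
    print(estimate_win_prob(1, 1))  # 0.667 -- one win, not 100%
    print(estimate_win_prob(0, 1))  # 0.333 -- one loss, not 0%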
SCOPE
These results are only relevant when doing non-uniform sampling (such
as UCT) and where the sample size is relatively small. Under uniform
sampling, the move with the most wins is still best. With non-uniform
sampling and a large n (and hence, usually, a large w), the extra
alpha and beta terms get washed out and these results become less
relevant, as the quick check below shows.
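A quick numeric check of that wash-out, under the simplest
alpha = beta = 1 assumption (the counts are made up):

    # Small sample: the prior term dominates the estimate.
    w, n = 2, 2
    print(w / n, (w + 1) / (n + 2))  # 1.0 vs 0.75
    # Large sample: the two estimates agree to within roughly 1/n.
    w, n = 6000, 10000
    print(w / n, (w + 1) / (n + 2))  # 0.6 vs 0.59998...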
ASSUMPTIONS
To use these results, you must make some assumption about the
underlying distribution of a move's probability of winning. If nothing
is known a priori, then assuming no probability is any more likely than
any other seems safe (resulting in a uniform distribution, which is
also a beta distribution with alpha = beta = 1). For mature MC bots, I
assume this is just curve fitting historically collected data to a beta
distribution (yielding an alpha and beta). Using the example pictures
at http://en.wikipedia.org/wiki/Beta_distribution and what I remember
from this mailing list, alpha=2 and beta=5 seems to peak in about the
right area (the mode is (alpha-1)/(alpha+beta-2) = 20%). I doubt that's
the correct distribution, but I will have to defer to the MC experts on
this list.
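A sketch of what that curve fit could look like, assuming scipy is
available (the win rates below are made up; a real bot would use rates
logged from its own games):

    from scipy.stats import beta as beta_dist

    historical_rates = [0.12, 0.18, 0.22, 0.25, 0.31, 0.09, 0.27]
    # Fix loc=0 and scale=1 so the fit stays on [0, 1].
    a, b, loc, scale = beta_dist.fit(historical_rates, floc=0, fscale=1)
    print(a, b)                   # fitted alpha and beta
    print((a - 1) / (a + b - 2))  # the mode, when a, b > 1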
DERIVATION
The underlying approach is to calculate the probability that the given
sample came from a move with a particular probability of winning (mu)
and then do the integral of mu*p(mu | w,n) for all mu (0% to 100%),
where p(mu | w,n) is the probability of mu given the observed w and n.
By Bayes' rule, p(mu | w,n) = p(w | mu,n) * p(mu) / p(w | n).
Thankfully, p(w | n) can be written off as a normalization constant:
p(w | n) = integral of p(w | mu,n)*p(mu) for all mu.
p(w | mu,n) can be found in any textbook as proportional to
(mu^w)*(1-mu)^(n-w); the binomial coefficient is constant in mu and
cancels against the normalization. Even when assuming p(mu) = uniform,
calculation of the normalization constant looked daunting. Cheating
with an online integrator led me to the beta distribution.
Notice that the integral of mu*p(mu | w,n) boils down to finding the
mean of p(mu | w,n). The solution for p(mu) = constant becomes finding
the mean of a beta distribution with alpha = w+1 and beta = n-w+1,
which yields the simplest solution given above: (w+1)/(n+2).
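That mean can be checked numerically; a small sketch, assuming scipy
for the quadrature and an arbitrary made-up sample:

    from scipy.integrate import quad

    w, n = 3, 10
    likelihood = lambda mu: mu**w * (1 - mu)**(n - w)

    # Normalization constant p(w | n) under a uniform prior p(mu) = 1.
    norm, _ = quad(likelihood, 0, 1)
    # Posterior mean: integral of mu * p(mu | w, n) over [0, 1].
    mean, _ = quad(lambda mu: mu * likelihood(mu) / norm, 0, 1)

    print(mean)               # ~0.33333
    print((w + 1) / (n + 2))  # 4/12 = 0.33333...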
Assuming a distribution for p(mu) could lead to some very messy
calculus. The best generalized solution I can come up with is when you
assume p(mu) follows a beta distribution, because then
p(w | mu,n)*p(mu) is still proportional to a beta density: the
posterior is a beta distribution with parameters w+alpha and n-w+beta,
whose mean is (w+alpha)/(n+alpha+beta). This yields what I put at the
top of this e-mail.
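That conjugacy claim can also be checked numerically; a sketch
assuming scipy, using my alpha=2, beta=5 guess from above and another
arbitrary sample:

    from scipy.stats import beta as beta_dist

    a0, b0 = 2.0, 5.0  # prior Beta(alpha, beta)
    w, n = 3, 10       # arbitrary sample

    # Likelihood times prior, left unnormalized.
    unnorm = lambda mu: mu**w * (1 - mu)**(n - w) * beta_dist.pdf(mu, a0, b0)
    # Claimed posterior: Beta(w + alpha, n - w + beta).
    post = beta_dist(w + a0, n - w + b0)

    # The ratio should be the same constant at every mu in (0, 1):
    for mu in (0.2, 0.5, 0.8):
        print(unnorm(mu) / post.pdf(mu))

    # And the posterior mean matches (w + alpha)/(n + alpha + beta):
    print(post.mean(), (w + a0) / (n + a0 + b0))  # both 5/17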
If one limits p(mu) to a polynomial, the integral of each term in the
polynomial becomes a small beta-function integral. For integer powers,
the result is a ratio of products of factorials. The factorials cancel
nicely in the mean, but once p(mu) is the sum of multiple beta
distributions, convenient cancellations are not guaranteed. I did not
try to work out any examples since I don't yet know that curve fitting
to a beta distribution is unreasonable.
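Still, here is a small sketch of the polynomial case, assuming scipy
(whose beta function handles the factorial bookkeeping) and made-up
coefficients:

    from scipy.special import beta as B  # Euler beta function

    def poly_prior_mean(coeffs, w, n):
        """Posterior mean when the prior p(mu) is proportional to
        sum_k coeffs[k] * mu**k.  Each term integrates to an Euler
        beta function (a ratio of factorials for integer powers)."""
        num = sum(c * B(w + k + 2, n - w + 1) for k, c in enumerate(coeffs))
        den = sum(c * B(w + k + 1, n - w + 1) for k, c in enumerate(coeffs))
        return num / den

    # Sanity check: a constant prior reproduces (w+1)/(n+2).
    print(poly_prior_mean([1.0], 3, 10))  # 0.33333...
    # A made-up quadratic prior:
    print(poly_prior_mean([1.0, 2.0, 3.0], 3, 10))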
Maybe other simple solutions exist, but I'll leave that up to those
who are even bigger math geeks than I am ;)