On Fri, Sep 26, 2008 at 9:29 AM, Jason House <[EMAIL PROTECTED]> wrote:
>
> Sent from my iPhone
>
> On Sep 24, 2008, at 5:16 PM, Jason House <[EMAIL PROTECTED]> wrote:
>
>> On Sep 24, 2008, at 2:40 PM, Jacques Basaldúa <[EMAIL PROTECTED]> wrote:
>>
>>> Therefore, the variance of the normal that best approximates the
>>> distribution of both RAVE and wins/(wins + losses) is the same n·p·(1-p).
>>
>> See above, it's slightly different.
>>
>>> If this is true, the variance you are measuring from the samples does not
>>> contain any information about the precision of the estimators. If someone
>>> understands this better, please explain it to the list.
>>
>> This will get covered in my next revision. A proper discussion is too much
>> to type with my thumb...
>
> My paper-writing time is less than I had hoped, so here's a quick and dirty
> answer.
>
> For a fixed win rate, the probability of a specific number of wins and
> losses follows the binomial distribution. That distribution keeps p (the
> probability of winning) constant while the number of observed wins and
> losses varies.
>
> When trying to reverse this process, the wins and losses are kept constant
> and p varies. Essentially, prob(p=x) is proportional to (x^wins)(1-x)^losses.
>
> This is a Beta distribution with known mean, mode, variance, etc. It's these
> values which should be used when approximating the win-rate estimator with a
> normal distribution. Does that make sense?
>
> I've glossed over a very important detail when "reversing": Bayes' theorem
> requires some extra a priori information. My preferred handling alters the
> reversed equation's exponents a bit, but the basic conclusion (of a beta
> distribution) is the same.
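Jason's "reversal" is easy to check numerically: normalize x^wins * (1-x)^losses
over a grid of values for p, and its moments match those of a Beta distribution.
A quick Python sketch (the counts are made up, purely for illustration):

    # Hypothetical counts, just for illustration.
    wins, losses = 7, 3

    # Unnormalized "reversed" distribution: prob(p = x) ~ x^wins * (1-x)^losses.
    xs = [i / 10000.0 for i in range(1, 10000)]
    weights = [x ** wins * (1.0 - x) ** losses for x in xs]

    # Its normalized mean matches the mean of Beta(wins+1, losses+1),
    # which is (wins+1)/(wins+losses+2).
    mean = sum(x * w for x, w in zip(xs, weights)) / sum(weights)
    print(mean, (wins + 1.0) / (wins + losses + 2.0))  # both ~0.6667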
Maybe I can say it a little more precisely. Before we have collected any data,
let's use a uniform prior for p. After we sample the move a number of times, we
obtain w wins and l losses. Bayes's theorem tells us that the posterior
probability distribution is a beta distribution B(w+1,l+1) (see
http://en.wikipedia.org/wiki/Beta_distribution for details).

An implication of this is that the expected value of p after w wins and l
losses is (w+1)/(w+l+2). This is the same as initializing w=l=1 before you have
any information and then using w/(w+l) as your winning rate, which some people
have done intuitively, but it's clear that it's not just a kludge. I'll use the
letter r for the value of the winning rate.

The estimate of the variance is (w+1)*(l+1)/((w+l+2)^2*(w+l+3)), which is
r*(1-r)/(w+l+3). The simple UCB formula uses an estimate of the variance that
is simply 1/visits, so perhaps one should modify the formula by multiplying
that estimate by r*(1-r), which means that the variance is smaller in positions
that look like clear victories for one side. I don't know if this makes any
difference in practice, but I doubt it.
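For concreteness, here's roughly what that modified selection formula might
look like (a rough sketch only; the exploration constant c and the exact shape
of the bonus term are guesses on my part, not anything tested):

    import math

    def ucb_score(wins, losses, parent_visits, c=2.0):
        # Posterior mean of p under a uniform prior: Beta(wins+1, losses+1).
        r = (wins + 1.0) / (wins + losses + 2.0)
        # Beta-posterior variance r*(1-r)/(w+l+3): effectively the usual
        # 1/visits estimate scaled by r*(1-r), as suggested above.
        var = r * (1.0 - r) / (wins + losses + 3.0)
        # UCB1-style score with the variance estimate swapped in.
        return r + math.sqrt(c * var * math.log(parent_visits))

Álvaro.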