Sent from my iPhone
On Sep 27, 2008, at 10:14 AM, "Álvaro Begué" <[EMAIL PROTECTED]> wrote:
On Fri, Sep 26, 2008 at 9:29 AM, Jason House <[EMAIL PROTECTED]> wrote:
Sent from my iPhone
On Sep 24, 2008, at 5:16 PM, Jason House <[EMAIL PROTECTED]> wrote:
On Sep 24, 2008, at 2:40 PM, Jacques Basaldúa <[EMAIL PROTECTED]> wrote:
Therefore, the variance of the normal that best approximates the distribution of both RAVE and wins/(wins + losses) is the same n·p·(1-p).

See above, it's slightly different.
If this is true, the variance you are measuring from the samples does not contain any information about the precision of the estimators. If someone understands this better, please explain it to the list.
This will get covered in my next revision. A proper discussion is too much to type with my thumb...

My paper-writing time is less than I had hoped, so here's a quick and dirty answer.
For a fixed win rate, the probability of a specific number of wins and losses follows the binomial distribution. That distribution keeps p (the probability of winning) constant and lets the number of observed wins and losses vary.

When trying to reverse this process, the wins and losses are kept constant and p varies. Essentially, prob(p=x) is proportional to (x^wins)(1-x)^losses. This is a beta distribution with known mean, mode, variance, etc. It's these values which should be used for approximating the win rate estimator as a normal distribution. Does that make sense?
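As a quick sanity check of that claim (my own sketch, not from the paper), you can numerically integrate the unnormalized posterior x^wins * (1-x)^losses and confirm its mean matches the closed-form mean of a Beta(wins+1, losses+1), which is (wins+1)/(wins+losses+2):

```python
# Sketch: verify that prob(p=x) ∝ x^wins * (1-x)^losses has the mean
# of a Beta(wins+1, losses+1) distribution, by simple numeric integration.
def posterior_mean(wins, losses, steps=100000):
    num = 0.0
    den = 0.0
    for i in range(1, steps):
        x = i / steps
        w = x**wins * (1 - x)**losses   # unnormalized posterior density at x
        num += x * w
        den += w
    return num / den

wins, losses = 7, 3
empirical = posterior_mean(wins, losses)
beta_mean = (wins + 1) / (wins + losses + 2)   # closed-form Beta(8, 4) mean
print(empirical, beta_mean)  # the two should agree to several decimals
```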
I've glossed over a very important detail when "reversing": Bayes' theorem requires some extra a priori information. My preferred handling alters the reversed equation's exponents a bit, but the basic conclusion (a beta distribution) is the same.
Maybe I can say it a little more precisely. Before we have collected
any data, let's use a uniform prior for p. After we sample the move a
number of times, we obtain w wins and l losses. Bayes's theorem tells
us that the posterior probability distribution is a beta distribution
B(w+1,l+1) (see http://en.wikipedia.org/wiki/Beta_distribution for
details).
When I originally posted about this stuff and modifying the UCB
formula, the uniform prior was a major sticking point for people. It
is my preferred handling when no domain knowledge/heuristics are used.
An implication of this is that the expected value of p after w wins
and l losses is (w+1)/(w+l+2). This is the same as initializing w=l=1
before you have any information and then using w/(w+l) as your winning
rate, which some people have done intuitively, but it's clear that
it's not just a kludge. I'll use the letter r for the value of the
winning rate.
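That equivalence is easy to demonstrate directly (a trivial sketch of my own, just restating the algebra above):

```python
# The Beta(w+1, l+1) posterior mean equals the winning rate you get by
# seeding one win and one loss before counting.
def posterior_rate(w, l):
    return (w + 1) / (w + l + 2)

def seeded_rate(w, l):
    wins, losses = w + 1, l + 1          # initialize w = l = 1 up front
    return wins / (wins + losses)

for w, l in [(0, 0), (5, 2), (30, 70)]:
    assert posterior_rate(w, l) == seeded_rate(w, l)
print("identical for all tested counts")
```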
The estimate of the variance is (w+1)*(l+1)/((w+l+2)^2*(w+l+3)), which
is r*(1-r)/(w+l+3). The simple UCB formula uses an estimate of the
variance that is simply 1/visits, so perhaps one should modify the
formula by multiplying that estimate by r*(1-r), which means that the
variance is smaller in positions that look like clear victories for
one side. I don't know if this makes any difference in practice, but I
doubt it.
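To make the suggested modification concrete, here is a sketch (function names are mine, and the constant c is a tunable assumption): plain UCB1 versus a version whose exploration term is scaled by r*(1-r), so moves that look decided get a narrower bonus.

```python
from math import log, sqrt

def ucb1(r, visits, parent_visits, c=1.0):
    # Standard UCB1: variance estimate is effectively 1/visits.
    return r + c * sqrt(log(parent_visits) / visits)

def ucb1_variance_scaled(r, visits, parent_visits, c=1.0):
    # Modified form: scale the variance estimate by r*(1-r).
    return r + c * sqrt(r * (1 - r) * log(parent_visits) / visits)

# A move at a 90% winning rate gets a smaller exploration bonus
# than a 50% one, reflecting its smaller posterior variance.
print(ucb1_variance_scaled(0.9, 100, 1000))
print(ucb1_variance_scaled(0.5, 100, 1000))
```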
The UCB1-Tuned formula aimed at a similar effect. I know the MoGo team abandoned it because they saw no strength gain, just added complexity. In fact, they don't even use UCBs at all anymore.
Also, note that when initializing wins and losses to 1, with n = wins + losses and p = wins/n, the variance becomes p(1-p)/(n+1), which is very close to the naive binomial estimate p(1-p)/n with the same win and loss initialization.
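That identity can be checked exactly (a small sketch of my own): the Beta(w+1, l+1) variance (w+1)(l+1)/((w+l+2)^2 (w+l+3)) equals p(1-p)/(n+1) once wins and losses are initialized to 1.

```python
# Verify: with wins and losses seeded to 1, n = wins + losses and
# p = wins/n, the Beta posterior variance equals p*(1-p)/(n+1).
def beta_variance(w, l):
    return (w + 1) * (l + 1) / ((w + l + 2) ** 2 * (w + l + 3))

def initialized_form(w, l):
    wins, losses = w + 1, l + 1
    n = wins + losses
    p = wins / n
    return p * (1 - p) / (n + 1)

for w, l in [(0, 0), (4, 9), (120, 80)]:
    assert abs(beta_variance(w, l) - initialized_form(w, l)) < 1e-15
print("the two expressions agree")
```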
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/