On Fri, Sep 26, 2008 at 9:29 AM, Jason House
<[EMAIL PROTECTED]> wrote:
>
>
> Sent from my iPhone
>
> On Sep 24, 2008, at 5:16 PM, Jason House <[EMAIL PROTECTED]>
> wrote:
>
>> On Sep 24, 2008, at 2:40 PM, Jacques Basaldúa <[EMAIL PROTECTED]> wrote:
>>>
>>
>>> Therefore, the variance of the normal that best approximates the
>>> distribution of both RAVE and
>>> wins/(wins + losses) is the same n·p·(1-p)
>>
>> See above, it's slightly different.
>>
>>
>>> If this is true, the variance you are measuring from the samples does not
>>> contain any information
>>> about the precision of the estimators. If someone understands this
>>> better, please explain it to
>>> the list.
>>
>> This will get covered in my next revision. A proper discussion is too much
>> to type with my thumb...
>
> My paper-writing time is less than I had hoped, so here's a quick and dirty
> answer.
>
> For a fixed win rate, the probabilities of a specific number of wins and
> losses follows the binomial distribution. That distribution keeps p
> (probability of winning) constant and the number of observed wins and losses
> variable.
>
> When trying to reverse this process, the wins and losses are kept constant
> and p varies. Essentially prob(p=x) is proportional to (x^wins)(1-x)^losses.
>
> This is a Beta distribution with known mean, mode, variance, etc... It's
> these values which should be used for approximating the win rate estimator
> as a normal distribution. Does that make sense?
>
> I've glossed over a very important detail when "reversing". Bayes Theorem
> requires some extra a priori information. My preferred handling alters the
> reversed equation's exponents a bit but the basic conclusion (of a beta
> distribution) is the same.

Maybe I can say it a little more precisely. Before we have collected
any data, let's use a uniform prior for p. After we sample the move a
number of times, we obtain w wins and l losses. Bayes's theorem tells
us that the posterior probability distribution is a beta distribution
B(w+1,l+1) (see http://en.wikipedia.org/wiki/Beta_distribution for
details).

An implication of this is that the expected value of p after w wins
and l losses is (w+1)/(w+l+2). This is the same as initializing w=l=1
before you have any information and then using w/(w+l) as your winning
rate, which some people have done intuitively, but it's clear that
it's not just a kludge. I'll use the letter r for the value of the
winning rate.
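The equivalence between the posterior mean and the seeded counters can be checked in a few lines (counts below are hypothetical):

```python
# The posterior mean (w+1)/(w+l+2) is identical to initializing the
# counters with one phantom win and one phantom loss and then taking
# the naive ratio -- the "intuitive kludge" mentioned above.

def posterior_mean(w, l):
    return (w + 1) / (w + l + 2)

def seeded_rate(w, l):
    # Start from w = l = 1 before any information, then use w / (w + l).
    return (w + 1) / ((w + 1) + (l + 1))

for w, l in [(0, 0), (3, 1), (40, 60)]:
    assert posterior_mean(w, l) == seeded_rate(w, l)

print(posterior_mean(0, 0))  # 0.5 before any data is seen
```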

The estimate of the variance is (w+1)*(l+1)/((w+l+2)^2*(w+l+3)), which
is r*(1-r)/(w+l+3). The simple UCB formula uses an estimate of the
variance that is simply 1/visits, so perhaps one should modify the
formula by multiplying that estimate by r*(1-r), which means that the
variance is smaller in positions that look like clear victories for
one side. I don't know if this makes any difference in practice, but I
doubt it.
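A quick check that the two variance expressions agree, plus a sketch of the modified exploration term suggested above. The UCB constant and function names are illustrative, not from any particular engine:

```python
import math

# Check the identity: the Beta(w+1, l+1) variance
#   (w+1)(l+1) / ((w+l+2)^2 * (w+l+3))
# equals r*(1-r)/(w+l+3) with r = (w+1)/(w+l+2).

def beta_variance(w, l):
    a, b = w + 1, l + 1
    return a * b / ((a + b)**2 * (a + b + 1))

def variance_via_r(w, l):
    r = (w + 1) / (w + l + 2)
    return r * (1 - r) / (w + l + 3)

w, l = 12, 4  # hypothetical counts
assert abs(beta_variance(w, l) - variance_via_r(w, l)) < 1e-15

# Sketch of the suggested modification: scale the 1/visits variance
# estimate inside the UCB exploration term by r*(1-r), so the bonus
# shrinks for nodes that already look like clear wins or losses.
def ucb_modified(w, l, parent_visits, c=2.0):
    visits = w + l
    r = w / visits
    return r + math.sqrt(c * math.log(parent_visits) * r * (1 - r) / visits)
```

As the text notes, for r near 0 or 1 the factor r*(1-r) is small, so lopsided nodes get a much smaller exploration bonus than the plain formula gives them.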

Álvaro.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
