On Tue, Apr 28, 2009 at 10:50 AM, Rémi Coulom
<remi.cou...@univ-lille3.fr> wrote:
> Don Dailey wrote:
>>
>> I don't quite understand this.   If I try move m0, which we will assume is
>> the only winning move,  but on the first simulation it turns out to lose,
>> then from what you seem to be saying I would never try it again?
>
> Brian's theorem needs one more hypothesis to be correct: it is necessary to
> overestimate moves. For instance with W wins out of N, estimate the strength
> of the move with (W+1)/(N+1). Otherwise, as you suggest, if a winning move
> is unlucky in the beginning, and another move gets at least a victory, then
> the unlucky winning move would never be searched again.

There is some mathematical argument that the correct estimate for the
winning probability is (W+1)/(N+2). If your prior for the winning
probability is flat, after W wins and L losses the posterior
probability follows a beta distribution with alpha=W+1 and beta=L+1
(see http://en.wikipedia.org/wiki/Beta_distribution), whose average is
alpha/(alpha+beta) = (W+1)/(W+L+2) = (W+1)/(N+2).

Of course this sort of breaks down if the winning probability changes
over time, which in UCT is precisely the case. It's a reasonable place
to start, though, and for small values of W and N is a good
approximation (and those are the only cases where the precise formula
you use makes a difference).

I guess for c==0 to make any sense, you need to initialize W and N
with some optimistic values for every move. Then you will give moves
some more chances even if the first few playouts result in losses.

Álvaro.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to