On Tue, Apr 28, 2009 at 10:50 AM, Rémi Coulom <remi.cou...@univ-lille3.fr> wrote: > Don Dailey wrote: >> >> I don't quite understand this. If I try move m0, which we will assume is >> the only winning move, but on the first simulation it turns out to lose, >> then from what you seem to be saying I would never try it again? > > Brian's theorem needs one more hypothesis to be correct: it is necessary to > overestimate moves. For instance with W wins out of N, estimate the strength > of the move with (W+1)/(N+1). Otherwise, as you suggest, if a winning move > is unlucky in the beginning, and another move gets at least a victory, then > the unlucky winning move would never be searched again.
There is some mathematical argument that the correct estimate for the winning probability is (W+1)/(N+2). If your prior for the winning probability is flat, after W wins and L losses the posterior probability follows a beta distribution with alpha=W+1 and beta=L+1 (see http://en.wikipedia.org/wiki/Beta_distribution), whose average is alpha/(alpha+beta) = (W+1)/(W+L+2) = (W+1)/(N+2). Of course this sort of breaks down if the winning probability changes over time, which in UCT is precisely the case. It's a reasonable place to start, though, and for small values of W and N is a good approximation (and those are the only cases where the precise formula you use makes a difference). I guess for c==0 to make any sense, you need to initialize W and N with some optimistic values for every move. Then you will give moves some more chances even if the first few playouts result in losses. Álvaro. _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/