Re: [computer-go] Improvement of UCT search algorithm

sylvain . gelly Tue, 10 Oct 2006 09:27:00 -0700

> > Results: (number of wins/number of games with MoGo playing black, then
> > with MoGo playing white, then percentage over all the games).
> > * Choosing the move with the highest value:
> >   338/425(b), 352/425(w) (81.2%/850)
> > * Choosing the move with the highest (value-(standard
> >   deviation)/sqrt(simulations)):
> >   332/400(b), 326/400(w) (82.2%/800)
> > * Choosing the move with the highest number of simulations:
> >   322/400(b), 341/400(w) (82.9%/800)
>
> Correct me if I'm wrong.
> UCT explores the move m with the highest
> avg_m + c* sqrt ( n / log (n_m) )
No, it is:
avg_m + c* sqrt ( log(n) / n_m)
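
As a rough illustration of the corrected formula, here is a minimal Python sketch. The function names, the (avg_m, n_m) tuple layout, the default c=1.0, and the rule of trying unvisited children first are assumptions made for this example, not MoGo's actual code.

```python
import math

def ucb_score(avg_m, n, n_m, c=1.0):
    """avg_m + c * sqrt(log(n) / n_m) for one child move m.

    n is the number of simulations through the parent, n_m the number
    through child m. Unvisited children score +inf so that they are
    tried first (an assumption of this sketch).
    """
    if n_m == 0:
        return math.inf
    return avg_m + c * math.sqrt(math.log(n) / n_m)

def select_child(children, c=1.0):
    """Return the index of the child with the highest score.

    children is a list of (avg_m, n_m) pairs; n is taken as the sum
    of the children's simulation counts.
    """
    n = sum(n_m for _, n_m in children)
    return max(range(len(children)),
               key=lambda i: ucb_score(children[i][0], n, children[i][1], c))

# A rarely visited move can outrank a better-looking, heavily visited one.
children = [(0.6, 100), (0.5, 10)]
chosen = select_child(children)
```

With these numbers the second child is selected, because its exploration term is much larger; with c=0 the choice falls back to the highest average.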


> so those values are kept almost at the same level.
> n is the same for all siblings, so a child with a highest avg_m has
> also highest n_m.

Almost, yes, but the point is that as the tree grows, the problem is no longer
stationary: a move can become very bad because you have found its refutation,
while you have not yet found the refutation of another move. So the best move
is not always the most visited one. Sometimes you can make a bad choice by
choosing the move with the highest value.
This is why Don said that it is good to give more time if the move with the
highest value is not the most visited.
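
The three selection rules from the quoted results, plus the "give more time on disagreement" idea, might be sketched as follows. This is only an illustration: the function names, the (avg, std, simulations) tuple layout, and the example numbers are hypothetical and do not come from MoGo.

```python
import math

def lower_bound(avg, std, n):
    """value - (standard deviation) / sqrt(simulations)."""
    if n == 0:
        return -math.inf  # assumption: never pick an unvisited move
    return avg - std / math.sqrt(n)

def pick_final_move(stats):
    """stats maps move -> (avg, std, simulations) at the root.

    Returns the move chosen by each rule, plus a flag that is True
    when the highest-value move is not the most visited one, i.e.
    when it may be worth searching longer before committing.
    """
    by_value = max(stats, key=lambda m: stats[m][0])
    by_visits = max(stats, key=lambda m: stats[m][2])
    by_bound = max(stats, key=lambda m: lower_bound(*stats[m]))
    return {
        "by_value": by_value,
        "by_visits": by_visits,
        "by_bound": by_bound,
        "extend_time": by_value != by_visits,
    }

# Hypothetical root statistics: A looks better but has few simulations.
stats = {"A": (0.62, 0.5, 20), "B": (0.58, 0.45, 400)}
result = pick_final_move(stats)
```

Here the highest-value rule picks A while the other two rules pick B, so the flag is set.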


> BTW has anyone tried to remove log from the equation?
We've tried modifying the parameters, replacing log(n) with n^alpha and
replacing sqrt(.) with (.)^beta, with no significant results. But of course we
only tested within a limited range of parameters :).
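
For concreteness, the parameterized exploration term could look roughly like the sketch below. The function name, the argument names, and the convention that alpha=None means "keep log(n)" are assumptions for illustration; it also assumes n >= 1 and n_m >= 1.

```python
import math

def exploration_bonus(n, n_m, c=1.0, alpha=None, beta=0.5):
    """Generalized exploration term c * (f(n) / n_m) ** beta.

    alpha=None keeps f(n) = log(n); otherwise f(n) = n ** alpha.
    beta=0.5 corresponds to the usual square root.
    """
    numerator = math.log(n) if alpha is None else n ** alpha
    return c * (numerator / n_m) ** beta

# Standard term: sqrt(log(n) / n_m)
standard = exploration_bonus(100, 25)

# Variant with log(n) replaced by n^0.5, square root kept
variant = exploration_bonus(100, 25, alpha=0.5)
```

The value added to avg_m is then this bonus in place of c* sqrt ( log(n) / n_m).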

