So, to sum up, we have the following pseudocode. At a given node:

- find the child (among the visited children only) that maximizes the UCT-RAVE value
- if this maximum UCT-RAVE value is less than the FPU value and there are still unvisited nodes: choose one unvisited node
- continue

Is this correct?

On Fri, Mar 28, 2008 at 2:57 PM, Erik van der Werf <[EMAIL PROTECTED]> wrote:

> On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote:
> > So if I understand, at each node we need to play every possible action once
> > at first, even if many of these actions are surely non-optimal. And this may be
> > slow if the number of possible actions at this node is huge.
>
> Well, as discussed in their ICML paper, you could also initialize nodes
> with prior knowledge.
>
> > When you talk about FPU, does it mean that you give a kind of default value
> > for unvisited nodes and compare this value with (1-beta)*Q_uct + beta*Q_rave
> > if we can compute it?
>
> Yes, you do the normal UCT-RAVE selection for the moves that have
> already been explored at least once, then if the highest upper
> confidence bound (from the move you would normally select if there are
> no unexplored nodes) does not exceed the FPU value you select an
> unexplored node (FPU=infinity gives standard UCT).
>
> Erik
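For concreteness, the selection rule discussed above could be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not from the thread: the `Node` fields, the function names, the fixed `beta` passed in as a parameter, the single exploration term with constant `c` added on top of the mixed value, and the choice of simply taking the first unvisited child. A real program would probably use a beta schedule and its own exploration formula.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node layout, chosen only for this sketch.
    visits: int = 0            # simulations that went through this node
    wins: float = 0.0          # summed results of those simulations
    rave_visits: int = 0       # RAVE (all-moves-as-first) sample count
    rave_wins: float = 0.0     # summed RAVE results
    children: List["Node"] = field(default_factory=list)


def uct_rave_value(parent: Node, child: Node, beta: float, c: float = 1.0) -> float:
    """(1-beta)*Q_uct + beta*Q_rave plus an exploration bonus (assumed form)."""
    q_uct = child.wins / child.visits
    # Fall back to the plain mean when no RAVE samples exist (an assumption).
    q_rave = child.rave_wins / child.rave_visits if child.rave_visits else q_uct
    bonus = c * math.sqrt(math.log(max(parent.visits, 1)) / child.visits)
    return (1.0 - beta) * q_uct + beta * q_rave + bonus


def select_child(parent: Node, fpu: float, beta: float = 0.5, c: float = 1.0) -> Optional[Node]:
    """Pick the next child to descend into, using an FPU threshold."""
    visited = [ch for ch in parent.children if ch.visits > 0]
    unvisited = [ch for ch in parent.children if ch.visits == 0]
    if not visited:
        # Nothing explored yet: take an unvisited child if there is one.
        return unvisited[0] if unvisited else None
    best = max(visited, key=lambda ch: uct_rave_value(parent, ch, beta, c))
    # If the best explored move does not exceed the FPU value, try a new move.
    if unvisited and uct_rave_value(parent, best, beta, c) <= fpu:
        return unvisited[0]  # any policy (random, prior-ordered) could go here
    return best
```

With `fpu=math.inf` every unvisited child is tried before any explored child is revisited, which matches the "FPU=infinity gives standard UCT" remark; a lower `fpu` lets the search keep exploiting a good explored move while unvisited ones remain.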
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/