So, to sum up, we have the following pseudocode.
At a given node:

- find the child (among the visited children only) that maximizes the
UCT-RAVE value

- if this maximum UCT-RAVE value is less than the FPU value and there
are still unvisited children:

choose one unvisited child

- otherwise, choose the child found in the first step

- continue


Is this correct?
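
To make the question concrete, here is a minimal Python sketch of the
selection step as I understand it. It is only an illustration under
assumptions, not code from any program: the Node fields, the
exploration constant c, and the fixed beta are hypothetical names and
simplifications of mine (in the paper beta changes with the visit
counts). It uses Erik's "does not exceed" test, so a tie with the FPU
value also goes to an unvisited child.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical statistics layout, for illustration only.
    visits: int = 0          # real simulations through this node
    wins: float = 0.0        # wins among those simulations
    rave_visits: int = 0     # RAVE (all-moves-as-first) sample count
    rave_wins: float = 0.0   # wins among the RAVE samples
    children: list = field(default_factory=list)


def uct_rave_value(parent, child, beta, c=1.0):
    """(1-beta)*Q_uct + beta*Q_rave plus a UCT exploration bonus.

    Only valid for children with child.visits > 0.
    """
    q_uct = child.wins / child.visits
    # Assumption: fall back to Q_uct when no RAVE samples exist yet.
    q_rave = child.rave_wins / child.rave_visits if child.rave_visits else q_uct
    bonus = c * math.sqrt(math.log(max(parent.visits, 1)) / child.visits)
    return (1 - beta) * q_uct + beta * q_rave + bonus


def select_child(parent, fpu, beta=0.5, c=1.0, rng=random):
    """Pick a child of `parent` using UCT-RAVE with first-play urgency."""
    visited = [ch for ch in parent.children if ch.visits > 0]
    unvisited = [ch for ch in parent.children if ch.visits == 0]

    # Step 1: best child among the visited ones only.
    best, best_value = None, -math.inf
    for ch in visited:
        value = uct_rave_value(parent, ch, beta, c)
        if value > best_value:
            best, best_value = ch, value

    # Step 2: if the best bound does not exceed FPU, try something new.
    if unvisited and best_value <= fpu:
        return rng.choice(unvisited)
    return best
```

With fpu=math.inf every child is tried once before any child is
revisited, which is the standard UCT behaviour; a smaller fpu lets the
search keep exploiting a promising visited child while unvisited
siblings wait.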


On Fri, Mar 28, 2008 at 2:57 PM, Erik van der Werf <[EMAIL PROTECTED]>
wrote:

> On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]>
> wrote:
> > So if I understand, at each node we need to play every possible action
> > once at first, even though many of these actions are surely non-optimal.
> > And this may be slow if the number of possible actions at this node is
> > huge.
>
> Well, as discussed in their ICML paper you could also initialize nodes
> with prior knowledge.
>
> > When you talk about FPU, does it mean that you give a kind of default
> > value for an unvisited node and compare this value with
> > (1-beta)*Q_uct + beta*Q_rave if we can compute it?
>
> Yes, you do the normal UCT-RAVE selection for the moves that have
> already been explored at least once, then if the highest upper
> confidence bound (from the move you would normally select if there are
> no unexplored nodes) does not exceed the FPU value you select an
> unexplored node (FPU=infinity gives standard UCT).
>
> Erik
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
