So if I understand correctly, at each node we first have to play every
possible action once, even though many of these actions are surely not
optimal. This may be slow if the number of possible actions at the node
is huge.
When you talk about FPU, do you mean that you give a kind of default value
to each unvisited node and compare this value with
(1-beta)*Q_uct + beta*Q_rave for the nodes where we can compute it?
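
To make the question concrete, here is a minimal Python sketch of how I
read it. The names (Child, score, q_uct, q_rave, n_rave) are my own
assumptions and not taken from any paper or program; only the blending
formula and the FPU value of 1.1 come from this thread.

```python
from dataclasses import dataclass

FPU = 1.1  # first play urgency, the value mentioned in the quoted reply


@dataclass
class Child:
    """Hypothetical per-move statistics stored at a tree node."""
    n: int = 0           # real visits n(s,a)
    q_uct: float = 0.0   # UCT value, only meaningful when n > 0
    n_rave: int = 0      # number of RAVE (all-moves-as-first) updates
    q_rave: float = 0.0  # RAVE value, only meaningful when n_rave > 0


def score(child: Child, beta: float, fpu: float = FPU) -> float:
    """Score a child for selection; beta is the RAVE weight in [0, 1]."""
    if child.n == 0:
        # Unvisited: the UCT value is undefined, so the FPU stands in.
        return fpu
    if child.n_rave == 0:
        # Visited but no RAVE statistics: only the UCT value is usable.
        return child.q_uct
    # Both values defined: the blended score from the question.
    return (1 - beta) * child.q_uct + beta * child.q_rave
```

Assuming q_uct and q_rave are win rates in [0, 1] with no exploration
bonus added, an FPU of 1.1 is larger than any blended score, so every
unvisited child would be tried before any visited one.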

Finally, does it make sense to throw away all unvisited nodes and first
consider only the nodes that have a RAVE value? More precisely: use the
FPU, but only for unvisited nodes that have a Q_rave value.
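
A rough sketch of this last idea, in Python. Everything here (the Child
tuple, select, the fallback when no child has any statistics) is a
hypothetical illustration of my proposal, not something taken from an
existing program.

```python
from collections import namedtuple

# Hypothetical per-move record: n = real visits, n_rave = RAVE updates.
Child = namedtuple("Child", "move n q_uct n_rave q_rave")


def select(children, beta, fpu=1.1):
    """Pick a child, ignoring unvisited moves that have no RAVE value."""

    def score(c):
        if c.n == 0:
            return fpu          # unvisited but has RAVE statistics
        if c.n_rave == 0:
            return c.q_uct      # visited, no RAVE statistics
        return (1 - beta) * c.q_uct + beta * c.q_rave

    # Keep visited children and unvisited children with a RAVE value.
    pool = [c for c in children if c.n > 0 or c.n_rave > 0]
    if not pool:
        # Assumption: with no statistics at all, fall back to every move.
        pool = list(children)
    return max(pool, key=score)


children = [
    Child("a", 5, 0.55, 8, 0.50),  # both values defined
    Child("b", 0, 0.0, 3, 0.70),   # only the RAVE value defined
    Child("c", 0, 0.0, 0, 0.0),    # neither defined, never considered
]
best = select(children, beta=0.5)  # picks "b" because FPU > blended score
```

With this filter, a move such as "c" is only played once it has picked
up a RAVE value from some playout, which is exactly the part I am unsure
about.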


On Fri, Mar 28, 2008 at 12:41 PM, Jason House <[EMAIL PROTECTED]>
wrote:

>
> On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote:
>
> >         - its rave and uct values are both defined (in this case we
> >         can compute the above score)
> >         - only the rave value is defined (in this situation n(s,a)
> >         = 0 and the uct value is not defined)
> >         - neither the rave nor the uct value is defined
> >
> > So my question is how they handle these cases when they traverse the
> > tree, because the score is not always defined for every child of
> > a node.
>
>
> I handled this by using FPU values.  FPU = first play urgency, the value
> to initially assign to unvisited nodes.  Other Gelly papers recommended
> a value of 1.1
>
> >
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
