I use FPU for both values for precisely the reasons you describe.
On Mar 28, 2008, at 9:36 AM, "Jaonary Rabarisoa" <[EMAIL PROTECTED]> wrote:
So if I understand correctly, at each node we need to play every possible
action once at first, even though many of these actions are surely not
optimal. And this may be slow if the number of possible actions at this
node is huge.
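For reference, here is a minimal sketch of why plain UCT behaves this way, assuming the usual UCB1-style formula in which an unvisited child gets infinite urgency. The function names and the exploration constant `c` are illustrative assumptions, not taken from any particular program.

```python
import math

def uct_value(wins, visits, parent_visits, c=1.0):
    """UCB1-style value; c is an assumed tuning constant."""
    if visits == 0:
        # Infinite urgency: every untried action is selected before any
        # visited action is revisited, which is the behaviour described above.
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits, c=1.0):
    """children is a list of (wins, visits) pairs; returns the chosen index."""
    return max(range(len(children)),
               key=lambda i: uct_value(children[i][0], children[i][1],
                                       parent_visits, c))
```

With `select_child([(3, 5), (0, 0), (1, 2)], 7)` the untried child at index 1 is chosen regardless of how good the other two look. A finite first-play urgency replaces that `math.inf` with a fixed number, so a good visited move can be preferred over an untried one.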
When you talk about FPU, does it mean that you give a kind of default
value to unvisited nodes and compare this value with
(1-beta)*Q_uct + beta*Q_rave when we can compute it?
Finally, does it make sense to throw away all unvisited nodes and first
consider only the nodes that have a RAVE value? That is, use the FPU
only for unvisited nodes that have a Q_rave value.
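One literal reading of this proposal, sketched under assumptions: before scoring, drop every child that has no RAVE sample yet, so the FPU is only ever applied to unvisited children that already have a Q_rave estimate. The `Child` fields and `candidates` helper are hypothetical, and the fallback to all children when none has RAVE data is an added assumption (a fresh leaf would otherwise have no candidates).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Child:
    visits: int = 0       # n(s, a), UCT visit count
    rave_visits: int = 0  # number of AMAF/RAVE samples for this move

def candidates(children: List[Child]) -> List[int]:
    """Indices of children considered for selection under this reading.

    Children with no RAVE sample are skipped; if no child has one yet
    (assumed fallback), all children remain candidates.
    """
    with_rave = [i for i, c in enumerate(children) if c.rave_visits > 0]
    return with_rave if with_rave else list(range(len(children)))
```

The remaining candidates would then be scored as usual (blended value for visited children, FPU for unvisited ones); that scoring step is left out here.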
On Fri, Mar 28, 2008 at 12:41 PM, Jason House <[EMAIL PROTECTED]> wrote:
On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote:
> - both its RAVE and UCT values are defined (in this case we can
>   compute the above score)
> - only the RAVE value is defined (in this situation n(s,a) = 0
>   and the UCT value is not defined)
> - neither the RAVE nor the UCT value is defined
>
> So my question is: how do they handle these cases when they traverse
> the tree? The score is not always defined for every child of a node.
I handled this by using FPU values. FPU = first play urgency, the value
to initially assign to unvisited nodes. Other Gelly papers recommended
a value of 1.1.
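A minimal sketch of one way to read "I use FPU for both values": substitute the FPU for whichever of Q_uct and Q_rave is undefined, then blend as (1-beta)*Q_uct + beta*Q_rave. The `Child` fields, the `beta` schedule sqrt(k/(3n+k)) and the constant k = 1000 are assumptions for illustration only; the thread does not say which schedule is used.

```python
import math
from dataclasses import dataclass
from typing import List

FPU = 1.1  # first-play urgency, the value mentioned in this thread

@dataclass
class Child:
    wins: float = 0.0       # UCT statistics for this move
    visits: int = 0         # n(s, a)
    rave_wins: float = 0.0  # AMAF/RAVE statistics for this move
    rave_visits: int = 0

def beta(visits: int, k: float = 1000.0) -> float:
    """Assumed RAVE weight: 1 when n(s,a) = 0, shrinking toward 0 with visits."""
    return math.sqrt(k / (3 * visits + k))

def score(c: Child, fpu: float = FPU) -> float:
    # Use the FPU in place of whichever estimate is undefined.
    q_uct = c.wins / c.visits if c.visits > 0 else fpu
    q_rave = c.rave_wins / c.rave_visits if c.rave_visits > 0 else fpu
    b = beta(c.visits)
    return (1 - b) * q_uct + b * q_rave

def select(children: List[Child]) -> int:
    """Index of the child with the highest blended score."""
    return max(range(len(children)), key=lambda i: score(children[i]))
```

Under this assumed schedule the three cases fall out naturally: with both values defined the score is the blend; with only RAVE data (n(s,a) = 0, so beta = 1) the score is Q_rave; with no data at all it is the FPU itself. Because 1.1 exceeds any win rate in [0, 1], a move with no data outranks every move scored by averages alone; a lower FPU makes the search try fewer such moves. The sketch omits any UCB exploration bonus for brevity.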
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/