[computer-go] Re: RAVE formula of David Silver (reposted)

David Silver Fri, 28 Nov 2008 15:04:08 -0800

This document is confusing, but here is my interpretation of it, and
it works well for Valkyria. I would really like to see a pseudocode
version of it. I might post the code I use for Valkyria, but it is
probably not the same thing, so I would probably just increase the
confusion if I did...


The virtual win-visit ratios (which I think is what you meant, not
'win/loss') *are* what is computed in Equation 12. Equation 13 is
"standard UCT". You use Equation 14 instead of Equation 13 to select
the move to search. For moves that are searched a lot, Eq. 14 will
eventually approach Eq. 13, since Beta should go towards 0.

Thanks for clarifying this, Magnus.

In fact the first idea (combining the values of UCT and RAVE, eqns 1-11) is reasonably well justified, and works well in practice. The second idea (combining the upper confidence bounds, eqns 12-14) is not so well justified, and I believe several people have found better ways to do this.
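
As a rough illustration of the first idea (blending the UCT and RAVE values with a weight Beta that decays toward 0 as a move gathers real visits), here is a minimal Python sketch. The function names, the schedule beta = sqrt(k / (3n + k)), and the constant k = 1000 are assumptions chosen for illustration; the PDF under discussion may define Beta differently.

```python
import math

def beta(n, k=1000.0):
    """Weight given to the RAVE value after n real visits.

    Assumed schedule: sqrt(k / (3n + k)). It equals 1 at n = 0 and
    decays toward 0 as n grows; k is a hypothetical tuning constant.
    """
    return math.sqrt(k / (3.0 * n + k))

def mixed_value(q_uct, n, q_rave, k=1000.0):
    """Blend the RAVE (AMAF) estimate with the ordinary UCT estimate."""
    b = beta(n, k)
    return b * q_rave + (1.0 - b) * q_uct
```

With no real visits the blended value is just the RAVE estimate; after many visits it is essentially the plain UCT value, which matches the "approaches Eq. 13 as Beta goes to 0" behaviour described above.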

I think the term RAVE is often used in a confusing manner. Sometimes
it just means AMAF (or, as I prefer, virtual win-visit ratios), and
sometimes RAVE seems to mean the algorithm that mixes the AMAF
values with normal UCT values as described in the PDF.

I'd like to suggest the following use of terminology:

AMAF: the idea of estimating the value of an immediate move, from the value of that move played at any time.
RAVE: the idea of building a tree of AMAF values, one for each position and move in the tree.
UCT-RAVE: the idea of combining a RAVE value with a UCT value, for each position and move in the tree.
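
To make the distinction concrete, here is a small Python sketch of the bookkeeping these terms imply: each tree node keeps ordinary (UCT) statistics for the move actually chosen there, plus AMAF statistics for every move the same player made later in the simulation. The `Node` class, the `backup` function and the `(player, move)` encoding are all hypothetical names for illustration, and the sketch ignores details a real Go program would need to handle (for example, points replayed after captures).

```python
from collections import defaultdict

class Node:
    """One position in the tree, with per-move UCT and AMAF statistics."""
    def __init__(self):
        self.n = defaultdict(int)         # real visits per move
        self.w = defaultdict(float)       # real wins per move
        self.amaf_n = defaultdict(int)    # AMAF ("virtual") visits per move
        self.amaf_w = defaultdict(float)  # AMAF ("virtual") wins per move

def backup(path, moves, winner):
    """Update statistics after one simulation.

    path[i] is the node at which moves[i] was chosen; moves may be longer
    than path because it also contains the playout moves. Each entry of
    moves is a (player, move) pair.
    """
    for i, node in enumerate(path):
        player, chosen = moves[i]
        win = 1.0 if winner == player else 0.0
        # Ordinary UCT statistics: only the move actually played here.
        node.n[chosen] += 1
        node.w[chosen] += win
        # AMAF statistics: every move this player made from here onward,
        # counted once per simulation (first occurrence only).
        seen = set()
        for p, m in moves[i:]:
            if p == player and m not in seen:
                seen.add(m)
                node.amaf_n[m] += 1
                node.amaf_w[m] += win

# Example: a two-node path followed by a two-move playout, won by Black.
root, child = Node(), Node()
moves = [("B", "a"), ("W", "b"), ("B", "c"), ("W", "d")]
backup([root, child], moves, "B")
```

In this sketch the per-node `amaf_*` tables are the "tree of AMAF values" (RAVE), and combining `amaf_w / amaf_n` with `w / n` for a move would be the UCT-RAVE step.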

The way I think of it, RAVE is to AMAF as UCT is to UCB.
-Dave

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
