Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Erik van der Werf Wed, 06 Feb 2008 15:30:52 -0800

Hi Hideki,

Your results look similar to those of MoGo as reported in their ICML
paper. When you ran this experiment, did you use anything like FPU or
progressive widening, or did you use Levente's original design, which
always selects unvisited moves first?


Regards,
Erik


On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote:
> I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
>  komi 7.5, 3000 playouts/move, 2000 games match:
>
>  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 Elo)
>  With RAVE:      winning rate was 65.3 +- 1.1% (+110 +- 8 Elo)
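
The quoted figures are consistent with the usual logistic Elo model and a
binomial standard error over the 2000 games. That is an assumption about how
they were derived, not something stated in the message; the function names
below are hypothetical. A minimal sketch of the arithmetic:

```python
from math import log, log10, sqrt


def elo_from_win_rate(p):
    """Elo difference implied by win rate p under the logistic Elo model."""
    return -400.0 * log10(1.0 / p - 1.0)


def win_rate_std_error(p, games):
    """Binomial standard error of a win rate measured over `games` games."""
    return sqrt(p * (1.0 - p) / games)


def elo_std_error(p, games):
    """First-order (delta method) error: |dElo/dp| * stderr(p)."""
    slope = 400.0 / (log(10.0) * p * (1.0 - p))
    return slope * win_rate_std_error(p, games)


if __name__ == "__main__":
    games = 2000  # match length quoted in the message
    for label, p in (("Without RAVE", 0.231), ("With RAVE", 0.653)):
        print(
            f"{label}: {100 * p:.1f} +- {100 * win_rate_std_error(p, games):.1f}% "
            f"({elo_from_win_rate(p):+.0f} +- {elo_std_error(p, games):.0f} Elo)"
        )
```

Under these assumptions the script reproduces the win-rate errors and Elo
values above to the precision shown.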
>
>  Though this includes some other improvements, most of the gain comes
>  from RAVE.  Unlike MoGo, my best 'K' was 1000.
>
>  Following is my implementation of RAVE for GGMC v2r6.
>  1) Each playout returns the score and all moves with colors played.
>  2) While back-propagating the value (digitized score), compute the
>  mean and the variance for UCB1, and do the same for RAVE
>  separately.  For RAVE, the values of all (legal) moves in a node,
>  except the played one, are updated.
>  3) In the computation of values for RAVE, the point is that three
>  colors appear (as someone, GCP if I remember correctly, mentioned
>  before).  If the players' colors aren't the same, then skip.  Otherwise
>  count the value as is or negate it (1 - score, for me), depending on
>  the color of the player at the position and the color for the score.
>  4) Before back-propagating the value of each playout, I set up a color
>  table for all intersections of the board (initialized with EMPTY),
>  for speed-up. That is, fill the table (table[move] = color) by
>  tracing the moves and the colors returned by the playout forward
>  (from the leaf node to the end of the game). Then, by tracing the
>  path from the root to the leaf node, clear table[move] (table[move] =
>  EMPTY), in order to avoid duplicate counting with UCB1.
>  5) While descending the tree, merge the values that come from UCB1
>  and RAVE with 'K' according to the formula in the paper.
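
Steps 2) to 5) above can be sketched roughly as follows. This is one reading
of the description, not GGMC's actual code: the Stats and Node classes, the
function names, keeping only the first playout move on each point, and
omitting the variance are all assumptions. The weighting beta = sqrt(K / (3n +
K)), with n the visit count of the node, is the schedule as recalled from
Gelly and Silver's ICML 2007 paper and should be checked against it.

```python
from dataclasses import dataclass, field
from math import sqrt

EMPTY, BLACK, WHITE = 0, 1, 2


@dataclass
class Stats:
    visits: int = 0
    total: float = 0.0

    def add(self, value):
        self.visits += 1
        self.total += value

    @property
    def mean(self):
        return self.total / self.visits if self.visits else 0.0


@dataclass
class Node:
    to_move: int                               # color to play at this node
    visits: int = 0
    uct: dict = field(default_factory=dict)    # move -> Stats (UCB1 part)
    rave: dict = field(default_factory=dict)   # move -> Stats (RAVE part)


def backpropagate(path, playout_moves, black_score, board_points):
    """path: [(node, move played from node)] from root to leaf.
    playout_moves: [(move, color)] from the leaf to the end of the game.
    black_score: digitized result, 1.0 if Black won, else 0.0."""
    # Step 4: color table over all intersections, initialized with EMPTY.
    table = [EMPTY] * board_points
    for move, color in playout_moves:
        if table[move] == EMPTY:       # assumption: first play on a point wins
            table[move] = color
    for _, move in path:               # avoid double counting with UCB1
        table[move] = EMPTY
    for node, played in path:
        # Step 3: view the score from the side of the player to move here.
        value = black_score if node.to_move == BLACK else 1.0 - black_score
        node.visits += 1
        node.uct[played].add(value)
        # Step 2: update RAVE for every other move this player made later.
        for move, stats in node.rave.items():
            if move != played and table[move] == node.to_move:
                stats.add(value)


def merged_value(node, move, k=1000):
    """Step 5: blend RAVE and UCB1 means; k=1000 is the 'K' quoted above."""
    beta = sqrt(k / (3 * node.visits + k))
    return beta * node.rave[move].mean + (1.0 - beta) * node.uct[move].mean
```

With this schedule the two estimates get equal weight when the node has been
visited K times, and the RAVE term fades as visits grow beyond that.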
>
>  #Though I'm writing this by reading my source code, this description
>  may include some errors.
>
>  Hope this helps,
>
> Hideki
>
>  Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>
> >> I also implemented RAVE in Mango. There were a few points of improvement
>  >> (around 60 Elo points with gnugo as reference), but not as much as in the
>  >> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>  >>
>  >> It might be that the effect of RAVE depends a lot on the simulation
>  >> strategy. Indeed, sometimes my RAVE was playing very good moves but also
>  >> very bad ones.
>  >
>  >I don't think the simulation strategy is the key.
>  >
>  >I suspect the improvement is largest when you don't do progressive widening.
>  >
>  >Nevertheless it would be quite interesting to see the implementation
>  >details of ggmc's RAVE. RAVE performance is quite dependent on exact
>  >implementation and parameters.
>  --
>
> [EMAIL PROTECTED] (Kato)
>  _______________________________________________
>
>
> computer-go mailing list
>  computer-go@computer-go.org
>  http://www.computer-go.org/mailman/listinfo/computer-go/
>
