>I think someone pointed out a long time ago on this mailing list that
>initializing the prior in terms of Rave simulations was far less efficient
>than initializing the prior in terms of "real" simulations.

You might be recalling an exchange that I had with Sylvain. I asked how
initial bias was implemented in Mogo, and Sylvain replied that either one
will work.

And that is true, but biasing the UCT values is much more forceful. There
are three differences.

First, a win (or loss) assigned to a UCT term counts for more than one
assigned to a RAVE term, because RAVE observations are vastly more
plentiful and quickly dilute any prior added to them.
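To make the first point concrete, here is a minimal Python sketch of one move's statistics, with UCT and RAVE counts blended into a single value. The blending weight follows a Silver-style beta schedule with an assumed bias constant; neither is claimed to be what Mogo, Fuego, or Pebbles actually use. The same hypothetical 5-wins-in-5-trials prior is added once to the UCT counts and once to the RAVE counts, midway through a search.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MoveStats:
    wins: float = 0.0         # wins from real (UCT) simulations through this move
    visits: float = 0.0       # real simulations through this move
    rave_wins: float = 0.0    # RAVE (all-moves-as-first) wins
    rave_visits: float = 0.0  # RAVE observations

    def value(self, bias_sq: float = 0.0025) -> float:
        """Blend UCT and RAVE means; the beta schedule and bias_sq are assumptions."""
        uct = self.wins / self.visits if self.visits else 0.5
        rave = self.rave_wins / self.rave_visits if self.rave_visits else 0.5
        n, rn = self.visits, self.rave_visits
        denom = n + rn + 4.0 * n * rn * bias_sq
        beta = rn / denom if denom else 0.0  # weight given to the RAVE mean
        return (1.0 - beta) * uct + beta * rave

# Midway through a search: 20 real simulations, 400 RAVE observations, all at 50%.
base = MoveStats(wins=10, visits=20, rave_wins=200, rave_visits=400)

# The same hypothetical 5-wins-in-5-trials prior, added to each kind of term.
uct_prior = replace(base, wins=base.wins + 5, visits=base.visits + 5)
rave_prior = replace(base, rave_wins=base.rave_wins + 5, rave_visits=base.rave_visits + 5)

print(f"{base.value():.4f}")        # 0.5000
print(f"{uct_prior.value():.4f}")   # 0.5238
print(f"{rave_prior.value():.4f}")  # 0.5049
```

With these assumed numbers, the UCT prior moves the blended value roughly five times as far as the identical RAVE prior does.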

Second, assigning to UCT causes the upper confidence bounds to start at a
less optimistic level, which wastes fewer trials on pointless exploration.
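A small sketch of the second point, using the plain UCB1 formula as a stand-in (the exploration constant `c` and the prior counts are hypothetical). A move with no trials at all gets an infinite bound and is tried next; a move seeded with prior trials starts with a finite bound, and a lower one if the prior judges it weak.

```python
import math

def ucb1(wins: float, visits: float, parent_visits: float, c: float = 0.7) -> float:
    """UCB1 upper bound; a move with no trials at all gets +inf and is tried first."""
    if visits == 0:
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

parent = 100  # simulations through the parent node so far

# No prior: the move is maximally optimistic and will be explored next.
print(ucb1(0, 0, parent))  # inf

# Hypothetical priors: 1 win in 5 trials for a move judged weak,
# 3 wins in 5 for a move judged sound. The weak move starts lower.
print(ucb1(1, 5, parent) < ucb1(3, 5, parent))  # True
```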

Third, engines (should) have a policy for flowing UCT scores up the tree
along transpositions. Assigning to RAVE does not exploit that capability.
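One simple way to share UCT statistics across transpositions is to key nodes by a position hash, so every move order that reaches a position updates the same node. The sketch below is hypothetical (the hash values and the 2-in-4 prior are made up, and real engines typically use Zobrist hashing). It keeps only UCT counts on the shared node; RAVE counts are assumed here to live with each parent and are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Node:
    wins: float = 0.0
    visits: float = 0.0

class TranspositionTable:
    """Nodes keyed by a position hash, so transposed move orders share one node."""

    def __init__(self, prior_fn: Callable[[int], Tuple[float, float]]):
        self._nodes: Dict[int, Node] = {}
        self._prior_fn = prior_fn  # returns (prior_wins, prior_visits) for a new position

    def get(self, position_hash: int) -> Node:
        node = self._nodes.get(position_hash)
        if node is None:
            # The UCT prior is written once, when the position is first seen.
            wins, visits = self._prior_fn(position_hash)
            node = Node(wins, visits)
            self._nodes[position_hash] = node
        return node

    def update(self, position_hash: int, won: bool) -> None:
        node = self.get(position_hash)
        node.visits += 1
        node.wins += 1 if won else 0

# Hypothetical prior: every new position starts at 2 wins in 4 trials (50%).
table = TranspositionTable(lambda h: (2.0, 4.0))

via_path_a = table.get(0xBEEF)  # reached by one move order
via_path_b = table.get(0xBEEF)  # reached by a transposed move order
table.update(0xBEEF, won=True)
print(via_path_a is via_path_b, via_path_b.wins, via_path_b.visits)  # True 3.0 5.0
```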

That being said, there is a caveat: assignments to UCT should be unbiased
estimates of winning percentage. RAVE terms can express other priorities.

For example, Pebbles' bias in favor of exploring atari can be as large as 24
wins in 24 trials. The bias varies, depending on the situation, but it is
never smaller than 9 wins in 9 trials. It is clear that the 24/24 bias
(which is given whether winning or not) is not a sensible estimate of
winning chances.
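The contrast between the two kinds of prior can be sketched as follows. A UCT prior is an unbiased win-rate estimate worth some number of trials, while the RAVE atari bias is all wins, 9 to 24 of them. How Pebbles picks a value in that range is not described, so the linear "urgency" mapping and every function name here are hypothetical.

```python
from dataclasses import dataclass

MIN_ATARI_BIAS = 9   # smallest atari bias quoted for Pebbles: 9 wins in 9 trials
MAX_ATARI_BIAS = 24  # largest: 24 wins in 24 trials

@dataclass
class Stats:
    wins: float = 0.0
    visits: float = 0.0

def uct_prior(stats: Stats, estimated_win_rate: float, equivalent_trials: float) -> None:
    """UCT prior: an unbiased win-rate estimate, weighted as a number of trials."""
    stats.wins += estimated_win_rate * equivalent_trials
    stats.visits += equivalent_trials

def atari_bias_trials(urgency: float) -> int:
    """Hypothetical: map an urgency score in [0, 1] linearly onto 9..24 trials."""
    urgency = min(max(urgency, 0.0), 1.0)
    return round(MIN_ATARI_BIAS + urgency * (MAX_ATARI_BIAS - MIN_ATARI_BIAS))

def rave_atari_prior(stats: Stats, urgency: float) -> None:
    """RAVE prior as a search priority: every virtual trial is a win, whatever the position."""
    n = atari_bias_trials(urgency)
    stats.wins += n
    stats.visits += n

uct = Stats()
uct_prior(uct, estimated_win_rate=0.55, equivalent_trials=10)
rave = Stats()
rave_atari_prior(rave, urgency=1.0)
print(f"{uct.wins / uct.visits:.2f}", f"{rave.wins / rave.visits:.2f}")  # 0.55 1.00
```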

Nevertheless, the bias works because it favorably changes search behavior.
Obviously, you must search atari moves if you don't want to lose. Pebbles'
automated parameter tuning system has pushed that parameter up because
higher values helped it to win games.

I highly recommend reading the Fuego implementation. I believe that it is
largely because of well-judged UCT priors that Fuego plays so efficiently
with few trials. For example, Fuego-400nodes is rated 1518 on 9x9 CGOS.
That's only about 5 trials per empty point!
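The per-point figure is the trial budget divided by the points of a 9x9 board, assuming all 81 are empty, as at the start of a game:

```latex
\frac{400}{9 \times 9} = \frac{400}{81} \approx 4.9 \ \text{trials per empty point}
```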

BTW, Pebbles does not do a good job on this issue. Pebbles uses only RAVE
biases, though I have known since my exchange with Sylvain that this was
the weaker choice. Unfortunately, I started with the RAVE approach, and now
all of the priors are unrelated to winning chances. I need to create a new
system that evaluates move quality, and I haven't gotten around to it yet.


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to