Re: [computer-go] How to "properly" implement RAVE?

Magnus Persson Fri, 30 Jan 2009 00:29:47 -0800

Quoting Sylvain Gelly <sylvain.ge...@m4x.org>:

On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch <i...@gmx.ch> wrote:

And a final question: You calculate the (beta) coefficient as
c = rc / (rc+c+rc*c*BIAS);
which looks similar to the formula proposed by David Silver (If I recall
his name correctly). However, in his formula, the last term looks like
rc*c*BIAS/(q_ur*(1-q_ur))
Is it correct that we could get q_ur from the current UCT-RAVE mean value,
and that it is used like that?



Yes the formula looks very similar (David proposed that formula to me in the
beginning of 2007). However my implementation did not contain
the q_ur*(1-q_ur) factor, which I approximated by a constant, taking q=0.5
so the factor=0.25.
I did not try the other formula, maybe it works better in practice, while I
would expect it is similar in practice.

Valkyria uses an even more complicated version of what David Silver proposed (I really did not understand it so I came up with something that looked plausible to me that actually estimated the bias for each candidate move rather than assuming it constant).

When Sylvain proposed this simple version I tested that version against my own interpretation. On 9x9 my complicated version might have a win rate 3% better than the simple version for 3 data points (700 games each) near the maximum. The standard error according to twogtp is 1.9.

On 19x19 I got results where there was no difference at all, but with much higher uncertainty because not many games were played.

But the term is important for sure: the constant BIAS should be larger than 0, but you should be careful not to set it too high. For Valkyria the 0.015 value Sylvain posted here worked fine. But a higher value, for example 0.15, leads to bad performance on 9x9 and catastrophic performance on 19x19. Initially I thought this parameter should be something like 1, and that was completely wrong. I was just lucky to get it right, just by visual inspection of the search behavior when I played around with the parameter.

The reason the bias term of the denominator should be close to zero initially is to allow AMAF to have a strong impact on the search, but at some point (which is delayed by having a small BIAS constant) there is a quick shift towards using the real win rate instead.
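
To make the effect of BIAS concrete, here is a minimal Python sketch of the two forms of the coefficient quoted above. It is not Valkyria's or Sylvain's actual code: the names rave_beta and blended_value, the clamping of q_ur, and the convention that beta weights the AMAF value (with 1-beta on the real win rate) are my assumptions for illustration. Here rc is the AMAF/RAVE count and c is the real simulation count, as in the quoted formula.

```python
def rave_beta(rc, c, bias, q_ur=None):
    """Weight given to the AMAF/RAVE estimate (hypothetical helper).

    rc   -- number of AMAF (RAVE) samples for the move
    c    -- number of real simulations through the move
    bias -- the BIAS constant from the quoted formula
    q_ur -- optional current UCT-RAVE mean; if given, the last term is
            divided by q_ur*(1-q_ur) as in the second quoted form
    """
    if rc == 0:
        return 0.0  # no AMAF data yet, so rely on the real win rate
    penalty = rc * c * bias
    if q_ur is not None:
        # Assumption: clamp q_ur away from 0 and 1 to avoid dividing by zero.
        q = min(max(q_ur, 1e-6), 1.0 - 1e-6)
        penalty /= q * (1.0 - q)
    return rc / (rc + c + penalty)


def blended_value(q_real, q_rave, rc, c, bias):
    """Mix the AMAF value and the real win rate using beta (assumed convention)."""
    beta = rave_beta(rc, c, bias)
    return beta * q_rave + (1.0 - beta) * q_real


if __name__ == "__main__":
    # Illustrative counts only: assume 5 AMAF samples per real simulation.
    for c in (0, 10, 100, 1000):
        rc = 5 * c + 5
        print(c, round(rave_beta(rc, c, 0.015), 3), round(rave_beta(rc, c, 0.15), 3))
```

With no real simulations (c = 0) the coefficient is 1, so the search is driven entirely by AMAF; as c grows, the rc*c*BIAS term dominates the denominator and the weight moves to the real win rate. A larger BIAS makes that happen sooner. Note also that with q_ur = 0.5 the second form is the same as the first with BIAS multiplied by 4, which is just the constant factor of 0.25 mentioned above.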


-Magnus


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
