Quoting Sylvain Gelly <sylvain.ge...@m4x.org>:
On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch <i...@gmx.ch> wrote:
And a final question: You calculate the (beta) coefficient as
c = rc / (rc+c+rc*c*BIAS);
which looks similar to the formula proposed by David Silver (If I recall
his name correctly). However, in his formula, the last term looks like
rc*c*BIAS/(q_ur*(1-q_ur))
Is it correct that we could get q_ur from the current UCT-RAVE mean value,
and that it is used like that?
Yes the formula looks very similar (David proposed that formula to me in the
beginning of 2007). However my implementation did not contain
the q_ur*(1-q_ur) factor, which I approximated by a constant, taking q=0.5
so the factor=0.25.
I did not try the other formula. Maybe it works better, though I would
expect the two to behave similarly in practice.
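
For reference, the two forms of the weight discussed above can be sketched
as follows. This is only a minimal sketch written from the formulas quoted
in this thread, not code from MoGo or Valkyria. The function names, the eps
clamp on q_ur, and the 0.0 returned for a node with no data are all
assumptions made for the illustration.

```python
def beta_simple(rc, c, bias):
    """Weight as written in the thread: rc / (rc + c + rc*c*bias).

    rc is the RAVE (AMAF) count, c is the real simulation count.
    """
    denom = rc + c + rc * c * bias
    # Returning 0.0 for a node with no data at all is an arbitrary choice.
    return rc / denom if denom > 0 else 0.0


def beta_with_variance(rc, c, bias, q_ur, eps=1e-6):
    """Variant whose last term is divided by q_ur * (1 - q_ur)."""
    # Clamp q_ur away from 0 and 1 to avoid dividing by zero (an assumption).
    q = min(max(q_ur, eps), 1.0 - eps)
    denom = rc + c + rc * c * bias / (q * (1.0 - q))
    return rc / denom if denom > 0 else 0.0
```

With q_ur = 0.5 the divisor is 0.25, so beta_with_variance(rc, c, b, 0.5)
gives the same value as beta_simple(rc, c, 4*b). A BIAS constant tuned for
one form is therefore presumably not directly transferable to the other
unless that factor of 4 is accounted for.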
Valkyria uses an even more complicated version of what David Silver
proposed (I really did not understand it so I came up with something
that looked plausible to me that actually estimated the bias for each
candidate move rather than assuming it constant).
When Sylvain proposed this simple version I tested that version
against my own interpretation. On 9x9 my complicated version might
have a win rate 3% better than the simple version for 3 data points
(700 games each) near the maximum. The standard error according to
twogtp is 1.9.
On 19x19 I got results where there was no difference at all, but with much
higher uncertainty because not many games were played.
But the term is important for sure: the constant BIAS should be
larger than 0, but you should be careful not to set it too high. For
Valkyria the 0.015 value Sylvain posted here worked fine, but a higher
value, for example 0.15, leads to bad performance on 9x9 and
catastrophic performance on 19x19. Initially I thought this parameter
should be something like 1 and that was completely wrong. I was just
lucky to get it right, just by visual inspection of the search
behavior when I played around with the parameter.
The reason the bias term of the denominator should be close to zero
initially is to allow AMAF to have a strong impact on the search. At
some point (which is delayed by having a small BIAS constant) there is
a quick shift towards using the real win rate instead.
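
To make the effect of the constant concrete, here is a rough sketch using
the simple formula from this thread. Setting beta = 0.5 and solving for c
gives c = rc / (1 + rc*BIAS), which approaches 1/BIAS when rc is large. The
function names and the fixed RAVE count of 1000 are hypothetical and chosen
only for illustration.

```python
def beta(rc, c, bias):
    """Weight from the thread: rc / (rc + c + rc*c*bias)."""
    return rc / (rc + c + rc * c * bias)


def crossover_visits(rc, bias):
    """Real visit count c at which beta equals 0.5, for a fixed RAVE count rc.

    Derived from rc / (rc + c + rc*c*bias) = 0.5, i.e. rc = c * (1 + rc*bias).
    """
    return rc / (1.0 + rc * bias)


# Hypothetical RAVE count of 1000; compare the BIAS values mentioned above.
for bias in (0.015, 0.15, 1.0):
    print(bias, round(crossover_visits(1000, bias), 1))
# 0.015 62.5
# 0.15 6.6
# 1.0 1.0
```

Under these assumptions, BIAS = 0.015 lets the AMAF statistics carry most
of the weight for roughly the first 60 real simulations of a move, while
BIAS = 0.15 hands over to the real win rate after about 7, and BIAS = 1
after about 1.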
-Magnus