Hi David,
On Sat, Feb 16, 2008 at 7:07 PM, David Silver <[EMAIL PROTECTED]> wrote:
> Yes, but why add upper confidence bounds to the rave values at all? If
> they really go down that fast, does it make much of a difference?
>
> According to the recent experiments in MoGo, you are right :-) Howeve
Good catch Yamato. I think the idea is that they're trying to calculate the
true variances rather than the sample variances. It's true that q_ur would
probably give a better estimate than q_u or q_r alone. Of course, q_ur
depends on beta, and as they calculate it, beta depends on q_ur.
It may b
David Silver wrote:
>There are two differences between your suggestion and the original
>formula, so I'll try and address both:
>
>1. Your formula gives the variance of a single simulation, with
>probability value_u. But the more simulations you see, the more you
>reduce the uncertainty, so y
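Point 1 above can be illustrated with a small sketch. It assumes each simulation is an independent win/loss draw with win probability p (a Bernoulli trial); the function names are hypothetical and chosen only for this example. Under that assumption, p * (1 - p) is the variance of a single simulation, while p * (1 - p) / n is the variance of the average of n simulations, which is the quantity that shrinks as more simulations are seen.

```python
import random


def bernoulli_variances(p, n):
    """Return (variance of one simulation, variance of the mean of n simulations).

    Assumes independent win/loss outcomes with win probability p.
    """
    single = p * (1.0 - p)
    return single, single / n


def empirical_variance_of_mean(p, n, trials=2000, seed=0):
    """Estimate the variance of the n-simulation win rate by repeated sampling."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        wins = sum(1 for _ in range(n) if rng.random() < p)
        means.append(wins / n)
    mu = sum(means) / trials
    return sum((m - mu) ** 2 for m in means) / trials


if __name__ == "__main__":
    single, of_mean = bernoulli_variances(0.5, 100)
    print("variance of one simulation:   ", single)
    print("variance of the mean (n=100): ", of_mean)
    print("empirical variance of the mean:", empirical_variance_of_mean(0.5, 100))
```

The empirical estimate should land close to the p * (1 - p) / n figure rather than to p * (1 - p), which is the distinction the reply is drawing.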
I am very confused about the new UCT-RAVE formula.
Equation 9 seems to mean:
variance_u = value_ur * (1 - value_ur) / n.
Is it wrong? If correct, why is it the variance?
I think that the variance of the UCT should be:
variance_u = value_u * (1 - value_u).
Hi Yamato,
There are two differe
David Silver wrote:
>BTW if anyone just wants the formula, and doesn't care about the
>derivation - then just use equations 11-14.
Yes, I just want to use the formula.
But I don't know what the "bias" is...
How can I get the value of br?
Sorry for the slow reply...
The simplest answer is that t
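For readers who only want something to plug in, here is a minimal sketch of a beta schedule of the kind discussed in this thread. Several things are assumptions: the formula below is the form commonly stated in later write-ups of MC-RAVE, and it may not match equations 11-14 of the draft under discussion term for term; treating the bias b_r as a hand-tuned constant is a choice made for this sketch, not something the truncated reply above confirms; and the names rave_beta and blended_value are hypothetical.

```python
def rave_beta(n, m, bias):
    """Weight given to the RAVE value.

    n    -- number of UCT simulations through the move
    m    -- number of RAVE simulations for the move
    bias -- assumed constant RAVE bias b_r (a tuning parameter in this sketch)
    """
    if n == 0 and m == 0:
        return 0.0  # no data at all; the choice here is arbitrary
    return m / (n + m + 4.0 * n * m * bias * bias)


def blended_value(q_u, q_r, n, m, bias):
    """Combine the UCT value q_u and the RAVE value q_r into q_ur."""
    beta = rave_beta(n, m, bias)
    return beta * q_r + (1.0 - beta) * q_u


if __name__ == "__main__":
    # With few UCT simulations the RAVE value dominates; with many it fades out.
    for n in (0, 10, 100, 1000, 10000):
        print(n, round(rave_beta(n, 5 * n + 10, 0.1), 4))
```

With bias set to 0 the schedule reduces to m / (n + m), a plain count-weighted average; a larger bias makes the RAVE value fade out sooner as n grows.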
Hi Erik,
Thanks for the thought-provoking response!
Yes, but why add upper confidence bounds to the rave values at all? If
they really go down that fast, does it make much of a difference?
According to the recent experiments in MoGo, you are right :-)
However, I've seen slightly different resul
I am very confused about the new UCT-RAVE formula.
Equation 9 seems to mean:
variance_u = value_ur * (1 - value_ur) / n.
Is it wrong? If correct, why is it the variance?
I think that the variance of the UCT should be:
variance_u = value_u * (1 - value_u).
Why cannot we use that?
Anyway, c
Hi David,
On Fri, Feb 8, 2008 at 6:09 PM, David Silver <[EMAIL PROTECTED]> wrote:
> > Note as well that the current implementation of MoGo (not the one at
> > the time of the ICML paper) uses a different tradeoff between UCT and
> > Rave value, thanks to an idea of David Silver, which brought
>
Why are m and n different? Isn't every playout used both to update the UCT
win rate and the RAVE values for the same nodes? Won't the number of UCT
simulations and the number of RAVE simulations be the same?
Each playout is used both to update the UCT win rate and the RAVE
values for the same
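A small sketch of why the two counts can still differ, under the usual all-moves-as-first convention (an assumption here, since the reply above is cut off and may make a different point). At a given node, a playout adds one UCT simulation only to the move actually chosen at that node, but adds one RAVE simulation to every move the same player makes later in that playout. The function name update_root and the move labels are hypothetical.

```python
from collections import Counter


def update_root(uct_n, rave_m, playout_moves):
    """Update root statistics from one playout.

    playout_moves -- moves in order, colors alternating, root player first.
    uct_n         -- Counter of UCT simulation counts per root move (n)
    rave_m        -- Counter of RAVE simulation counts per root move (m)
    """
    # UCT: only the move actually played first at the root is credited.
    uct_n[playout_moves[0]] += 1
    # RAVE: every move the root player made anywhere in the playout is credited.
    for move in set(playout_moves[0::2]):
        rave_m[move] += 1


if __name__ == "__main__":
    uct_n, rave_m = Counter(), Counter()
    update_root(uct_n, rave_m, ["a", "x", "b", "y", "c"])
    update_root(uct_n, rave_m, ["b", "x", "a", "y", "d"])
    for move in "abcd":
        print(move, "n =", uct_n[move], "m =", rave_m[move])
```

In this toy run move "a" ends with n = 1 and m = 2, and move "c" has m = 1 without ever having been tried at the root, so m is generally larger than n for the same move.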
On Behalf Of David Silver
Sent: Friday, February 08, 2008 3:40 PM
To: computer-go@computer-go.org
Subject: [computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol
43, Issue 8)
Hi Jason,
The original paper's formula for beta always felt wrong to me. I like this
new one a lot better.
Good!
David Silver wrote:
>BTW if anyone just wants the formula, and doesn't care about the
>derivation - then just use equations 11-14.
Yes, I just want to use the formula.
But I don't know what the "bias" is...
How can I get the value of br?
By the way I currently use this formula.
beta = 1 - log(
On Fri, 2008-02-08 at 16:39 -0700, David Silver wrote:
> 2. No, the assumption itself is not correct. The true value of a node
> in the tree is 0 or 1, given perfect play. So the UCT value (which
> just averages the outcomes of simulations) is significantly biased.
Who can predict perfect play?