David Silver wrote:
2. Run another N simulations, average the value of psi(s,a) over
all positions and moves in games that black won (call this g)
This is strange: you do not take lost playouts into consideration.
I believe there is a problem with your estimation of the gradient.
Suppo
Rémi Coulom wrote:
The fundamental problem here may be that your estimate of the gradient
is biased by the playout policy. You should probably sample X(s)
uniformly at random to have an unbiased estimator. Maybe this can be
fixed with importance sampling, and then you may get a formula that is
Today's "Guardian" newspaper has, on the front page of its technology
supplement, an article about recent developments in computer Go. You
can read it at
http://www.guardian.co.uk/technology/2009/apr/30/games-software-mogo
Unlike the last newspaper article on computer Go that I saw, this one i
Hi Remi,
This is strange: you do not take lost playouts into consideration.
I believe there is a problem with your estimation of the gradient.
Suppose for instance that you count z = +1 for a win, and z = -1 for
a loss. Then you would take lost playouts into consideration. This
makes me a
David Silver wrote:
Sorry, I should have made it clear that this assumes that we are
treating black wins as z=1 and white wins as z=0.
In this special case, the gradient is the average of games in which
black won.
But yes, more generally you need to include games won by both sides.
The algori
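As a reader's sketch (not code from the paper): with the z in {0, 1} convention David describes, lost games contribute zero to the sum, so the gradient estimate reduces to averaging psi(s, a) over positions and moves in games black won. The data and feature dimension below are illustrative.

```python
def estimate_gradient(playouts, psi_dim):
    """Average psi(s, a) over all positions and moves in games that
    black won: with z in {0, 1}, terms from lost games (z = 0) vanish,
    which is why only winning playouts appear in the sum."""
    g = [0.0] * psi_dim
    for moves, black_won in playouts:
        if not black_won:            # z = 0: this playout's term vanishes
            continue
        for psi in moves:            # psi is the feature vector psi(s, a)
            for i, x in enumerate(psi):
                g[i] += x
    n = len(playouts)
    return [x / n for x in g]

# Toy data: two playouts over a 2-dimensional feature vector.
playouts = [
    ([[1.0, 0.0], [0.0, 1.0]], True),   # black won: contributes
    ([[5.0, 5.0]], False),              # black lost: ignored under 0/1
]
print(estimate_gradient(playouts, 2))   # -> [0.5, 0.5]
```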
Hi Remi,
I understood this. What I find strange is that using -1/1 should be
equivalent to using 0/1, but your algorithm behaves differently: it
ignores lost games with 0/1, and uses them with -1/1.
Imagine you add a big constant to z. One million, say. This does not
change the problem. Y
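Rémi's constant-shift argument is easy to check numerically: adding a constant c to every outcome z shifts the naive score-weighted average (1/N) * sum(z_i * psi_i) by c * mean(psi), so the 0/1 and -1/+1 codings only agree when the mean feature value is zero. A toy illustration with made-up scalar features:

```python
def weighted_mean(zs, psis):
    """Naive score-weighted average: (1/N) * sum of z_i * psi_i."""
    n = len(zs)
    return sum(z * p for z, p in zip(zs, psis)) / n

# One scalar feature per playout; two wins and one loss (toy values).
psis = [1.0, 2.0, 3.0]
wins = [1, 1, 0]

g01 = weighted_mean([float(w) for w in wins], psis)            # z in {0, 1}
gpm = weighted_mean([1.0 if w else -1.0 for w in wins], psis)  # z in {-1, +1}

# The -1/+1 coding is the 0/1 coding with every z scaled and shifted,
# so the two estimates differ by a multiple of mean(psi):
print(g01, gpm)   # -> 1.0 0.0
```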
I wish I was smart :(
David Silver wrote:
Hi Remi,
I understood this. What I find strange is that using -1/1 should be
equivalent to using 0/1, but your algorithm behaves differently: it
ignores lost games with 0/1, and uses them with -1/1.
Imagine you add a big constant to z. One millio
Hi Yamato,
Thanks for the detailed explanation.
M, N and alpha are constant numbers, right? What did you set them to?
You're welcome!
Yes, in our experiments they were just constant numbers M=N=100.
The feature vector is the set of patterns you use, with value 1 if a
pattern is matched and
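A minimal sketch of the binary pattern features Yamato describes: one entry per pattern, set to 1 when that pattern matches and 0 otherwise. The `local_shape` helper and the pattern names here are hypothetical stand-ins; a real program would hash the local neighbourhood of the move.

```python
def local_shape(position, move):
    """Hypothetical stand-in for pattern extraction: real programs
    typically hash the 3x3 (or larger) neighbourhood around the move."""
    return position.get(move, "empty")

def feature_vector(position, move, patterns):
    """Binary feature vector: entry i is 1.0 if patterns[i] matches
    the local shape around `move`, else 0.0."""
    shape = local_shape(position, move)
    return [1.0 if p == shape else 0.0 for p in patterns]

# Toy example with made-up pattern names.
patterns = ["empty", "atari", "cut"]
position = {(3, 4): "atari"}
print(feature_vector(position, (3, 4), patterns))   # -> [0.0, 1.0, 0.0]
```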
IMO other people's equations/code/ideas/papers always seem smarter
than your own. The stuff you understand and do yourself just seems
like common sense, and the stuff you don't always has a mystical air
of complexity, at least until you understand it too :-)
On 30-Apr-09, at 1:59 PM, Michae
David Silver wrote:
>Yes, in our experiments they were just constant numbers M=N=100.
If M and N are the same, is there any reason to run M simulations and
N simulations separately? What happens if you combine them and calculate
V and g in a single loop?
>Okay, let's continue the example above