Re: [computer-go] Monte-Carlo Simulation Balancing

Rémi Coulom Thu, 30 Apr 2009 00:18:17 -0700

David Silver wrote:

2. Run another N simulations, average the value of psi(s,a) over all positions and moves in games that black won (call this g)


This is strange: you do not take lost playouts into consideration.

I believe there is a problem with your estimation of the gradient. Suppose for instance that you count z = +1 for a win, and z = -1 for a loss. Then you would take lost playouts into consideration. This makes me a little suspicious.
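To make the difference between the two codings concrete, here is a minimal sketch in Python. It is not David's code: the function name, the scalar psi values and the toy playouts are all made up for illustration, and it divides by the total number of playouts, which is an assumption. It averages z * psi over the same playouts under both codings. With z in {0, 1} the lost playouts drop out of the sum; with z in {-1, +1} they enter with the opposite sign.

```python
def mean_weighted_psi(playouts, win_value, loss_value):
    """Average of z * psi over all playouts.

    playouts: list of (won, psi) pairs. psi is a plain float here
    (hypothetical toy data; a real psi(s,a) would be a feature vector).
    z is win_value for a won playout and loss_value for a lost one.
    """
    total = 0.0
    for won, psi in playouts:
        z = win_value if won else loss_value
        total += z * psi
    return total / len(playouts)


# Toy data (assumed): two won playouts, two lost playouts.
playouts = [(True, 0.5), (True, -0.25), (False, 1.0), (False, -0.5)]

g_01 = mean_weighted_psi(playouts, 1, 0)    # z in {0, 1}: losses contribute nothing
g_pm = mean_weighted_psi(playouts, 1, -1)   # z in {-1, +1}: losses are subtracted

print(g_01, g_pm)
```

On this toy sample the two codings give estimates of opposite sign, because only the second one sees the lost playouts.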

The fundamental problem here may be that your estimate of the gradient is biased by the playout policy. You should probably sample X(s) uniformly at random to have an unbiased estimator. Maybe this can be fixed with importance sampling, and then you may get a formula that is symmetrical regarding wins and losses. I don't have time to do it now, but it may be worth taking a look.
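For readers unfamiliar with importance sampling, here is a generic sketch in Python. It is not the corrected formula (I have not derived it), only the basic reweighting idea on an invented four-state example: samples are drawn from a non-uniform "policy" distribution, and each one is weighted by target probability over proposal probability so that the average estimates the mean under the uniform distribution. The state space, the values in f and the probabilities are all assumptions.

```python
import random


def importance_sampled_mean(f, target, proposal, n, rng):
    """Estimate the mean of f under `target` from samples drawn under `proposal`.

    f, target, proposal: lists indexed by state (hypothetical toy state space).
    Each sample x is weighted by target[x] / proposal[x].
    """
    states = list(range(len(f)))
    samples = rng.choices(states, weights=proposal, k=n)
    return sum(target[x] / proposal[x] * f[x] for x in samples) / n


f = [1.0, 2.0, 3.0, 4.0]            # quantity to average (assumed values)
policy = [0.7, 0.1, 0.1, 0.1]       # biased sampling distribution (assumed)
uniform = [0.25, 0.25, 0.25, 0.25]  # distribution we actually care about

rng = random.Random(0)
# Passing the policy as its own target gives weights of 1, i.e. the plain average.
naive = importance_sampled_mean(f, policy, policy, 100_000, rng)
# Reweighting by uniform / policy corrects for the sampling bias.
corrected = importance_sampled_mean(f, uniform, policy, 100_000, rng)

print(naive, corrected)
```

The plain average converges to 1.6, the mean under the policy, while the reweighted average converges to 2.5, the mean under the uniform distribution. The price is higher variance when the policy rarely visits states that the uniform distribution weights equally.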


Rémi
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
