David Silver wrote:
    2. Run another N simulations, average the value of psi(s,a) over
    all positions and moves in games that black won (call this g)
This is strange: you do not take lost playouts into consideration.

I believe there is a problem with your estimation of the gradient. Suppose, for instance, that you counted z = +1 for a win and z = -1 for a loss instead. Then lost playouts would enter the estimate. An estimate that changes with the way the outcome is encoded makes me a little suspicious.
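
To make the encoding point concrete, here is a toy sketch. Everything in it is an assumption for illustration and not taken from David's post: a single decision with a softmax policy, psi(a) = onehot(a) - pi, a made-up win probability per move, and g taken as the mean of z * psi over all N playouts. The names compare_encodings, theta and win_prob are hypothetical.

```python
import math
import random


def softmax(theta):
    """Softmax probabilities for a list of move preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def psi(action, probs):
    """Gradient of log pi(action) for a softmax policy: onehot(action) - probs."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def compare_encodings(theta, win_prob, n_playouts, seed=0):
    """Estimate g with z in {0, 1} and with z in {-1, +1} on the same playouts.

    Assumption: one decision per playout, and win_prob[a] is the chance
    that the playout is won after move a.
    """
    rng = random.Random(seed)
    probs = softmax(theta)
    dim = len(theta)
    won_sum = [0.0] * dim   # sum of psi over won playouts
    lost_sum = [0.0] * dim  # sum of psi over lost playouts
    for _ in range(n_playouts):
        a = rng.choices(range(dim), weights=probs)[0]
        target = won_sum if rng.random() < win_prob[a] else lost_sum
        for k, v in enumerate(psi(a, probs)):
            target[k] += v
    n = float(n_playouts)
    g_01 = [w / n for w in won_sum]                          # losses have weight 0
    g_pm = [(w - l) / n for w, l in zip(won_sum, lost_sum)]  # losses have weight -1
    mean_psi = [(w + l) / n for w, l in zip(won_sum, lost_sum)]
    return g_01, g_pm, mean_psi


if __name__ == "__main__":
    g_01, g_pm, mean_psi = compare_encodings(
        theta=[0.0, 0.5, -0.5], win_prob=[0.2, 0.7, 0.5], n_playouts=2000
    )
    print("z in {0,1}:  ", g_01)
    print("z in {-1,+1}:", g_pm)
    print("mean of psi: ", mean_psi)
```

On any given sample the two estimates satisfy g_pm = 2 * g_01 - mean_psi, so the lost playouts enter only through the sample mean of psi over all playouts.
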
The fundamental problem here may be that your estimate of the gradient 
is biased by the playout policy. You should probably sample X(s) 
uniformly at random to have an unbiased estimator. Maybe this can be 
fixed with importance sampling, and then you may get a formula that is 
symmetrical regarding wins and losses. I don't have time to do it now, 
but it may be worth taking a look.
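
For reference, this is what generic importance sampling looks like on a toy problem. It is only a textbook sketch and not the corrected formula, which is not worked out here. The assumptions: a finite set of outcomes, a known non-uniform sampling distribution standing in for the playout policy, and a target that is the uniform average. The names uniform_mean_estimates, values and policy are hypothetical.

```python
import random


def uniform_mean_estimates(values, policy, n_samples, seed=0):
    """Estimate the uniform average of `values` from samples drawn with `policy`.

    Returns (naive, weighted):
      naive    - plain sample mean, which targets the policy-weighted average
      weighted - importance-sampled mean, each sample reweighted by
                 uniform(x) / policy(x), which targets the uniform average
    Assumption: policy[x] > 0 for every x, otherwise the weight is undefined.
    """
    rng = random.Random(seed)
    k = len(values)
    uniform = 1.0 / k
    naive = 0.0
    weighted = 0.0
    for _ in range(n_samples):
        x = rng.choices(range(k), weights=policy)[0]
        naive += values[x]
        weighted += (uniform / policy[x]) * values[x]
    return naive / n_samples, weighted / n_samples


if __name__ == "__main__":
    values = [1.0, 0.2, 0.4, 0.0]   # made-up quantity to average
    policy = [0.7, 0.1, 0.1, 0.1]   # made-up non-uniform sampling distribution
    naive, weighted = uniform_mean_estimates(values, policy, n_samples=20000)
    print("naive:   ", naive)
    print("weighted:", weighted)
```

With these made-up numbers the uniform average is 0.4 and the policy-weighted average is 0.76. The naive mean converges to the latter, the reweighted mean to the former.
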
Rémi
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
