Re: [computer-go] Monte-Carlo Simulation Balancing

Rémi Coulom Thu, 30 Apr 2009 01:04:01 -0700

Rémi Coulom wrote:

The fundamental problem here may be that your estimate of the gradient is biased by the playout policy. You should probably sample X(s) uniformly at random to have an unbiased estimator. Maybe this can be fixed with importance sampling, and then you may get a formula that is symmetrical regarding wins and losses. I don't have time to do it now, but it may be worth taking a look.
Rémi

More precisely: you should estimate the value of N playouts as Sum p_i z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum p_i z_i / Sum p_i. This would be better.
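
A minimal sketch of the weighted estimate and its gradient, under assumptions the message does not spell out: p_i is taken to be the probability the playout policy assigns to playout i, z_i is the outcome of that playout, and the playouts themselves were drawn uniformly at random. The function names and the grad_log_p argument (the gradient of log p_i with respect to the policy parameters) are hypothetical, chosen only for illustration.

```python
import math


def weighted_value(p, z):
    """Estimate Sum p_i z_i / Sum p_i over a batch of playouts.

    p[i]: assumed probability of playout i under the playout policy.
    z[i]: outcome of playout i (for example 1 for a win, 0 for a loss).
    """
    total = math.fsum(p)
    return math.fsum(pi * zi for pi, zi in zip(p, z)) / total


def weighted_value_grad(p, z, grad_log_p):
    """Gradient of weighted_value with respect to the policy parameters.

    grad_log_p[i] is the gradient of log p_i, given as a list of floats.
    Differentiating the ratio gives
        Sum_i (p_i / Sum_j p_j) * (z_i - V) * grad log p_i,
    where V is the weighted value itself.
    """
    total = math.fsum(p)
    v = weighted_value(p, z)
    dim = len(grad_log_p[0])
    grad = [0.0] * dim
    for pi, zi, gi in zip(p, z, grad_log_p):
        w = pi / total  # normalized weight of playout i
        for k in range(dim):
            grad[k] += w * (zi - v) * gi[k]
    return grad
```

Because each term is multiplied by (z_i - V), swapping wins and losses (z_i to 1 - z_i) only flips the sign of this gradient, which is one way to read "symmetrical regarding wins and losses".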


Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too?

Rémi
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
