> "The approach of this paper is to treat all win rate estimations as independent estimators with additive white Gaussian noise."

Have you tried whether that works? (As Łukasz Lew wrote, "experimental setup would be useful".) I guess there may be a flaw in your idea, though I am not a specialist. I will try to explain it.

If it weren't for the fact that the tree is learning, the probability that a playout through a node wins would be constant each time the node is visited. This is of course a simplification, because the tree does learn, but it holds at least between playouts that are not very distant in time, so my argument applies to some (I would guess, large) extent. The same goes for the RAVE estimator, which is also the result of counting wins (assuming P(win | that move) = constant) and dividing by the appropriate sample size. These estimators therefore follow a binomial distribution. The binomial does converge to the normal, but with a fundamental caveat: unlike the normal, whose mean and variance are independent parameters, here the variance is a function of p.

The variance of the binomial, n·p·(1-p), is a _function of p_.
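This is easy to check numerically. A quick sketch (the particular n, p, and trial counts below are just illustrative, not taken from the paper):

```python
import random

# Empirical check that Var[Binomial(n, p)] = n*p*(1-p).
# n, p, and trials are arbitrary illustrative values.
random.seed(0)
n, p, trials = 100, 0.3, 20_000
counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(round(mean, 2))  # close to n*p = 30
print(round(var, 2))   # close to n*p*(1-p) = 21
```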

Therefore, the variance of the normal that best approximates the distribution of both the RAVE estimator and wins/(wins + losses) is likewise determined by p alone: for the win-rate estimate it is p·(1-p)/n.

If this is true, the variance you measure from the samples carries no information about the precision of the estimators beyond what the win rate itself already determines. If someone understands this better, please explain it to the list.
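A tiny sketch of that last point (the constant per-playout win probability p is an assumption, per the simplification above): for 0/1 win records, the sample variance is identically p̂·(1-p̂), an exact function of the sample mean, so it is fully recoverable from the win rate and adds nothing new.

```python
import random

# For Bernoulli (0/1) outcomes, the (population-style) sample variance equals
# p_hat*(1-p_hat) identically, so measuring it yields nothing beyond the mean.
# p and n are arbitrary illustrative values.
random.seed(1)
p, n = 0.62, 10_000  # assumed constant win probability per playout
wins = [1 if random.random() < p else 0 for _ in range(n)]
p_hat = sum(wins) / n
sample_var = sum((w - p_hat) ** 2 for w in wins) / n
print(round(sample_var, 6))              # measured from the samples
print(round(p_hat * (1 - p_hat), 6))     # reconstructed from the mean alone
```

The two printed numbers agree (up to floating-point rounding), which is the sense in which the measured variance is redundant here.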

Jacques.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
