> "The approach of this paper is to treat all win rate estimations as
> independent estimators with additive white Gaussian noise."
Have you tried whether that works? (As Łukasz Lew wrote, "experimental
setup would be useful.") I suspect there may be a flaw in your idea,
though I am not a specialist. I will try to explain it.
If it weren't for the fact that the tree is learning, the probability
that a playout through a node wins would be constant each time the node
is visited. This is, of course, a simplification, because the tree does
learn, but it holds at least between playouts that are not far apart in
time, so my argument applies to some (I would guess, large) extent. The
same applies to the RAVE estimator, which is also the result of counting
wins (assuming P(win | that move) is constant) and dividing by the
appropriate sample size.
Therefore, these estimators follow a binomial distribution. That does
converge to the normal, but with a fundamental caveat: unlike the
normal, in which mean and variance are independent parameters, here the
variance is a function of p. The variance of the binomial, n·p·(1-p),
is a _function of p_.
Therefore, the variance of the normal that best approximates the
distribution of both RAVE and wins/(wins + losses) is the corresponding
p·(1-p)/n (it is n·p·(1-p) for the raw win count).
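A quick simulation makes this concrete (my illustration, not from the
original post; plain Bernoulli playouts with a fixed, hypothetical
P(win) = p, which is the static-tree simplification above): the spread
of the win-rate estimator across repeated runs matches p·(1-p)/n.

```python
import random

# Sketch under the assumptions above: estimate the win rate from n
# independent playouts with fixed P(win) = p, repeat many times, and
# compare the empirical variance of the estimator with p*(1-p)/n,
# the variance of the binomial proportion.
random.seed(0)
p, n, trials = 0.6, 100, 20000

estimates = []
for _ in range(trials):
    wins = sum(1 for _ in range(n) if random.random() < p)
    estimates.append(wins / n)  # wins / (wins + losses)

mean = sum(estimates) / trials
emp_var = sum((x - mean) ** 2 for x in estimates) / trials
theory_var = p * (1 - p) / n

print(mean, emp_var, theory_var)  # empirical variance tracks p*(1-p)/n
```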
If this is true, the variance you measure from the samples does not
contain any information about the precision of the estimators beyond
what the estimated p itself already gives. If someone understands this
better, please explain it to the list.
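The closing point can be checked directly (again my illustration, with
an arbitrary win probability): for 0/1 win/loss outcomes the sample
variance is exactly p̂·(1-p̂), a deterministic function of the sample
mean, so measuring it tells you nothing the mean did not already.

```python
import random

# For x in {0, 1} we have x**2 == x, so the (biased) sample variance
#   mean(x**2) - mean(x)**2  ==  p_hat - p_hat**2  ==  p_hat*(1 - p_hat)
# is fully determined by the sample mean p_hat.
random.seed(1)
outcomes = [1 if random.random() < 0.45 else 0 for _ in range(500)]

p_hat = sum(outcomes) / len(outcomes)
sample_var = sum((x - p_hat) ** 2 for x in outcomes) / len(outcomes)

print(sample_var, p_hat * (1 - p_hat))  # identical up to rounding
```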
Jacques.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/