Jacques Basaldúa wrote:
Daniel Liu wrote:
An imperfect evaluation has errors. Is the exact value of the error
known? No.
I have an idea about that, which I will try to explain:
Given any finite combinatorial game whose terminal nodes have two
possible values, win/loss, any node has a "winning rate" (I don't
know if there is a better name for it) defined as:
(# of subpaths ending in a win)/(# of total subpaths)
Let's call it W.
(This winning rate can be seen as a simplified version of what
Erik calls "underlying ground-truths".)
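
A minimal sketch of that definition (the representation is just an
illustration, not from anything above: terminal nodes are plain
booleans, interior nodes are lists of children), computing W by
enumerating every path:

# Count, for a node, how many node-to-terminal paths end in a win
# and how many end in a loss.
def count_paths(node):
    if isinstance(node, bool):                # terminal node
        return (1, 0) if node else (0, 1)     # (wins, losses)
    wins = losses = 0
    for child in node:                        # interior node: recurse
        w, l = count_paths(child)
        wins += w
        losses += l
    return wins, losses

def winning_rate(node):
    wins, losses = count_paths(node)
    return wins / (wins + losses)             # W = wins / total paths

# Tiny example tree: the first move leads to a forced win, the
# second to one win and two losses.
root = [[True], [True, False, False]]
print(winning_rate(root))                     # 2 wins / 4 paths -> 0.5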
Assuming two simplifying hypotheses:
1. The playouts are uniformly random.
2. Both players have the same number of legal moves (or any
imbalance compensates in the long run).
The unknown probability p of winning a random playout is
a function of W.
The _observed_ proportion after n random experiments, p-hat,
is an _unbiased_ estimator of p whose confidence intervals
are those for a binomial proportion, as stated earlier
on this list.
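
To make that concrete, a small Python sketch (the playout here is
just a stand-in Bernoulli draw with the unknown p, and the interval
is the usual normal approximation):

import math
import random

def playout(p):
    # Stand-in for one uniformly random playout: wins with
    # the (unknown) probability p.
    return random.random() < p

def estimate(p_true, n=1000, z=1.96):
    # p-hat after n playouts, with the normal-approximation
    # confidence interval for a binomial proportion.
    wins = sum(playout(p_true) for _ in range(n))
    p_hat = wins / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)   # ~95% for z=1.96
    return p_hat, (p_hat - half, p_hat + half)

p_hat, (lo, hi) = estimate(0.63)
print(f"p-hat = {p_hat:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")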
Abusing language, we can call p an estimator of W. (It is
not really an estimator because it cannot be computed directly.)
Now, the most interesting part: p is a _biased_ "estimator" of W,
and it is biased towards 1/2 as long as the expected value of the
noise is zero (= the playouts themselves are not "biased"). The
higher the noise (or _the longer the playout_), the more biased it is.
In short:
1. We measure p-hat, which is an unbiased estimator of p with a
known error distribution.
2. p is an estimator of W that is biased towards 1/2. Knowing the
variance (or replacing it with an estimate measured from
experiments), we can model the bias.
Simplifying, to understand why it is biased towards 1/2,
add random noise distributed as N(0, e) to p.
With small noise: p + N(0, 0.01) gives results very similar to p.
With big noise: p + N(0, 100) gives results very similar to 1/2 for
any p in [0,1].
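
A quick simulation of that (one detail is assumed here, since a
probability must stay in [0, 1]: the noisy value is clipped back
to that range; without clipping, the mean would stay exactly p):

import random

def mean_clipped(p, sigma, n=200_000):
    # Mean of p + N(0, sigma) clipped to [0, 1].
    total = 0.0
    for _ in range(n):
        x = p + random.gauss(0.0, sigma)
        total += min(1.0, max(0.0, x))
    return total / n

for sigma in (0.1, 0.3, 1.0, 10.0):
    print(f"p = 0.9, sigma = {sigma}: mean ~ {mean_clipped(0.9, sigma):.3f}")
# Small noise leaves the mean near 0.9; large noise drags it to ~1/2.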
This is a rather theoretical post of small practical use, but
it helps explain the effect: longer playouts => higher
variance => more bias towards 1/2.
Jacques.
I don't understand your post.
bias = E(p-hat) - p = E(p + N) - p = E(p) + E(N) - p = p + 0 - p = 0
I'm thinking that maybe you're clipping p+N to [0,1]? Maybe my biggest
confusion is how you're actually arriving at p+N in a meaningful way. A
single MC playout corresponds to a Bernoulli trial with probability p.
Even with many trials, the noise is binomial, and asymptotically
approaches a normal distribution.
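
For example, a quick check (with the same kind of stand-in Bernoulli
playout): across many repetitions of n playouts, the mean of p-hat
stays at p, and its spread matches the binomial sqrt(p*(1-p)/n):

import random
import statistics

def p_hats(p, n, reps=20_000):
    # reps independent values of p-hat, each from n Bernoulli(p) playouts.
    return [sum(random.random() < p for _ in range(n)) / n
            for _ in range(reps)]

samples = p_hats(p=0.7, n=400)
print(f"mean of p-hat  ~ {statistics.mean(samples):.4f}  (p = 0.7, no bias)")
print(f"stdev of p-hat ~ {statistics.stdev(samples):.4f}  "
      f"(theory: {(0.7 * 0.3 / 400) ** 0.5:.4f})")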