Jacques Basaldúa wrote:
Daniel Liu wrote:
An imperfect evaluation has errors. Is the exact value of the error
known? No.
I have an idea about that, which I will try to explain:
Given any finite combinatorial game whose terminal nodes have two
possible values, win/loss, any node has a "winning rate" (I don't
know if there is a better name for it) defined as:
(# of subpaths ending in a win)/(# of total subpaths)
Let's call it W.
(This winning rate can be seen as a simplified version of what
Erik calls "underlying ground-truths".)
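
A minimal sketch of that definition (the representation is just an
illustration, not from anything above: terminal nodes are plain
booleans, interior nodes are lists of children), computing W by
enumerating every path:

# Count, for a node, how many node-to-terminal paths end in a win
# and how many end in a loss.
def count_paths(node):
    if isinstance(node, bool):                # terminal node
        return (1, 0) if node else (0, 1)     # (wins, losses)
    wins = losses = 0
    for child in node:                        # interior node: recurse
        w, l = count_paths(child)
        wins += w
        losses += l
    return wins, losses

def winning_rate(node):
    wins, losses = count_paths(node)
    return wins / (wins + losses)             # W = wins / total paths

# Tiny example tree: the first move leads to a forced win, the
# second to one win and two losses.
root = [[True], [True, False, False]]
print(winning_rate(root))                     # 2 wins / 4 paths -> 0.5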
Assuming two simplifying hypotheses:
1. The playouts are uniformly random.
2. Both players have the same number of legal moves (or any
imbalance compensates in the long run).
The unknown probability p of winning a random playout is
a function of W.
The _observed_ proportion after n random experiments, p-hat,
is an _unbiased_ estimator of p whose confidence intervals
are those for a binomial proportion, as stated earlier
on this list.
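
To make that concrete, a small Python sketch (the playout here is
just a stand-in Bernoulli draw with the unknown p, and the interval
is the usual normal approximation):

import math
import random

def playout(p):
    # Stand-in for one uniformly random playout: wins with
    # the (unknown) probability p.
    return random.random() < p

def estimate(p_true, n=1000, z=1.96):
    # p-hat after n playouts, with the normal-approximation
    # confidence interval for a binomial proportion.
    wins = sum(playout(p_true) for _ in range(n))
    p_hat = wins / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)   # ~95% for z=1.96
    return p_hat, (p_hat - half, p_hat + half)

p_hat, (lo, hi) = estimate(0.63)
print(f"p-hat = {p_hat:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")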
Abusing language, we can call p an estimator of W. (It is
not really an estimator because it cannot be computed directly.)
Now, the most interesting part: p is a _biased_ "estimator" of W,
and it is biased towards 1/2 as long as the expected value of the
noise is zero (= the playouts themselves are not "biased"). The
higher the noise (or _the longer the playout_), the more biased it is.
In short:
1. We measure p-hat, which is an unbiased estimator of p with a
known error distribution.
2. p is an estimator of W that is biased towards 1/2. Knowing the
variance (or replacing it with an estimate measured from
experiments), we can model the bias.
Simplifying, to understand why it is biased towards 1/2,
add random noise distributed as N(0, e) to p.
With small noise: p + N(0, 0.01) gives results very similar to p.
With big noise: p + N(0, 100) gives results very similar to 1/2 for
any p in [0,1].
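
A quick simulation of that (one detail is assumed here, since a
probability must stay in [0, 1]: the noisy value is clipped back
to that range; without clipping, the mean would stay exactly p):

import random

def mean_clipped(p, sigma, n=200_000):
    # Mean of p + N(0, sigma) clipped to [0, 1].
    total = 0.0
    for _ in range(n):
        x = p + random.gauss(0.0, sigma)
        total += min(1.0, max(0.0, x))
    return total / n

for sigma in (0.1, 0.3, 1.0, 10.0):
    print(f"p = 0.9, sigma = {sigma}: mean ~ {mean_clipped(0.9, sigma):.3f}")
# Small noise leaves the mean near 0.9; large noise drags it to ~1/2.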
This is a rather theoretical post of small practical use, but
it helps explain the effect: longer playouts => higher
variance => more bias towards 1/2.
Jacques.
I don't understand your post.
bias = E(p-hat) - p = E(p + N) - p = E(p) + E(N) - p = p + 0 - p = 0
I'm thinking that maybe you're clipping p+N to [0,1]? Maybe my biggest
confusion is how you're actually arriving at p+N in a meaningful way. A
single MC playout corresponds to a Bernoulli trial with probability p.
Even with many trials, the noise is binomial, and asymptotically
approaches a normal distribution.
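
For example, a quick check (with the same kind of stand-in Bernoulli
playout): across many repetitions of n playouts, the mean of p-hat
stays at p, and its spread matches the binomial sqrt(p*(1-p)/n):

import random
import statistics

def p_hats(p, n, reps=20_000):
    # reps independent values of p-hat, each from n Bernoulli(p) playouts.
    return [sum(random.random() < p for _ in range(n)) / n
            for _ in range(reps)]

samples = p_hats(p=0.7, n=400)
print(f"mean of p-hat  ~ {statistics.mean(samples):.4f}  (p = 0.7, no bias)")
print(f"stdev of p-hat ~ {statistics.stdev(samples):.4f}  "
      f"(theory: {(0.7 * 0.3 / 400) ** 0.5:.4f})")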