I will try to explain it better: what had puzzled me until recently is how to reconcile two facts:
a. As Jason says, "A single MC playout corresponds to a Bernoulli trial with probability p", and therefore p-hat follows a binomial distribution which converges to the normal.

b. The playout itself is a stochastic process which has a variance, and that variance depends on the length of the playout.

These two _facts_ seem contradictory, and it is not obvious how to introduce the variance (which is crucial in any MC evaluator, or more generally in any stochastic evaluator) into our model. The variance of the stochastic process is not to be mixed up with the distribution of the error of a repeated Bernoulli experiment! Explanation "a", by itself, does not account for the variance of the stochastic process.

Therefore, I propose a different experiment: take a Bernoulli experiment with success probability p + N(0, e), where N is some noise with expected value 0 and standard deviation e. Of course, clip p + N(0, e) to the interval [0, 1].

For very small e, e.g. e = 1/100, the result of the experiment with p = 0.7 + N(0, e) is almost the same as that of the noiseless experiment with p = 0.7.

For very large e, the result of p + N(0, e) is the same as the result for p = 1/2. That is because the noise is larger than 1 most of the time, so the clipped p + N is 0 or 1 most of the time, about equally often.

In other words: the probability of winning under very noisy conditions no longer depends much on the initial chances (the probability conditioned on the move), but on the stochastic process (the playout). And because the playout is balanced, its expected value is 1/2. That is why the larger the variance, the more p is biased towards 1/2.

Example: if the game could end in a 4-move playout, and the first move is studied, the probability of a win after a "good" first move is significantly bigger than the probability after a "bad" one. But if the playout is 1000 moves long, the probability of winning after a "good" first move is almost the same as after a "bad" one -> 1/2.
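The noisy-Bernoulli experiment is easy to simulate. Here is a minimal sketch (the function name and trial counts are my own illustration, and I assume the noise is redrawn for every trial, which the post leaves implicit): as e grows, the estimated win rate moves from roughly p toward 1/2.

```python
import random

def noisy_bernoulli(p, e, trials=100_000, seed=1):
    """Win rate of a Bernoulli experiment whose success probability is
    p + N(0, e), clipped to [0, 1], with fresh noise drawn each trial."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        q = min(1.0, max(0.0, p + rng.gauss(0.0, e)))
        if rng.random() < q:
            wins += 1
    return wins / trials

# Small noise: the clipping almost never triggers, so the
# estimate stays close to p = 0.7.
print(noisy_bernoulli(0.7, 0.01))

# Huge noise: the clipped probability is 0 or 1 about equally
# often, so the estimate is pulled toward 1/2.
print(noisy_bernoulli(0.7, 100.0))
```

Running it shows the first estimate near 0.7 and the second near 0.5, which is exactly the bias-toward-1/2 effect described above.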
The p + N(0, e) model represents MC much better than the plain p model.

And then I go one step further. When the playouts are uniformly random (= each legal move has the same probability of being selected) and the number of legal moves for each side is balanced, our MC evaluator measures a very simple ratio that each node has: the number of subpaths through that node ending in a win / the number of all subpaths through that node. I call this quantity the W of a node. W always exists and, if the tree could be searched to its end, it could be computed directly. Because the search cannot be completed, W can be estimated by MC playouts.

In other words: a (uniformly distributed) MC playout is nothing else but _an estimator of W_. And it is a biased estimator, for the reasons explained above.

Again:
p-hat (= #wins / #playouts) is an unbiased estimator of p.
p is an "estimator" of W, biased towards 1/2.
W always exists (= #winning subpaths / #subpaths). (Except in trivial cases, #subpaths is astronomical => W cannot be computed directly.)

I hope it is clear this time. For me, it makes so much sense that it becomes obvious once understood, but my way of explaining it is surely improvable. ;-)

Jacques.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
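[Editor's sketch, appended to the archived post.] To make W concrete, here is a minimal illustration (the toy tree, its depth, and all function names are my own, not from the post): a tiny fixed-depth binary game tree whose leaves are win (1) or loss (0). Because the tree is small, W of the root can be computed exactly by enumerating every complete path; a uniform random playout from the root samples those same paths and so estimates W.

```python
import random

def make_leaves(depth, seed=0):
    """Randomly label the 2**depth leaves of a complete binary
    game tree as win (1) or loss (0)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(2 ** depth)]

def exact_W(leaves):
    """W of the root: winning complete paths / all complete paths.
    In a complete binary tree, paths correspond one-to-one to leaves."""
    return sum(leaves) / len(leaves)

def mc_estimate(leaves, depth, playouts=50_000, seed=1):
    """Estimate W by uniform random playouts: at every node each of
    the two legal moves is chosen with equal probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(playouts):
        idx = 0
        for _ in range(depth):
            idx = idx * 2 + rng.randint(0, 1)  # pick a move uniformly
        wins += leaves[idx]
    return wins / playouts

depth = 6
leaves = make_leaves(depth)
print(exact_W(leaves), mc_estimate(leaves, depth))
```

In this toy tree every node has the same branching factor, so each path is equally likely and the playout estimate converges to W itself; the bias toward 1/2 that the post describes appears when playout length and unbalanced branching make the path distribution diverge from uniform.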