> http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/tutorial2007/Bruno_Bouzy_2007.pdf
>
> Page 89, "which kind of outcome". This method is better than the above
> and similar to what Jonas seems to propose. The improvement is minor.
Looking at their proposal (45 * win + score): in contrast to mine, there is no scaling with 1/n. Yet they get an improvement with something that is in fact of the order of 1/\sqrt{n}. But I think the latter does not play a role. Why? First hypothesis: misleading data. Second hypothesis: a better evaluation function, i.e. what we would get with the "exact" expectation of the evaluation function under the MC runs. Here is an interesting argument towards the second hypothesis:

- Even with many simulations, it is likely that all these simulations differ from one another by move 50. Suppose we had an oracle who, after 50 moves of the MC simulation, told us whether the game is a win or a loss under perfect play. Stopping the simulation there and taking this result would probably be a better evaluation function (discussion below). We could then ask how best to approximate this evaluation function with what we actually know, namely the score at the end of the MC runs. Giving the same value to a half-point victory and to a 60-point victory is then ludicrous: MC is not perfect play, so we have an uncertainty that depends on how far we are in the game and on the position. But a 60-point victory would probably mean that we already had a win after the first 50 moves of the simulation. Under this hypothesis, the evaluation of the node should be \int_{-score}^{\infty} Phi(x) dx, where Phi(x) is the probability density that the MC score is off from perfect play by (+x) points (first sketch below).

- Now this Phi could be estimated by looking at the typical results of many simulations from the same position (assuming the mean or the median is perfection); alternatively, it could be estimated in practice from the simulations in the node before (second sketch below).

- BUT this evaluation function would not be efficient: I have no access to any data, but I think the MC simulations must have a large variance. That would leave too narrow a margin for victory, I guess.

- Why should this evaluation with an oracle be better? Well, it is the true minimax evaluation of the position from move 50, the same as counting at the end; everything in between is noise. Making better playouts can result in worse play because they are more deterministic, so the evaluation is less robust (it could always make the same mistake), but here it IS robust: we said perfect play, no mistake. If we had that at the root of the tree, that would be God playing. However, as I said, this leads to a hardly tenable conclusion for the practical evaluation function. Why? Probably because the RIGHT evaluation function is not victory or defeat; it is: how likely is the program to win from this position (say, against itself)? Indeed, since we have random players, this even has a precise meaning, and hence non-integer values make sense. Now, since programs with tree search play better than their MC simulations, we would have less variance in the outcome from this infamous move 50, and hence we would still have a convolution, only a narrower one. We would then not use 0-1, but not anything too broad either. Something like (45 * win + score) is reasonable, around 0 in particular (it should saturate for high scores) (third sketch below).

- In any case, this would also mean that the bonus for the win should be made higher in the yose. Indeed, the nearer the end, the more accurate the MC scoring of a single simulation is, and hence the less we must convolve. And this is not having a spoilt child: it is explicitly trying to imitate a better evaluation function that we cannot know directly.
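To make the first point concrete, here is a minimal numerical sketch of that integral in Python, assuming Phi is a zero-mean Gaussian; both the Gaussian shape and the sigma = 20 are illustrative guesses of mine, not values from any data:

import math

def convolved_eval(score, sigma=20.0):
    # int_{-score}^{+inf} Phi(x) dx for a zero-mean Gaussian density Phi
    # with standard deviation sigma (an assumed value).  For a symmetric
    # density this tail integral is just the Gaussian CDF at +score.
    return 0.5 * (1.0 + math.erf(score / (sigma * math.sqrt(2.0))))

print(convolved_eval(0.5))   # ~0.51: a half-point win says almost nothing
print(convolved_eval(60.0))  # ~0.999: a 60-point win almost surely holds up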
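The empirical route for Phi could look like the following sketch; the helper name and the median-is-perfection centering are my assumptions:

def empirical_eval(score, node_scores):
    # node_scores: MC scores already collected at (or near) this node.
    # Assuming their median is "perfection", the centered residuals are
    # samples of Phi, and the integral becomes a counting estimate.
    ordered = sorted(node_scores)
    median = ordered[len(ordered) // 2]
    residuals = [s - median for s in ordered]
    # int_{-score}^{+inf} Phi(x) dx ~ fraction of residuals above -score
    return sum(1 for x in residuals if x > -score) / len(residuals)

print(empirical_eval(10, [-30, -5, 0, 5, 12, 20, 40]))  # 5/7 ~ 0.71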
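And a sketch of the blended evaluation with both tweaks (a saturating score term, and a win bonus that grows towards the yose); every constant and the growth schedule are illustrative guesses, not tuned values:

def blended_eval(score, move_number, game_length=250,
                 base_bonus=45.0, cap=30.0):
    # (bonus * win + clipped score): the score term saturates for high
    # scores, and the win bonus grows towards the end of the game, where
    # a single MC score is more accurate and the convolution should be
    # narrower.  All constants here are guesses.
    win = 1.0 if score > 0 else 0.0
    progress = min(move_number / game_length, 1.0)  # 0 early, 1 at the end
    bonus = base_bonus * (1.0 + progress)           # assumed growth schedule
    clipped = max(-cap, min(cap, score))
    return bonus * win + clipped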
Jonas