Hi there, I am new here, but have been reading the list for a few months. I am a mathematician, finishing my PhD on quantum statistics (that is, statistics on quantum objects, quantum information, etc.). So do not expect me to write any code, but I may have suggestions for heuristics in the choice of moves and so on.
I'll say more about one of them in the coming days. I shall just begin with a short reflection on the evaluation function, starting from Don's fear that we might have to curb too powerful and pessimistic programs:

> The funny thing about MC scoring is that if white DOES win, then it
> doesn't matter what black plays, everything loses!
>
> That would mean that in a scalability study, where colors are
> alternated, really strong versions of Mogo would score poorly against
> the weaker programs.
>
> In a serious competition you would want to throttle down the playing
> strength (when playing black) so that it could win more and not just
> quit (resign) out of frustration!

There was the suggestion of using something like tanh(K*score) instead of 0-1 scoring. A similar system would make some sense, without loss of strength, if we scale K like 1/n, where n is the number of simulations. A consequence would be that programs would choose among 100%-victory (or 100%-defeat) moves according to the score they expect.

Let us analyze all this more precisely. First, I take the "true" evaluation function E of the node to be the exact probability of winning according to the MC simulations. In other words, it is the value we would get with infinitely many runs and 0-1 scoring, and it is what I want to approximate as precisely as possible with my finite number of runs. Second, I evaluate here only at one node, without growth of the tree. Given the conclusions, this should not make much difference. If the conclusion had been that we can do much better than 0-1, that would have been quite another matter.

The 0-1 evaluation function converges at rate 1/\sqrt{n}, which is impossible to beat if there is sufficient irregularity in the expectation of each result. Now, let us write p_i for the probability that the score of the MC simulation is i (so i = 0.5, -3.5, or 7.5, for example).
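To make the setting concrete, here is a small sketch (my own toy example, not code from anyone's program): a made-up distribution p_i over final margins, the exact E = \sum_i p_i sign(i), and its 0-1 (here +1/-1) estimate from n runs, whose error shrinks like 1/\sqrt{n}.

```python
import math
import random

random.seed(0)

# Invented toy distribution over final score margins i (half-integers,
# as in go with a x.5 komi); the probabilities are arbitrary.
MARGINS = [-19.5, -3.5, -0.5, 0.5, 1.5, 7.5, 20.5]
PROBS   = [0.05, 0.15, 0.20, 0.25, 0.15, 0.15, 0.05]  # sums to 1

def simulate_margin():
    """Draw the final margin of one MC simulation."""
    return random.choices(MARGINS, weights=PROBS)[0]

def sign(x):
    return 1.0 if x > 0 else -1.0

# The "true" evaluation: E = sum_i p_i sign(i), known exactly here.
E_true = sum(p * sign(i) for p, i in zip(PROBS, MARGINS))

def estimate(n):
    """0-1 (counted as +1/-1) evaluation from n simulations."""
    return sum(sign(simulate_margin()) for _ in range(n)) / n

for n in (100, 10_000):
    err = abs(estimate(n) - E_true)
    print(f"n={n:6d}  |error|={err:.4f}  1/sqrt(n)={1/math.sqrt(n):.4f}")
```

With this distribution E_true = 0.6 - 0.4 = 0.2, and the empirical error tracks the 1/\sqrt{n} scale.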
We have \sum_i p_i = 1, and our evaluation function is E = \sum_i p_i sign(i). (Strictly, this counts -1 for a defeat rather than 0, but that is equivalent and more symmetrical, hence nicer for presenting the following calculations.) We observe \hat{p}_i = n_i / n, where n_i is the number of our simulations that give result i, out of the n simulations we have carried out. 0-1 evaluation corresponds to estimating E by \hat{E} = \sum_i \hat{p}_i sign(i).

Now the p_i are heavily correlated (**), so I am convinced that I could get better results than 0-1 for low numbers of simulations. For high numbers of simulations, certainly not (the autocorrelation would dominate). Moreover, these methods (probably using low-pass filters on the array of \hat{p}_i) would probably be much slower than just running a few more simulations...

If we keep to something simple, we can use functions of the form

\hat{E}_{\lambda} = \sum_i \hat{p}_i sign(i) (1 - \lambda_i)

This includes the suggestion of using tanh(K i). Notice that 0-1 corresponds to taking \lambda_i = 0. Let us compute the expected mean square error, MSE = Bias^2 + Variance.

Bias:
B = - E + \sum_i p_i sign(i) (1 - \lambda_i) = - \sum_i p_i sign(i) \lambda_i
This is zero for the 0-1 evaluation function.

Variance:
V = (1/n) (\sum_i p_i (1 - \lambda_i)^2 - (E + B)^2)
We are only interested in the difference from the 0-1 case, so I do not develop the variance further. For 0-1, the variance is (1/n)(1 - E^2). Hence

MSE(\lambda) - MSE(0-1) = B^2 (1 - 1/n) - 2 E B / n + (1/n) \sum_i p_i (\lambda_i^2 - 2 \lambda_i)

So if we want this quantity to be negative, showing better results than 0-1, we first need the bias to be of order 1/n. The only way to achieve this with pre-chosen p_i is to take the \lambda_i of order 1/n. The change would then be only cosmetic: it means changing at most a fixed number of outcomes as n goes to infinity!
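The effect of a fixed (not 1/n-scaled) \lambda can be checked numerically. Below is a sketch of my own (toy distribution and the constant K = 0.3 are invented) comparing the empirical MSE, as estimators of E = \sum_i p_i sign(i), of the usual +1/-1 scoring against tanh(K i) scoring, which corresponds to \lambda_i = 1 - |tanh(K i)|. For moderate n the squared bias of the tanh version dominates, as the calculation above predicts.

```python
import math
import random

random.seed(1)

# Invented toy distribution over final margins (same spirit as the post).
MARGINS = [-19.5, -3.5, -0.5, 0.5, 1.5, 7.5, 20.5]
PROBS   = [0.05, 0.15, 0.20, 0.25, 0.15, 0.15, 0.05]

sign = lambda x: 1.0 if x > 0 else -1.0
E = sum(p * sign(i) for p, i in zip(PROBS, MARGINS))  # target value

def mse(score_fn, n, trials=1000):
    """Empirical MSE of the average of score_fn over n simulations."""
    total = 0.0
    for _ in range(trials):
        est = sum(score_fn(random.choices(MARGINS, weights=PROBS)[0])
                  for _ in range(n)) / n
        total += (est - E) ** 2
    return total / trials

n = 200
mse01 = mse(sign, n)                             # lambda_i = 0
mse_tanh = mse(lambda i: math.tanh(0.3 * i), n)  # fixed K, bias ~ constant
print(f"MSE 0-1:  {mse01:.5f}")
print(f"MSE tanh: {mse_tanh:.5f}")
```

With these (arbitrary) numbers the tanh scoring loses clearly at n = 200: its bias does not shrink with n, so B^2 swamps the small variance it saves.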
The leading order (that is, the 1/n^2 terms) can then be rewritten as:

B^2 + (2/n) \sum_i p_i \lambda_i [E sign(i) - 1]

If we restrict to the cases where \lambda_i = \lambda_{-i}, then we see that the positive contributions involve (products of) terms of the form (p_i - p_{-i}) \lambda_i, whereas the negative contribution involves terms (p_i + p_{-i}) \lambda_i. As we want the total to be as negative as possible, we could get a small improvement if \lambda_i is bigger when p_i is near p_{-i}. A priori, this should be the case for small i: if there are high chances that the final score is -0.5, there are high chances that it is 1.5. No link between 20.5 and -19.5 a priori, on the other hand... We have to keep \sum_i \lambda_i very low, though, otherwise the squared bias becomes dominant.

******* CONCLUSION *******

If we restrict our attention to schemes where we add a value to the node after each simulation, WE CANNOT GET (much) BETTER THAN THE USUAL SUMMING OF VICTORIES. There might be ways to get better evaluations, but they are much more complicated and I have no idea how they would interact with the tree.

However, we can, without loss of performance, change the 1 into something like tanh(n * integer_part(2 + |i|)/2), where i is the score of the simulation. This would be almost equivalent to: first choose by 0-1; if there is no difference, look at the scores...

**************

Well, complicated for not much. I think/hope my next post (I know which one) will be more interesting.

Jonas

----- End forwarded message -----
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
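For concreteness, the tie-breaking idea in the conclusion can be sketched as follows. This is my own simplified variant (the tanh(i)/n nudge is a hypothetical stand-in for the exact formula in the post); the point is only that a margin term of order 1/n cannot change which move has the better plain win rate, but does rank sure wins by margin (and prefers close losses over big ones, addressing the "throttle down" worry quoted at the top).

```python
import math

def score(i, n):
    """One simulation's value: usual +1/-1 result plus an order-1/n
    margin nudge. i is the final margin, n the total simulation count."""
    return (1.0 if i > 0 else -1.0) + math.tanh(i) / n

n = 1000
# Two moves that each win all n simulations, by different margins:
close_wins = sum(score(0.5, n) for _ in range(n)) / n
big_wins   = sum(score(20.5, n) for _ in range(n)) / n
print(close_wins, big_wins)  # both ~1, but the bigger margin ranks higher
```

Since the nudge is bounded by 1/n, any move with a strictly higher win rate still dominates; the margin only decides otherwise-tied moves.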