Hi there
I am new here, but have read the list for a few months.
I am a mathematician, finishing my PhD on quantum statistics (that is
statistics on quantum objects, quantum information, etc.).
So do not expect me to write any code, but I may have suggestions for
heuristics in the choice of moves and so on.
I'll speak more about one of them in the coming days.
I shall just begin with a short reflection on the evaluation function,
starting from Don's fear that we might have to curb overly powerful and
pessimistic programs.
> The funny thing about MC scoring is that if white DOES win, then it
> doesn't matter what black plays, everything loses!
>
> That would mean that in a scalability study, where colors are
> alternated, really strong versions of Mogo would score poorly against
> the weaker programs.
>
> In a serious competition you would want to throttle down the playing
> strength (when playing black) so that it could win more and not just
> quit (resign) out of frustration!
There was the suggestion of using something like tanh(K*score) instead
of 0-1 scoring. Such a system would make some sense, without loss of
strength, if we scale K like 1/n, where n is the number of simulations.
A consequence would be that among moves that are all 100% wins (or all
100% losses), programs would choose according to the score they expect.
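To fix ideas, here is a toy sketch in Python of the two backup rules
(the function names and the free constant K are mine, not taken from
any existing program):

    import math

    def score_01(margin):
        # Plain 0-1 scoring, written symmetrically as +1 / -1.
        return 1.0 if margin > 0 else -1.0

    def score_tanh(margin, K):
        # The suggested alternative: tanh(K * margin).  For large K this
        # approaches the sign; for small K it is nearly K * margin.
        return math.tanh(K * margin)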
Let us analyze all this more precisely.
First, I consider that the "true" evaluation function E of the node is
the exact probability that an MC simulation from this node ends in a
win. In other words, it is the value we would get with infinitely many
runs and 0-1 scoring, and it is what I want to approximate as precisely
as possible with my finite number of runs.
Second, I evaluate here only at one node, without growing the tree.
Given the conclusions, this should not make much difference. If the
conclusion had been that we can do much better than 0-1, it would have
been quite another matter.
The 0-1 evaluation function converges at rate 1/\sqrt{n} (it is a mean
of n i.i.d. +1/-1 outcomes, so its standard deviation is of order
1/\sqrt{n}), which is impossible to beat if there is sufficient
irregularity in the expectation of each result.
Now, let us write p_i for the probability that the score of an MC
simulation is i; so i = 0.5, -3.5, or 7.5, for example.
We have \sum_i p_i = 1, and our evaluation function is
E = \sum_i p_i sign(i).
(Strictly speaking, this counts -1 for a defeat rather than 0, but it
is equivalent and more symmetric, hence nicer for the calculations
below.)
We observe
\hat{p}_i = n_i / n
where n_i is the number of simulations giving result i, out of the n
simulations we have carried out.
0-1 evaluation corresponds to estimating E by
\hat{E} = \sum_i \hat{p}_i sign(i).
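In code, \hat{E} is simply the empirical mean of the signs; a minimal
Python sketch (the names are my own):

    from collections import Counter

    def estimate_E(outcomes):
        # outcomes: the final score margins i of the n simulations.
        n = len(outcomes)
        p_hat = {i: n_i / n for i, n_i in Counter(outcomes).items()}
        # hat{E} = sum_i hat{p}_i sign(i), i.e. the mean of the signs.
        return sum(p * (1.0 if i > 0 else -1.0) for i, p in p_hat.items())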
Now, the p_i are heavily correlated (**), so I am convinced that I
could get better results than 0-1 for a low number of simulations. For
a high number of simulations, certainly not (the autocorrelation would
dominate).
These methods (probably using low-pass filters on the array of
\hat{p}_i) would most likely be much slower than just running a few
more simulations...
If we keep to something simple, we can use functions of the form
\hat{E}_\lambda = \sum_i \hat{p}_i sign(i) (1 - \lambda_i).
This includes the suggestion of using tanh(K i)...
Notice that 0-1 corresponds to taking \lambda_i = 0.
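In the same toy Python as above (again my own illustration, with
lam(i) playing the role of \lambda_i):

    import math
    from collections import Counter

    def estimate_E_lambda(outcomes, lam):
        n = len(outcomes)
        p_hat = {i: n_i / n for i, n_i in Counter(outcomes).items()}
        return sum(p * (1.0 if i > 0 else -1.0) * (1.0 - lam(i))
                   for i, p in p_hat.items())

    # lambda_i = 0 recovers plain 0-1 scoring:
    #   estimate_E_lambda(outcomes, lambda i: 0.0)
    # lambda_i = 1 - tanh(K |i|) recovers tanh(K i) scoring, since
    # tanh(K i) = sign(i) (1 - (1 - tanh(K |i|))):
    #   estimate_E_lambda(outcomes, lambda i: 1.0 - math.tanh(K * abs(i)))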
Let us compute the expected mean square error (MSE) = Bias^2 + Variance:
Bias:
B = - E + \sum_i p_i sign(i) (1 - \lambda_i)
  = - \sum_i p_i sign(i) \lambda_i.
This is zero for the 0-1 evaluation function.
Variance:
V = (1/n) (\sum_i p_i (1 - \lambda_i)^2 - (E + B)^2)
We are only interested in the difference with the 0-1 case, so I do not
expand the variance further. For 0-1, the variance is (1/n) (1 - E^2).
MSE(\lambda) - MSE(0-1) =
B^2 (1 - 1/n) - 2 E B / n + (1/n) \sum_i p_i (\lambda_i^2 - 2 \lambda_i)
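As a sanity check (my own, not part of the argument), the closed form
can be compared with a brute-force estimate of the MSE on a toy
distribution:

    import random

    def sign(i):
        return 1.0 if i > 0 else -1.0

    # Toy distribution p_i over a few score margins, and a toy lambda.
    scores = [-3.5, -0.5, 1.5, 5.5]
    probs = [0.2, 0.3, 0.3, 0.2]
    lam = {-3.5: 0.0, -0.5: 0.1, 1.5: 0.1, 5.5: 0.0}
    n = 50

    E = sum(p * sign(i) for i, p in zip(scores, probs))
    B = -sum(p * sign(i) * lam[i] for i, p in zip(scores, probs))
    mse_theory = B**2 + (1.0 / n) * (
        sum(p * (1 - lam[i])**2 for i, p in zip(scores, probs))
        - (E + B)**2)

    # Brute force: many batches of n outcomes, averaging the squared
    # error of hat{E}_lambda against the true E.
    trials, sq_err = 20000, 0.0
    for _ in range(trials):
        batch = random.choices(scores, probs, k=n)
        e_hat = sum(sign(i) * (1 - lam[i]) for i in batch) / n
        sq_err += (e_hat - E) ** 2
    print(mse_theory, sq_err / trials)  # the two should agree closely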
For this difference to be negative, showing better results than 0-1, we
first need the bias B to be of order 1/n. With \lambda_i chosen in
advance (that is, without knowledge of the p_i), the only way to
achieve this is to take the \lambda_i themselves of order 1/n.
The change is then only cosmetic: it amounts to changing at most a
fixed number of outcomes as n goes to infinity!
Since B and the \lambda_i are then O(1/n), the terms B^2/n and
(1/n) \sum_i p_i \lambda_i^2 are O(1/n^3) and can be dropped, so to
first order (that is, up to 1/n^2) the difference can be rewritten:
B^2 + (2/n) \sum_i p_i \lambda_i [E sign(i) - 1]
If we restrict to the cases where \lambda_i = \lambda_{-i}, we see that
the positive contributions involve (products of) terms of the form
(p_i - p_{-i}) \lambda_i,
whereas the negative contribution involves
(p_i + p_{-i}) \lambda_i.
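Spelling this out by grouping each i > 0 with -i:
B = - \sum_{i>0} \lambda_i (p_i - p_{-i}),
and
(2/n) \sum_i p_i \lambda_i [E sign(i) - 1]
  = (2E/n) \sum_{i>0} \lambda_i (p_i - p_{-i})
    - (2/n) \sum_{i>0} \lambda_i (p_i + p_{-i}).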
As we want the difference to be as negative as possible, we can get a
small improvement by taking \lambda_i bigger when p_i is close to
p_{-i}. A priori, this should be the case for small |i|: if there is a
high chance that the final score is -0.5, there is also a high chance
that it is 1.5. On the other hand, there is a priori no link between
20.5 and -19.5...
We have to keep \sum_i \lambda_i very small, though, otherwise the
squared bias becomes dominant.
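For instance (a toy choice of my own, only to make the scaling
concrete), one could take \lambda_i larger for small |i| while keeping
every \lambda_i of order 1/n:

    def lam(i, n, c=1.0):
        # Toy choice: lambda_i of order 1/n, concentrated on small |i|,
        # where p_i and p_{-i} can be expected to be close.
        return c / (n * (1.0 + abs(i)))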
*** CONCLUSION ***
If we restrict our attention to schemes where we add a value to the
node after the sim