Hi there

I am new here, but have read the list for a few months.
I am a mathematician, finishing my PhD on quantum statistics (that is
statistics on quantum objects, quantum information, etc.).
So do not expect me to write any code, but I could have suggestions for
heuristics in the choice of moves and so on.

I'll speak more about one of them in the coming days.

I shall just begin with a short reflection on the evaluation function,
starting from Don's fear that we might have to curb too powerful and
pessimistic programs.

> The funny thing about MC scoring is that if white DOES win,  then it
> doesn't matter what black plays, everything loses!
>
> That would mean that in a scalability study,   where colors are
> alternated,  really strong versions of Mogo would score poorly against
> the weaker programs.
>
> In a serious competition you would want to throttle down the playing
> strength (when playing black) so that it could win more and not just
> quit (resign) out of frustration!


There was the suggestion of using something like tanh(K * score)
instead of 0-1 scoring.

A similar system would make some sense, without loss of strength, if we
scale K like 1/n, where n is the number of simulations. A consequence
would be that, among moves that are all 100% victories (or all
defeats), the program would choose according to the score it expects.
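To make the comparison concrete, here is a minimal sketch of both scoring rules. The final-score distribution, function names, and the constant c are mine, purely for illustration:

```python
import math
import random

def simulate_score(rng):
    # Hypothetical final-score distribution, purely for illustration:
    # half-integer scores, positive meaning a win for us.
    return rng.choice([-3.5, -0.5, 1.5, 7.5])

def zero_one(i):
    # Classic 0-1 scoring, written as +/-1 (as in the text below):
    # only the sign of the final score counts.
    return 1.0 if i > 0 else -1.0

def make_tanh_scoring(n, c=1.0):
    # tanh(K * score) with K scaled like 1/n, per the suggestion above.
    k = c / n
    return lambda i: math.tanh(k * i)

def evaluate(n, scoring, rng):
    # Average the per-simulation scoring over n Monte Carlo runs.
    return sum(scoring(simulate_score(rng)) for _ in range(n)) / n

rng = random.Random(0)
n = 1000
e_01 = evaluate(n, zero_one, rng)
e_tanh = evaluate(n, make_tanh_scoring(n), rng)
```

Note that with K of order 1/n the tanh is almost linear over the range of realistic scores, so e_tanh essentially estimates K times the expected score rather than the win probability.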

Let us analyze all this more precisely.

First, I consider that the "true" evaluation function E of the node is
the exact probability of winning under the MC simulations. In other
words, it is the value we would get with infinitely many runs and 0-1
scoring, and it is what I want to approximate as precisely as possible
with my finite number of runs.

Second, I evaluate here only at one node, without growing the tree.
Given the conclusions, there should not be much difference. If the
conclusion had been that we can do much better than 0-1, that would
have been quite another matter.

The 0-1 evaluation function converges at rate 1/\sqrt{n}, which is
impossible to beat if there is sufficient irregularity in the
expectation of each result.
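This rate is easy to check numerically. A quick sketch, assuming a made-up true win probability of 0.6:

```python
import random

def rmse_of_winrate(n, p_win, trials, rng):
    # Root-mean-square error of the +/-1 estimator, averaged over many
    # independent repetitions of the n-simulation experiment.
    true_e = 2 * p_win - 1
    total_sq = 0.0
    for _ in range(trials):
        wins = sum(1 for _ in range(n) if rng.random() < p_win)
        e_hat = (2 * wins - n) / n      # mean of n outcomes scored +/-1
        total_sq += (e_hat - true_e) ** 2
    return (total_sq / trials) ** 0.5

rng = random.Random(1)
r100 = rmse_of_winrate(100, 0.6, 2000, rng)
r400 = rmse_of_winrate(400, 0.6, 2000, rng)
# Quadrupling n should roughly halve the error, as 1/sqrt(n) predicts.
```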

Now, let us write p_i for the probability that the score of the MC
simulation is i, so that i = 0.5, -3.5, or 7.5, for example.
We have \sum_i p_i = 1, and our evaluation function is
E = \sum_i p_i sign(i).
(Strictly speaking, this counts -1 for a defeat rather than 0, but that
is equivalent and more symmetrical, hence nicer for presenting the
calculations below.)

We observe
\hat{p}_i = n_i / n,
where n_i is the number of simulations giving result i among the n
simulations we have carried out.
The 0-1 evaluation corresponds to estimating E by
\hat{E} = \sum_i \hat{p}_i sign(i).
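In code, with a hypothetical list of simulation results, this estimator is just:

```python
from collections import Counter

def empirical_evaluation(scores):
    # \hat{p}_i = n_i / n, then \hat{E} = \sum_i \hat{p}_i sign(i).
    n = len(scores)
    p_hat = {i: c / n for i, c in Counter(scores).items()}
    return sum(p * (1.0 if i > 0 else -1.0) for i, p in p_hat.items())

# Hypothetical run results: three wins, one loss.
e_hat = empirical_evaluation([0.5, -3.5, 7.5, 0.5])   # (3 - 1) / 4 = 0.5
```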

Now the p_i are heavily correlated (**), so I am convinced that I could
get better results than 0-1 for a low number of simulations. For a high
number of simulations, certainly not (the autocorrelation would
dominate).

Now these methods (probably using low-pass filters on the array of
\hat{p}_i) would probably be much slower than running a few more
simulations...

If we keep to something simple, we could use functions of the form
\hat{E}_{\lambda} = \sum_i \hat{p}_i sign(i) (1 - \lambda_i).

This includes the suggestion of using tanh(K i)...
Notice that 0-1 corresponds to taking \lambda_i = 0.

Let us compute the expected mean square error (MSE)  = Bias^2 + Variance:

Bias :
B = - E + \sum_i p_i sign(i) (1 - \lambda_i)
  = - \sum_i p_i sign(i) \lambda_i

This is zero with the 0-1 evaluation function.

Variance :
V = (1/n) (\sum_i p_i (1 - \lambda_i)^2 - (E + B)^2)

We are only interested in the difference with the 0-1 case, so I do not
expand the variance further. For 0-1, the variance is (1/n)(1 - E^2).

MSE(\lambda) - MSE(0-1) =
B^2 (1 - 1/n) - 2 E B / n + (1/n) \sum_i p_i (\lambda_i^2 - 2 \lambda_i)
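As a sanity check of this algebra, one can compare the closed-form difference against a brute-force Monte Carlo estimate of the two MSEs. All the numbers below (the distribution p_i, the constant \lambda_i, n, and the trial counts) are invented for illustration:

```python
import random

# Hypothetical outcome distribution, for illustration only.
P = {-3.5: 0.1, -0.5: 0.2, 0.5: 0.4, 7.5: 0.3}
LAM = {i: 0.05 for i in P}      # constant lambda_i, invented for the test

def sign(i):
    return 1.0 if i > 0 else -1.0

E = sum(p * sign(i) for i, p in P.items())            # true evaluation
B = -sum(p * sign(i) * LAM[i] for i, p in P.items())  # bias of E_lambda

def mse(n, lam, trials, rng):
    # Monte Carlo estimate of E[(\hat{E}_lambda - E)^2], n runs per estimate.
    outcomes = list(P)
    weights = [P[i] for i in outcomes]
    total = 0.0
    for _ in range(trials):
        draws = rng.choices(outcomes, weights=weights, k=n)
        e_hat = sum(sign(i) * (1 - lam[i]) for i in draws) / n
        total += (e_hat - E) ** 2
    return total / trials

n = 50
rng = random.Random(2)
diff_mc = (mse(n, LAM, 20000, rng)
           - mse(n, {i: 0.0 for i in P}, 20000, rng))
diff_formula = (B * B * (1 - 1 / n) - 2 * E * B / n
                + sum(p * (LAM[i] ** 2 - 2 * LAM[i])
                      for i, p in P.items()) / n)
```

The two values agree within the Monte Carlo noise. Note that \lambda_i = 0.05 is already of order 1/n for n = 50, so the difference is of order 1/n^2, small compared with the MSEs themselves.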

So if we want this value to be negative, showing better results than
0-1, we first need the bias to be of order 1/n.
The only way to achieve this with \lambda_i chosen in advance (not
depending on the p_i) is to take \lambda_i of order 1/n.
The change would then be only cosmetic: it amounts to changing at most
a fixed number of outcomes as n goes to infinity!

Then the leading-order term (that is, the term of order 1/n^2) can be
rewritten:
B^2 + (2/n) \sum_i p_i \lambda_i [E sign(i) - 1]

If we restrict to the cases where \lambda_i = \lambda_{-i}, then we see
that the positive contributions involve (products of) terms of the form
(p_i - p_{-i}) \lambda_i,
whereas the negative contribution involves
(p_i + p_{-i}) \lambda_i.

As we want the difference to be as negative as possible, we could get a
small improvement by taking \lambda_i bigger when p_i is near p_{-i}.
A priori, this should be the case for small i: if there are high
chances that the final score is -0.5, there are also high chances that
it is 1.5. On the other hand, there is a priori no link between 20.5
and -19.5...

We have to keep \sum_i \lambda_i very small, though, otherwise the
squared bias becomes dominant.


******* CONCLUSION *******

If we restrict our attention to schemes where we add a value to the
node after each simulation, WE CANNOT GET (much) BETTER THAN THE USUAL
SUMMING OF VICTORIES.

There might be ways to get better evaluations, but they are much more
complicated and I have no idea of how they would interact with the tree.

However, we can, without loss of performance, change the 1 into
something like tanh(n * integer_part(2 + |i|)/2), where i is the score
of the simulation. This would be almost equivalent to:
first choose by 0-1. If there is no difference, look at the scores...
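That tie-breaking rule could be sketched as follows (the candidate moves and their playout statistics are invented):

```python
def pick_move(stats):
    # stats: move -> (wins, n, mean_score) from the MC simulations.
    # Primary key: win rate (the usual 0-1 evaluation);
    # secondary key: mean final score, used only to break ties.
    return max(stats, key=lambda m: (stats[m][0] / stats[m][1],
                                     stats[m][2]))

# Hypothetical data: both moves win 80% of playouts,
# but move "b" wins by larger margins on average.
stats = {"a": (80, 100, 1.5), "b": (80, 100, 6.5)}
best = pick_move(stats)   # -> "b"
```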

**************

Well, complicated for not much. I think/hope my next post (I know which
one) will be more interesting.


Jonas

----- End forwarded message -----
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
