Weston Markham wrote:

> 1.  Uniform playouts, as used in practice, are not really uniform over
> all legal go moves.  Generally, pass moves are excluded until
> necessary, and moves that fill "eyelike" points are excluded.  So, I
> assume that when you use the word "legal", you mean admissible within
> this sort of playout.

That shouldn't make a difference. If you count the pass as a legal move
you should also count it as a legal moves in the subpaths.
You would have W (subpaths without pass) and W' (subpaths with pass).
Since it is only one of the 255 legal moves (counting the expected value
of # legal moves as 19^2 - 212/2 where 212 = length of a pro game) it
should not be very important even if miscounted.

> 2. "That variance depends on the length of the playout."  It is
> difficult for me to make sense of this statement, simply because not
> all playouts from a given position have the same length.

Of course, but it is reasonable to expect 9x9 playout around 120, 19x19
around 400, etc. That has, at least, two consequences:

 1. UCT is stronger for global search studying move 150 than move 5.

 2. UCT can also be used for local search on 50-80 empty cells, something
astronomically above what can be done with alfa-beta (assuming we want final
evaluation, not "evaluating" nodes at some intermediate state.) Of course,
UCT does not "talk territory" so adding the local subgames is far from trivial.

> "The variance of the stochastic process is not to be mixed up
> with the distribution of the error of a repeated Bernoulli experiment!"
> Perhaps I have mixed them up.  Can you explain more clearly or precisely
> what "the variance of the stochastic process" is?

Repeated Bernoulli experiments are studied as a stochastic process and,
in our case, _the experiment is a stochastic process itself_.

We have a nested process: A process whose steps are stochastic processes.

Therefore we have the variance of the Bernoulli process which is in the
binomial Bi(n, p) but that p is the result of a stochastic process whose
variance biases p towards 1/2.

I don't don't have a particular mathematical model for that process and model it as a noise. No matter how you distribute it, as long as the expected value is zero
it and has the same consequences. This may be clearer:

Count each move in the playout as adding a random territory difference
in {-3, -2, -1, 0, 1, 2, 3} to the actual territorial value with any
distribution whose expected value is 0. The probability of the result
being > 0 (a win) starting with a good move (initial value = +5)
after 10 moves is significantly bigger than starting with a bad move
(initial value = -3). But after a million moves the probability of
a win is the same for both = 1/2. Because the variance of the
experiment is somewhere in k·n the standard deviation is proportional
to the sqrt of n = 1000. With a std deviation around 1000 the initial
values 5 or -3 are way too small to make any difference.

> Does this game violate the condition that "the number of legal moves
> for each side is balanced"?

I mean my argument in the big numbers. Of course, if you take it down to
the detail you can find counterexamples. But these counterexamples
should be balanced for black and white. In fact, that's what I pretended
to say "if everything is the same for black and white" take "the same"
in an informal sense, there is no reason one of the sides should be
favored and p estimates W. What is more important is that it estimates
W biased towards 1/2 because the playout is a stochastic process.

> Even if we can compute W exactly, do we have any reason to think
> that its value is a good estimate of the minimax value of the game?

It is *not* a good estimate. I am not trying to advocate in favor of MC
as a panacea. In the beginning I was among the critics and have done
an effort to understand it better. Slowly I am becoming convinced
of the possibilities specially in combination with other methods
and now I have written a UCT engine, mainly for analysis.
W fails to represent the minimax value of the game at least in two
common situations: self-atari moves that would be a good idea if
the opponent was kind enough as to forgive the capture and ladders.
But _that is_ what uniformly random MC evaluates, not my fault ;-)

Jacques.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to