It seems to me that the fundamental reason MC go (regardless of details)
works as it does is that it is the only search method (at least that I
am aware of) that has found a way to manage the evaluation problem.
Evaluation is less problematic because MC plays each game out to the
bitter end, where the status is known with certainty.  With random
distributions it probably tends to find robust moves that leave a lot of
favorable options open.  With MoGo, Sylvain has shown that better
simulation policies can achieve much better results.

But what are some of the reasons MC is not even better?
-Since MC engines don't deal with tactics directly, they're not likely
to play tactical sequences well: low-liberty strings, securing eye
space, cutting and connecting, ko fights, ladders, etc.
-Also, because most of the play-outs are usually nonsense, they may
have trouble dealing with meaningful nuances, because the positions that
lead to these distinctions just don't arise with enough statistical
frequency in the play-outs to affect the result.  Yet when very
selective moves are used in the play-outs, too many possibilities can be
missed.
-Finally, on 19x19 anyway, the size of the board and of the game tree
probably limits the practical effectiveness of the sampling and move
ordering.  I don't try to address this last point any further in this
message.

So here is an idea for MC research:

Incorporate multiple types of distributions in one MC player.  Available
time resources would be divided between the different distribution
methods.  Then the results of these could be combined in some kind of
sum/rank/vote/etc.  For UCT this could be used to direct the search
toward the most interesting nodes.
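
To make the sum/rank/vote idea a bit more concrete, here is a minimal
sketch in Python.  Everything in it is an assumption made for
illustration: the function names, the policy names, and the idea that
each distribution reports a win rate per candidate move.  It is not
taken from any existing engine.

```python
from collections import defaultdict

def split_budget(total_playouts, shares):
    """Divide a playout budget between policies in proportion to shares."""
    norm = sum(shares.values())
    alloc = {name: int(total_playouts * s / norm) for name, s in shares.items()}
    # Hand any playouts lost to rounding down to the largest shares first.
    leftover = total_playouts - sum(alloc.values())
    for name in sorted(shares, key=shares.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

def combine_by_sum(estimates, weights=None):
    """Weighted average win rate per move across all policies.

    estimates has the (assumed) shape {policy_name: {move: win_rate}}.
    """
    weights = weights or {name: 1.0 for name in estimates}
    totals = defaultdict(float)
    weight_seen = defaultdict(float)
    for name, per_move in estimates.items():
        for move, rate in per_move.items():
            totals[move] += weights[name] * rate
            weight_seen[move] += weights[name]
    return {move: totals[move] / weight_seen[move] for move in totals}

def combine_by_rank(estimates):
    """Borda-style count: a policy's worst move gets 0 points, next 1, ..."""
    scores = defaultdict(int)
    for per_move in estimates.values():
        ordered = sorted(per_move, key=per_move.get)  # worst first
        for points, move in enumerate(ordered):
            scores[move] += points
    return dict(scores)

def combine_by_vote(estimates):
    """Each policy casts one vote for its own best move."""
    votes = defaultdict(int)
    for per_move in estimates.values():
        votes[max(per_move, key=per_move.get)] += 1
    return dict(votes)
```

In a UCT setting, one possible use (again only a guess at a design)
would be to feed the combined score in as a prior or as a move-ordering
hint rather than as the final decision.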

As an example, distributions such as these could be used:
1.  A random or near-random distribution
2.  A more selective pattern-based distribution
3.  A distribution based on a simple tactical reader - it might not be
obvious how to implement this, but perhaps it could play tactical
sequences when the conditions for them (detected by heuristics) exist on
the board, and otherwise switch to one of the others.
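
The three distributions above could be sketched as interchangeable
play-out policies, with the tactical one falling back to another policy
when its heuristics find nothing.  This is only a rough sketch: the
state representation and the helpers pattern_weight and
find_urgent_moves are hypothetical placeholders, not real engine code.

```python
import random

def random_policy(state, legal_moves, rng):
    """Distribution 1: pick uniformly among the legal moves."""
    return rng.choice(legal_moves)

def make_pattern_policy(pattern_weight):
    """Distribution 2: sample moves in proportion to a pattern weight.

    pattern_weight(state, move) is a hypothetical scoring function that
    must return non-negative weights, not all zero.
    """
    def policy(state, legal_moves, rng):
        weights = [pattern_weight(state, m) for m in legal_moves]
        return rng.choices(legal_moves, weights=weights, k=1)[0]
    return policy

def make_tactical_policy(find_urgent_moves, fallback):
    """Distribution 3: play a tactical move if heuristics flag one.

    find_urgent_moves(state) is a hypothetical heuristic returning moves
    such as captures or atari escapes; when it returns nothing legal,
    the policy switches to the fallback distribution.
    """
    def policy(state, legal_moves, rng):
        urgent = [m for m in find_urgent_moves(state) if m in legal_moves]
        if urgent:
            return rng.choice(urgent)
        return fallback(state, legal_moves, rng)
    return policy
```

Keeping all three behind the same call signature means a play-out loop
would not need to know which distribution it is running.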

With regard to variance reduction techniques, #2 and #3 might be
examples of importance sampling and conditional sampling.  And the above
overall method might fall under the category of "stratified sampling".
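
For readers less familiar with the terminology, here is a toy
illustration of importance sampling in the textbook sense: sample from
a selective distribution q instead of the uniform distribution p, and
reweight each sample by p/q so the estimate stays unbiased.  The four
"outcomes" and their values are made up, and nothing here claims that
existing MC go programs reweight their play-outs this way.

```python
import random

def plain_estimate(values, n, rng):
    """Average of f over n draws from the uniform distribution p."""
    k = len(values)
    return sum(values[rng.randrange(k)] for _ in range(n)) / n

def importance_estimate(values, proposal, n, rng):
    """Average of f * p/q over n draws from the selective proposal q.

    proposal must be a list of probabilities summing to 1, and must be
    positive wherever the corresponding value is non-zero.
    """
    k = len(values)
    p = 1.0 / k  # uniform target probability of each outcome
    draws = rng.choices(range(k), weights=proposal, k=n)
    return sum(values[i] * p / proposal[i] for i in draws) / n

if __name__ == "__main__":
    rng = random.Random(1)
    values = [0, 0, 0, 1]              # only the last outcome matters
    proposal = [0.1, 0.1, 0.1, 0.7]    # sample that outcome more often
    print(plain_estimate(values, 20000, rng))
    print(importance_estimate(values, proposal, 20000, rng))
```

Both estimators target the same mean (0.25 here), but the reweighted
one has lower variance because most samples land where the value is
non-zero.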

Thoughts?


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
