A recurrent concept popping up in discussions on how to improve
playouts is "balance". So I would like to try to share my philosophy
behind the playouts of Valkyria and how I define "balance" and how it
relates to the evaluation of go positions.
*Background
In an old school program the evaluation function would try to see
which stones are safely connected to other stones of the same colours.
Connected stones are called groups, and the program would probably
also try to evaluate the safety of the groups looking at the eyespace
at hand, weak neighbouring groups and so on. This quickly gets very
commplicated. My old program Viking had several 1000's of handmade
patterns for evaluating connectivity alone. This worked as a dream as
long as the position consisted of patterns in the database... but in
each an every game there were new situations and new patterns had to
be added. A more robust method would be to use tactical search in the
program to evaluate connectivity. The problem there is to ensure
accuracy efficiently. Any tactical search tends to either become too
time consuming, or resort to guessing.
*MC-evaluation
Programs using uniformly random MC-eval favors very solid but
inefficient shape, often building blocks of adjascent stones in the
center of the board. The reason is that if stones are played more
loosely the stones often get cut off and get killed in the simulations.
What we rather want is a program that can play efficent moves where
stones are safely connected but occupy as much territory/eyespace as
possible.
The tree search (UCT) cannot alone solve this problem. Shapes created
in a 19x19 game may exist untouched to the late endgame and it is not
possible to read out all shapes on the board. It is much better if
secure shapes stay stable in the simulation.
They way I implemented that in Valkyria is that the playout part is
basically reactive to random moves that attacks shapes on the board.
It does not in any sense attempt to play the best move on the board.
If it does not need to defend a shape it plays uniformly random
somewhere. [Note that Valkyria also prunes really ugly moves, thus it
plays uniformly the first move that is not pruned]
This is also how the pattern system works in Mogo as I understand it.
If I remember correctly I would say that all Mogo patterns are very
basic and common sense defenses against attacks on otherwise stable
shapes.
But there also have to be balance. Valkyria also punishes bad shape.
That is if a weak shape already is on the board, or a random move
attacked two shapes simulatanously in the simulation, then the program
may attack the weakness (or in a way it also reacts to the situation
defending against "the weak shape becoming stronger"). Often the same
move that would have been the proper defence is played.
*Eliminating noise rather than predicting the best move
Nothing I wrote above is original or new to the readers of this list.
But I want to make a distinction between systems that tries to predict
the best move and a system that only tries to eliminate noise from the
otherwise very random simulations.
Noise is eliminated when strong stones live and weak stones die almost
always in the simulations. This way the random evaluations will mostly
react to moves that matter in urgent fighting with shapes that are not
yet stable. A MC-program that does this should stop defending and
attacking strong shapes and would require much less simulations to
discriminate between good and bad moves. Valkyria2 and Valkyria3 has
slightly different tree search algorithms but uses the same playouts.
Both versions needs only 512 playouts per move to win 50% against
Gnugo 3.7.10.
Still I think predicting the best moves is very important in the tree
part, but this may be much less important in the playouts, and perhaps
even detrimental as some people have experienced.
*19x19
The best programs on 19x19 seems to focus the uct search on local
fighting. This temporarilly overcomes the biases of the simulations
locally. But the information gained locally about good shape in the
tree is forgotten when the fighting moves elswhere. But this knowledge
can then be rediscovered later if the fighting comes back. Could a
future improvement to 19x19 go be to use locally narrow searches that
seeds the playouts with strong patterns for the current shapes on the
board? Maybe someone is already doing this? A really strong approach
would be to eliminate the need of hard coded patterns or offline
pattern harvesting and let the program learn during the game.
-Magnus
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/