After getting some strange results with my experiments I realised
that doing these tests on the reference-bot is probably not
appropriate. The MC+AMAF combination is a completely different beast
than MC+UCT-search.
So I need to do the testing on a reference implementation of MC+UCT-
search. People have mentioned here before that they'd like to see
something like that. I have an implementation that I use for my own
experiments, but I'm hardly an expert on either MC or UCT, so I'm a
bit hesitant to propose it as a reference. Still, maybe I can make a
start. At least we do have a good standard definition of a reference
MC algorithm, so I'm not going to repeat that here apart from saying
it plays uniformly random simulations of moves that are legal and
don't fill in an eye.
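To make that playout policy concrete, here is a minimal Python sketch. The move representation and the eye-detection predicate are placeholders of my own, not part of any actual ref-bot:

```python
import random

def select_playout_move(legal_moves, fills_own_eye, rng=random):
    """Uniformly random playout move, skipping eye-filling moves.

    legal_moves: list of candidate moves (representation left open).
    fills_own_eye: predicate supplied by the board code (hypothetical).
    Returns None to signal a pass when no candidate remains.
    """
    candidates = [m for m in legal_moves if not fills_own_eye(m)]
    if not candidates:
        return None  # no legal non-eye-filling move left: pass
    return rng.choice(candidates)
```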
That leaves the UCT part, so let me take a stab at it:
- Let's say we're going to limit the search to N Monte-Carlo
simulations.
- For each simulation we need to determine which node in the tree to
expand. This is a recursive procedure starting from the root-node. If
a node is incomplete it is expanded further with the next move, which
is selected randomly by the MC algorithm. (A node is 'complete' when
it has all the moves that the MC algorithm can generate as its
children. I define this to be the case when the MC algorithm produces
a pass-move. See below.)
- If a node is 'complete', it tries to (recursively) expand the child
with the best UCT value. This value combines the win-loss ratio
recorded in that node with an exploration term, so the values used
for comparison are based on the following formula:
wins / visits + exploration-factor * sqrt(2 * log(parent-visits) /
(10 * visits))
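In code, the selection rule might look like the sketch below. It keeps the 10*visits divisor exactly as in the formula above, though the textbook UCB1 term divides by visits alone; the (wins, visits) pair representation is my own simplification:

```python
import math

def uct_value(wins, visits, parent_visits, exploration=1.0):
    # Win ratio plus exploration term, with the 10*visits divisor
    # taken verbatim from the formula in the text.
    return wins / visits + exploration * math.sqrt(
        2 * math.log(parent_visits) / (10 * visits))

def best_child_index(children):
    # children: list of (wins, visits) pairs; the parent's visit
    # count is taken here to be the sum of its children's visits.
    parent_visits = sum(v for _, v in children)
    return max(range(len(children)),
               key=lambda i: uct_value(*children[i], parent_visits))
```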
- I'm currently using an exploration-factor of 1.0 and have no idea
what the best or standard value is for light playouts.
- When the node to expand has been selected, a random move is chosen
and a new child node is created for it. Next, provided the selected
move is not a pass, an MC playout is performed. The wins-visits ratio
of the node just created is set to 1/1 or 0/1 depending on the
playout result, and the same amount is added to the wins-visits
ratios of its parents, all the way up to the root-node of the tree.
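The backup step above can be sketched as follows, using a hypothetical minimal node structure. Note that the post updates all ancestors by the same amount; many UCT implementations instead negate the result at each level to account for alternating players:

```python
class Node:
    # Minimal tree node for illustration (hypothetical structure).
    def __init__(self, parent=None):
        self.parent = parent
        self.wins = 0
        self.visits = 0

def backpropagate(node, won):
    # Record the playout result in the new node and every ancestor,
    # walking up until past the root-node, as described above.
    while node is not None:
        node.visits += 1
        if won:
            node.wins += 1
        node = node.parent
```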
- When the randomly selected move is a pass, the tree is not expanded
unless the game is nearly finished; the node is just set to
'complete'. The game is considered 'nearly finished' when all empty
points have more than 2 occupied neighbors. In that case it counts
the score and checks whether the side that passes would win the game.
If it does win, it updates the wins-visits ratio; otherwise it
doesn't do anything.
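A sketch of the 'nearly finished' test as described; the dict-of-coordinates board is a placeholder representation of my own. One caveat worth noting: an empty corner point has only two neighbours, so taken literally it can never have more than 2 occupied neighbours:

```python
def nearly_finished(board, size):
    # 'Nearly finished' per the text: every empty point has more than
    # two occupied neighbours. The board maps (x, y) to a stone or
    # None (hypothetical representation).
    for (x, y), stone in board.items():
        if stone is not None:
            continue
        occupied = 0
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < size and 0 <= ny < size \
                    and board.get((nx, ny)) is not None:
                occupied += 1
        if occupied <= 2:
            return False
    return True
```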
- When it reaches N simulations, the child of the root-node with the
best wins-visits ratio is played. I've also seen that simply the
child with the highest number of visits is selected. Is there a
difference?
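The two final-move rules side by side, on a hypothetical per-child dict representation:

```python
def pick_best_ratio(children):
    # Child with the highest wins/visits ratio.
    return max(children, key=lambda c: c['wins'] / c['visits'])

def pick_most_visited(children):
    # 'Robust child': the child with the most visits.
    return max(children, key=lambda c: c['visits'])
```

They can disagree: a child with a lucky ratio on few visits can outscore a heavily explored child. The most-visited ('robust') child is generally considered the safer choice, since UCT concentrates its visits on the move it believes is best.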
- Towards the end of the game, it may not be possible to do N
expansions of the tree. To prevent an endless loop, I record how many
times it failed to expand the tree and exit the loop when that count
exceeds a certain number (currently set to 1000).
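Putting the stopping conditions together, the top-level loop might be sketched like this; `expand` stands in for the whole select-expand-playout-backup step and is a hypothetical callback, not anything from an actual ref-bot:

```python
def run_search(root, n_simulations, expand, max_failures=1000):
    # Stop after n_simulations successful expansions, or when
    # expansion has failed max_failures times (late endgame).
    # `expand` grows the tree by one node, returns True on success.
    done = failures = 0
    while done < n_simulations and failures < max_failures:
        if expand(root):
            done += 1
        else:
            failures += 1
    return done
```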
The treatment of pass feels a bit convoluted. The idea behind it is
that I don't want to allow pass-moves in the tree early in the game.
But towards the very end of the game I want to allow pass to avoid
being forced to make a bad move or not being able to make a move at all.
I'd also like to hear opinions on what would be a good N for a
reference bot. If I set N to 2,000, just like the reference-bots on
CGOS, it plays rather poorly, much worse than the ref-bot. I haven't
tested it a lot yet, but I think the break-even point against the
MC-AMAF ref-bot would be somewhere around N=10,000.
Any opinions, ideas or criticism are welcome.
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/