After getting some strange results with my experiments I realised that doing these tests on the reference-bot is probably not appropriate. The MC+AMAF combination is a completely different beast than MC+UCT-search.

So I need to do the testing on a reference implementation for MC+UCT search. People have mentioned here before that they'd like to see something like that. I have something like a reference implementation that I use for my own experiments, but I'm hardly an expert on either MC or UCT, so I'm a bit hesitant to propose it as a reference. But maybe I can make a start. At least we do have a good standard definition of a reference MC algorithm, so I won't repeat that here apart from saying it plays uniformly random simulations of moves that are legal and don't fill in an eye.

That leaves the UCT part, so let me take a stab at it:

- Let's say we're going to limit the search to N Monte-Carlo simulations.

- For each simulation we need to determine which node in the tree to expand. This is a recursive procedure starting from the root node. If a node is incomplete, it is expanded further with the next move, which is selected randomly by the MC algorithm. (A node is 'complete' when it has all the moves the MC algorithm can generate as its children; I define this as the moment the MC algorithm produces a pass-move. See below.)

- If a node is 'complete', it tries to (recursively) expand the child with the best UCT value. This value combines the win-visits ratio recorded in that node with an exploration term, so the values used for comparison are based on the following formula:

wins / visits + exploration-factor * sqrt( 2 * log(parent-visits) / (10 * visits) )

- I'm actually using an exploration-factor of 1.0 at the moment and have no idea what the best or standard value is for light playouts.

- When the node to expand is selected, a random move is chosen and a new node is created for it. Next, provided the selected move is not a pass, an MC playout is performed. The win-visits ratio of the node just created is set to 1/1 or 0/1 depending on the playout result. It also updates the win-visits ratio of its parent(s) by the same amount until it reaches the root node of the tree.

- When the randomly selected move is a pass, it doesn't expand the tree unless the game is nearly finished; the node is set to 'complete'. The game is considered 'nearly finished' when all empty points have more than 2 occupied neighbors. In that case it counts the score and sees whether the side that passes would win the game. If it would win, it updates the win-visits ratio; otherwise it doesn't do anything.

- When it reaches N simulations, the child of the root node with the best win-visits ratio is played. I've also seen that simply the child with the highest number of visits is selected. Is there a difference?

- Towards the end of the game, it may not be possible to do N expansions of the tree. To prevent an endless loop, I record how many times it failed to expand the tree and exit the loop when this count exceeds a certain number (currently set to 1000).
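To make the description above concrete, here is a minimal Python sketch of the node bookkeeping. This is my own illustration, not code from an actual reference bot: the `Node` class, `backpropagate` and `final_move` names are hypothetical, and the exploration term is taken verbatim from the formula above (including the factor 10 on visits).

```python
import math

EXPLORATION_FACTOR = 1.0  # the value currently used, per the text above


class Node:
    def __init__(self, move=None, parent=None):
        self.move = move          # the move leading to this node
        self.parent = parent
        self.children = []
        self.wins = 0             # playouts won from this node
        self.visits = 0           # playouts run through this node
        self.complete = False     # set once the MC generator produces a pass here

    def uct_value(self):
        # win-visits ratio plus the exploration term from the formula above
        if self.visits == 0:
            return float("inf")   # always try unvisited children first
        exploit = self.wins / self.visits
        explore = math.sqrt(2 * math.log(self.parent.visits) / (10 * self.visits))
        return exploit + EXPLORATION_FACTOR * explore

    def best_child_to_expand(self):
        # during selection, descend into the child with the best UCT value
        return max(self.children, key=Node.uct_value)


def backpropagate(leaf, won):
    # set the new node to 1/1 or 0/1 and update all ancestors up to the root
    node = leaf
    while node is not None:
        node.visits += 1
        if won:
            node.wins += 1
        node = node.parent


def final_move(root, by_visits=False):
    # the two final-move policies mentioned: best win ratio vs. most visits
    if by_visits:
        return max(root.children, key=lambda c: c.visits)
    return max(root.children, key=lambda c: c.wins / c.visits)
```

Note that `final_move` implements both selection rules mentioned above, so the two can be compared directly in self-play.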


The treatment of pass feels a bit convoluted. The idea behind it is that I don't want to allow pass-moves in the tree early in the game. But towards the very end of the game I want to allow pass to avoid being forced to make a bad move or not being able to make a move at all.
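The 'nearly finished' test described above is simple enough to sketch directly. The board representation here (a dict mapping coordinates to a colour, with None for empty) is my own assumption for illustration, not how any actual implementation stores the position:

```python
def nearly_finished(board, size):
    """The heuristic from the text: the game is 'nearly finished' when
    every empty point has more than 2 occupied neighbours.

    `board` is assumed to map (x, y) to a colour, with None for empty.
    """
    for (x, y), colour in board.items():
        if colour is not None:
            continue  # only empty points matter
        occupied = 0
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < size and 0 <= ny < size and board.get((nx, ny)) is not None:
                occupied += 1
        if occupied <= 2:
            return False  # this empty point still has room, keep playing
    return True
```

On this reading, an isolated empty point surrounded by stones counts as finished, while any empty point on open ground (2 or fewer occupied neighbours) keeps pass-moves out of the tree.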

I'd also like to hear opinions on what would be a good N for a reference bot. If I set N to 2,000, just like the reference bots on CGOS, it plays rather poorly, much worse than the ref-bot. I haven't tested it a lot yet, but I think the break-even point against the MC-AMAF ref-bot would be somewhere around N=10,000.

Any opinions, ideas or criticism are welcome.

        Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
