I'm doing some experiments with the ref-bot I implemented. It's
basically the reference implementation as defined by Don, but with
the weighted AMAF formula used by Michael Williams added.
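For reference, the kind of AMAF bookkeeping I mean looks roughly like the sketch below. The linearly decaying weight by playout position is purely an illustration on my part; the actual formula Michael uses may well differ, and `update_amaf` and its arguments are names I made up for the sketch.

```python
# Illustrative weighted-AMAF update. The linear decay of the weight
# with playout position is an assumption for illustration only; it is
# not necessarily the exact formula Michael uses.

def update_amaf(stats, playout_moves, winner, color_to_move):
    """stats maps move -> (weighted wins, total weight).

    playout_moves is the sequence of (color, move) pairs played in one
    simulation; only the first occurrence of each move by the side to
    move is credited (the usual all-moves-as-first rule).
    """
    n = len(playout_moves)
    seen = set()
    for i, (color, move) in enumerate(playout_moves):
        if color != color_to_move or move in seen:
            continue
        seen.add(move)
        weight = 1.0 - i / n  # earlier moves weigh more (assumed)
        wins, total = stats.get(move, (0.0, 0.0))
        stats[move] = (wins + (weight if winner == color_to_move else 0.0),
                       total + weight)
```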
I'm trying to answer the following question in general (which leads
to several more specific questions):
- Does improving the playouts for the ref-bot improve play in a
similar fashion as it would for a program using search, like UCT?
By this I mean applying the playout improvements only during
simulation, not for exploration, so that we get a better apples-to-
apples comparison.
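To make that setup concrete, here's a sketch (my own illustration, not code from any particular bot): the in-tree selection term stays plain UCB1, while any playout heuristics would live in a separate simulation policy, so a strength change can only come from better simulations.

```python
import math

# In-tree selection stays plain UCB1; playout heuristics are confined
# to a separate simulation policy, so any strength change comes from
# better simulations, not better exploration.

def ucb1(wins, visits, parent_visits, c=1.0):
    if visits == 0:
        return float('inf')  # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```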
In a UCT-search program I have used several methods to improve
playouts, but I have yet to determine how much of the gain from each
improvement is due to better exploration and how much to better
simulation. Instead of trying things willy-nilly, I think it might be
interesting to take a somewhat more scientific approach: start with a
hypothesis, test it, and then accept or refute it. (Or possibly find
the result is inconclusive.)
So I'd be interested to hear opinions on a few 'playout improvements'
in the context given above. I'd like to hear hypotheses about why each
would improve play (or not), and possibly an estimate of how much it
would gain, say as a winning percentage against the original ref-bot.
I'd like these before I complete the testing and publish the results,
so that we don't expose ourselves too much to confirmation bias. All
the testing will be done with an equal number of playouts (2,000), so
performance is not taken into consideration. Here is the list of
'playout improvements' I'm thinking of testing:
1- Capture a stone in atari with a certain probability (like David
Fotland says he's doing).
2- Forbid playing on the 1st or 2nd line unless there's a stone
within Manhattan distance 2.
3- Forbid putting yourself into atari with a large number (>6) of
stones.
4- Forbid putting yourself into atari in a dead-end that's not a
false eye (more than one diagonal occupied) and that doesn't also
reduce the liberties of a surrounding chain to 2 or fewer.
_O_
X.O
_O_ (underscore is don't-care)
5- Defend a stone put into atari by the last move, provided it's not
caught in a ladder.
6- Capture the last stone played in a ladder.
7- Play a straightforward cut near the last move.
_OX
X.O
8- Defend a straightforward cut near the last move (same pattern as
above, O to move).
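As an illustration of how I imagine wiring any one of these heuristics into playout move selection (the names, the `suggests` predicate, and the probability are placeholders of mine, not the ref-bot's actual code):

```python
import random

# Hypothetical playout-policy wrapper: apply a heuristic's suggested
# moves with a given probability, otherwise fall back to a uniformly
# random legal move. `suggests` stands in for any of the checks above
# (capture-in-atari, defend-atari, cut, ...); board logic not shown.

def choose_playout_move(legal_moves, suggests, prob, rng=random):
    candidates = [m for m in legal_moves if suggests(m)]
    if candidates and rng.random() < prob:
        return rng.choice(candidates)
    return rng.choice(legal_moves)
```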
Given the nature of a pure MC program, move selection should never be
totally deterministic. Since the last four improvements would
otherwise be deterministic, I'll apply each of them with 50%
probability. I'd also be interested to hear whether people think other
probabilities are more suitable. I'm going to use about 1,000 games
for each test.
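For what it's worth, 1,000 games puts a bound on what's detectable: the worst-case 95% confidence margin on a measured winning percentage works out to about ±3.1 points, so only effects of a few percent will show up clearly.

```python
import math

games = 1000
p = 0.5  # worst case for the variance of a measured win rate
margin = 1.96 * math.sqrt(p * (1 - p) / games)  # 95% confidence half-width
print(round(100 * margin, 1))  # about 3.1 percentage points
```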
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/