You'll probably have to test more than one percentage on each type. It's possible (and likely, I think) that 50% could result in worse play while something
like 20% results in better play. Also, I'd like to re-submit my idea of increasing that number as the playout progresses.
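To make the ramping idea concrete, here is a minimal sketch, assuming a hypothetical linear schedule; the names (`move_number`, `expected_game_length`) and the 20%-to-50% range are illustrative assumptions, not taken from any ref-bot code.

```python
import random

def heuristic_probability(move_number, expected_game_length,
                          start=0.2, end=0.5):
    """Linearly interpolate the firing probability of a playout
    heuristic from `start` early in the playout to `end` near
    the expected end of the game."""
    t = min(move_number / expected_game_length, 1.0)
    return start + (end - start) * t

def should_apply_heuristic(move_number, expected_game_length, rng=random):
    """Stochastically decide whether to apply the heuristic at
    this point in the playout."""
    return rng.random() < heuristic_probability(move_number,
                                                expected_game_length)
```

One could of course test other schedules (step functions, quadratic ramps); the point is only that the probability becomes a function of playout depth rather than a constant.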
Mark Boon wrote:
I'm doing some experiments with the ref-bot I implemented. It's
basically the reference implementation as defined by Don but including
the weighted AMAF formula used by Michael Williams.
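For readers unfamiliar with the idea, a weighted AMAF update gives each move seen in a playout credit that depends on how early it occurred. The linear decay below is an assumption for illustration only, not necessarily the exact formula Michael Williams used; the `stats` layout is likewise hypothetical.

```python
def amaf_update(stats, playout_moves, win, weight_fn=None):
    """Weighted all-moves-as-first update.

    stats: dict mapping move -> [weighted_wins, weighted_visits].
    playout_moves: moves of the playout, in order, for one color.
    win: True if that color won the playout.
    """
    n = len(playout_moves)
    if weight_fn is None:
        # Assumed linear decay: earlier moves get more weight.
        weight_fn = lambda i: (n - i) / n
    seen = set()
    for i, move in enumerate(playout_moves):
        if move in seen:  # AMAF credits each move at most once
            continue
        seen.add(move)
        w = weight_fn(i)
        entry = stats.setdefault(move, [0.0, 0.0])
        entry[0] += w if win else 0.0
        entry[1] += w
```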
I'm trying to answer the following question in general (which leads to
several more specific questions):
- Does improving the playouts for the ref-bot improve play in a similar
fashion as it would for a program using search, like UCT? With this I
have in mind only using the playout-improvements during simulation, not
for exploration, so that we have a better apples-to-apples comparison.
In a UCT-search program I have used several methods to improve playouts
but I have yet to determine which part of the gains I'm seeing for each
improvement is due to better exploration and which part due to better
simulation. Instead of trying things willy-nilly, I think it might be
interesting to take a somewhat more scientific approach: start with a
hypothesis, test it, and then accept or refute it. (Or possibly we
find the result is inconclusive.)
So I'd be interested to hear opinions on a few 'playout improvements' in
the context given above. I'd like to hear hypotheses about why each
would improve play (or not), and possibly an estimate of how much it
would gain, say as a winning percentage against the original ref-bot.
This is before I complete the testing and publish the results, so that
we don't expose ourselves too much to confirmation bias. All the
testing will be done with an equal number of playouts (2,000), so
performance is not taken into consideration. Here is the list of
'playout improvements' I'm thinking of testing:
1- Capture a stone in atari with a certain probability (like David
Fotland says he's doing).
2- Forbid playing on the 1st or 2nd line unless there's a stone within
Manhattan distance 2.
3- Forbid putting yourself into atari with a large number (>6) of stones.
4- Forbid putting yourself into atari in a dead-end that's not a false
eye (more than one diagonal occupied) and doesn't also reduce the
liberties of a surrounding chain to two or fewer.
_O_
X.O
_O_ (underscore is don't-care)
5- Defend a stone put into atari by the last move, provided it's not
caught in a ladder.
6- Capture the last stone played in a ladder.
7- Play a straightforward cut near the last move.
_OX
X.O
8- Defend a straightforward cut near the last move (same pattern as
above, O to move).
Given the nature of a pure MC program, move selection should never be
totally deterministic. Since the last four rules would otherwise be
deterministic, I'll apply them with a 50% probability. I'd also be
interested to hear whether people think other probabilities are more
suitable. I'm going to use about 1,000 games for each test.
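The scheme described above can be sketched as a small rule pipeline; this is an illustrative sketch, not Mark's actual implementation, and the `board.legal_moves()` interface is an assumption.

```python
import random

def choose_playout_move(board, rules, rng=random):
    """Pick a playout move by trying heuristic rules in order.

    rules: list of (suggest_fn, probability) pairs, where
    suggest_fn(board) returns a move or None if the rule doesn't
    apply. Each applicable rule fires only with its given
    probability, so even 'deterministic' rules (like the cut
    patterns) keep move selection stochastic.
    """
    for suggest, probability in rules:
        move = suggest(board)
        if move is not None and rng.random() < probability:
            return move
    # No rule fired: fall back to a uniformly random legal move.
    return random_legal_move(board, rng)

def random_legal_move(board, rng=random):
    moves = board.legal_moves()
    return rng.choice(moves) if moves else None
```

Rules 1-4 above would appear here as suggest functions that either veto or never propose the forbidden moves, while rules 5-8 would be (suggest, 0.5) pairs.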
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/