You'll probably have to test more than one percentage for each heuristic. It's possible (and I think likely) that 50% results in worse play while something like 20% results in better play. Also, I'd like to re-submit my idea of increasing that percentage as the playout progresses.
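For concreteness, here's a minimal sketch of what I mean, in Java (all the names are made up for illustration, not ref-bot code): the probability of firing a heuristic ramps linearly from some base value toward a maximum as the playout progresses.

    // Hypothetical ramp: the heuristic fires more often late in the playout.
    // 'moveNr' counts moves played so far in this playout; 'expectedLength'
    // is a rough estimate of the total playout length.
    double rampedProbability(int moveNr, int expectedLength,
                             double baseProbability, double maxProbability) {
        double progress = Math.min(1.0, (double) moveNr / expectedLength);
        return baseProbability + progress * (maxProbability - baseProbability);
    }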


Mark Boon wrote:
I'm doing some experiments with the ref-bot I implemented. It's basically the reference implementation as defined by Don but including the weighted AMAF formula used by Michael Williams.
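To give an idea of what a weighted-AMAF update looks like, here is a sketch. The linear decay below is only an illustration (my bot follows Michael's formula; I'm not reproducing his exact weights here), and all names are made up:

    // Weighted-AMAF sketch: credit every move of the playout, weighted by
    // how early it was played. (Glossing over the usual AMAF details, such
    // as only crediting the first occurrence of a point for the color that
    // played it.)
    void updateAmaf(double[] amafValue, double[] amafWeight,
                    int[] movesPlayed, int playoutLength, double result) {
        for (int i = 0; i < playoutLength; i++) {
            int move = movesPlayed[i];
            double weight = (double) (playoutLength - i) / playoutLength;
            amafValue[move] += weight * result;   // result: 1.0 win, 0.0 loss
            amafWeight[move] += weight;
        }
        // A move's AMAF value is then amafValue[move] / amafWeight[move].
    }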

I'm trying to answer the following question in general (which leads to several more specific questions):

- Does improving the playouts for the ref-bot improve play in the same way it would for a program using search, like UCT? Here I have in mind using the playout improvements only during simulation, not for exploration, so that we have a better apples-to-apples comparison.

In a UCT-search program I have used several methods to improve playouts, but I have yet to determine which part of the gain from each improvement is due to better exploration and which part is due to better simulation. Instead of trying things willy-nilly, I think it might be interesting to take a slightly more scientific approach: start with a hypothesis, test it, and then accept or refute it. (Or possibly find it's inconclusive.)
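To make the apples-to-apples comparison concrete, the separation I have in mind looks roughly like this (hypothetical method names, just to show where the heuristics plug in):

    // Only the playout policy uses the heuristics; in-tree selection is
    // untouched, so any strength difference comes from simulation quality.
    int selectTreeMove(Node node) {
        return node.bestByUct();          // plain UCT, no playout heuristics
    }

    int selectPlayoutMove(Board board) {
        int move = heuristicMove(board);  // may return -1 (no suggestion)
        return move >= 0 ? move : board.randomLegalMove();
    }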

So I'd be interested to hear opinions on a few 'playout improvements' in the context given above. I'd like to hear hypotheses about why each would improve play (or not), and possibly an estimate of how much it would gain, say as a winning percentage against the original ref-bot. I'd like to hear this before I complete the testing and publish the results, so that we don't expose ourselves too much to confirmation bias. All the testing will be done with an equal number of playouts (2,000), so performance is not taken into consideration. Here is the list of 'playout improvements' I'm thinking of testing:

1- Capture a stone in atari with a certain probability (like David Fotland says he's doing).
2- Forbid playing on the 1st or 2nd line unless there's a stone within Manhattan distance 2 (a sketch of this check follows the list).
3- Forbid putting yourself into atari with a large number (>6) of stones.

4- Forbid putting yourself into atari in a dead end that's not a false eye (more than one diagonal occupied) and doesn't also reduce the liberties of a surrounding chain to two or fewer.
_O_
X.O
_O_ (underscore is don't-care)

5- Defend a stone put into atari by the last move, provided it's not caught in a ladder.
6- Capture the last stone played in a ladder.
7- Play a straightforward cut near the last move.

_OX
X.O

8- Defend a straightforward cut near the last move (same pattern as above, O to move).
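To illustrate the kind of filter I mean, here's a rough sketch of number 2 in Java (coordinates and board accessors are made up; the real ref-bot uses a different board representation):

    // Heuristic 2 as a move filter: reject a candidate on the 1st or 2nd
    // line unless some stone lies within Manhattan distance 2 of it.
    boolean allowedOnEdge(Board board, int x, int y, int size) {
        int line = Math.min(Math.min(x, size - 1 - x),
                            Math.min(y, size - 1 - y)) + 1;
        if (line > 2)
            return true;                  // not on the 1st or 2nd line
        for (int dx = -2; dx <= 2; dx++) {
            int budget = 2 - Math.abs(dx);
            for (int dy = -budget; dy <= budget; dy++) {
                if (dx == 0 && dy == 0) continue;
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= size || ny >= size) continue;
                if (board.isOccupied(nx, ny))
                    return true;          // stone within Manhattan distance 2
            }
        }
        return false;                     // reject: empty edge area
    }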

Given the nature of a pure MC program, move selection should never be totally deterministic. Since the last four heuristics would otherwise fire deterministically, I'll play them with a 50% probability (a sketch of this gating follows below). I'd also be interested to hear whether people think other probabilities are more suitable. I'm going to use about 1,000 games for each test; note that with 1,000 games the 95% confidence margin on a winning percentage is roughly +/-3%, so smaller differences will be hard to detect.
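Concretely, the gating is nothing more than this (heuristicMove is a placeholder for any of heuristics 5 through 8, returning -1 when none applies):

    // Apply an otherwise-deterministic heuristic only with probability p,
    // so that pure-MC move selection never becomes fully deterministic.
    int choosePlayoutMove(Board board, java.util.Random random, double p) {
        if (random.nextDouble() < p) {
            int move = heuristicMove(board);
            if (move >= 0)
                return move;
        }
        return board.randomLegalMove();   // fall back to uniform random
    }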

    Mark


