I'm doing some experiments with the ref-bot I implemented. It's
basically the reference implementation as defined by Don, but with
the weighted AMAF formula used by Michael Williams added.
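For reference, the kind of AMAF bookkeeping I mean looks roughly like the sketch below. The linearly decaying weight by playout position is purely an illustration on my part; the actual formula Michael uses may well differ, and `update_amaf` and its arguments are names I made up for the sketch.

```python
# Illustrative weighted-AMAF update. The linear decay of the weight
# with playout position is an assumption for illustration only; it is
# not necessarily the exact formula Michael uses.

def update_amaf(stats, playout_moves, winner, color_to_move):
    """stats maps move -> (weighted wins, total weight).

    playout_moves is the sequence of (color, move) pairs played in one
    simulation; only the first occurrence of each move by the side to
    move is credited (the usual all-moves-as-first rule).
    """
    n = len(playout_moves)
    seen = set()
    for i, (color, move) in enumerate(playout_moves):
        if color != color_to_move or move in seen:
            continue
        seen.add(move)
        weight = 1.0 - i / n  # earlier moves weigh more (assumed)
        wins, total = stats.get(move, (0.0, 0.0))
        stats[move] = (wins + (weight if winner == color_to_move else 0.0),
                       total + weight)
```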
I'm trying to answer the following question in general (which leads
to several more specific questions):
- Does improving the playouts for the ref-bot improve play in a
similar fashion as it would for a program using search, like UCT?
By this I mean applying the playout improvements only during
simulation, not for exploration, so that we get a better apples-to-
apples comparison.
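To make that setup concrete, here's a sketch (my own illustration, not code from any particular bot): the in-tree selection term stays plain UCB1, while any playout heuristics would live in a separate simulation policy, so a strength change can only come from better simulations.

```python
import math

# In-tree selection stays plain UCB1; playout heuristics are confined
# to a separate simulation policy, so any strength change comes from
# better simulations, not better exploration.

def ucb1(wins, visits, parent_visits, c=1.0):
    if visits == 0:
        return float('inf')  # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```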
In a UCT-search program I have used several methods to improve
playouts, but I have yet to determine how much of the gain from each
improvement is due to better exploration and how much to better
simulation. Instead of trying things willy-nilly, I think it might be
interesting to take a somewhat more scientific approach: start with a
hypothesis, test it, and then accept or refute it. (Or possibly find
the result is inconclusive.)
So I'd be interested to hear opinions on a few 'playout improvements'
in the context given above. I'd like to hear hypotheses about why each
would improve play (or not), and possibly an estimate of how much it
would gain, say as a winning percentage against the original ref-bot.
I'd like these before I complete the testing and publish the results,
so that we don't expose ourselves too much to confirmation bias. All
the testing will be done with an equal number of playouts (2,000), so
performance is not taken into consideration. Here is the list of
'playout improvements' I'm thinking of testing:
1- Capture a stone in atari with a certain probability (like David
Fotland says he's doing).
2- Forbid playing on the 1st or 2nd line unless there's a stone
within Manhattan distance 2.
3- Forbid putting yourself into atari with a large number (>6) of
stones.
4- Forbid putting yourself into atari in a dead-end that's not a
false eye (more than one diagonal occupied) and that doesn't also
reduce the liberties of a surrounding chain to 2 or fewer.
_O_
X.O
_O_ (underscore is don't-care)
5- Defend a stone put into atari by the last move, provided it's not
caught in a ladder.
6- Capture the last stone played in a ladder.
7- Play a straightforward cut near the last move.
_OX
X.O
8- Defend a straightforward cut near the last move (same pattern as
above, O to move).
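As an illustration of how I imagine wiring any one of these heuristics into playout move selection (the names, the `suggests` predicate, and the probability are placeholders of mine, not the ref-bot's actual code):

```python
import random

# Hypothetical playout-policy wrapper: apply a heuristic's suggested
# moves with a given probability, otherwise fall back to a uniformly
# random legal move. `suggests` stands in for any of the checks above
# (capture-in-atari, defend-atari, cut, ...); board logic not shown.

def choose_playout_move(legal_moves, suggests, prob, rng=random):
    candidates = [m for m in legal_moves if suggests(m)]
    if candidates and rng.random() < prob:
        return rng.choice(candidates)
    return rng.choice(legal_moves)
```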
Given the nature of a pure MC program, move selection should never be
totally deterministic. Since the last four improvements would
otherwise be deterministic, I'll apply each of them with 50%
probability. I'd also be interested to hear whether people think other
probabilities are more suitable. I'm going to use about 1,000 games
for each test.
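For what it's worth, 1,000 games puts a bound on what's detectable: the worst-case 95% confidence margin on a measured winning percentage works out to about ±3.1 points, so only effects of a few percent will show up clearly.

```python
import math

games = 1000
p = 0.5  # worst case for the variance of a measured win rate
margin = 1.96 * math.sqrt(p * (1 - p) / games)  # 95% confidence half-width
print(round(100 * margin, 1))  # about 3.1 percentage points
```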
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/