You'll probably have to test more than one percentage on each type. It's possible (and likely, I think) that 50% could result in worse play while something
like 20% results in better play. Also, I'd like to re-submit my idea of increasing that number as the playout progresses.
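To make the ramping idea concrete, here is a minimal sketch, assuming a hypothetical linear schedule; the names (`move_number`, `expected_game_length`) and the 20%-to-50% range are illustrative assumptions, not taken from any ref-bot code.

```python
import random

def heuristic_probability(move_number, expected_game_length,
                          start=0.2, end=0.5):
    """Linearly interpolate the firing probability of a playout
    heuristic from `start` early in the playout to `end` near
    the expected end of the game."""
    t = min(move_number / expected_game_length, 1.0)
    return start + (end - start) * t

def should_apply_heuristic(move_number, expected_game_length, rng=random):
    """Stochastically decide whether to apply the heuristic at
    this point in the playout."""
    return rng.random() < heuristic_probability(move_number,
                                                expected_game_length)
```

One could of course test other schedules (step functions, quadratic ramps); the point is only that the probability becomes a function of playout depth rather than a constant.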
Mark Boon wrote:
I'm doing some experiments with the ref-bot I implemented. It's
basically the reference implementation as defined by Don but including
the weighted AMAF formula used by Michael Williams.
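For readers unfamiliar with the idea, a weighted AMAF update gives each move seen in a playout credit that depends on how early it occurred. The linear decay below is an assumption for illustration only, not necessarily the exact formula Michael Williams used; the `stats` layout is likewise hypothetical.

```python
def amaf_update(stats, playout_moves, win, weight_fn=None):
    """Weighted all-moves-as-first update.

    stats: dict mapping move -> [weighted_wins, weighted_visits].
    playout_moves: moves of the playout, in order, for one color.
    win: True if that color won the playout.
    """
    n = len(playout_moves)
    if weight_fn is None:
        # Assumed linear decay: earlier moves get more weight.
        weight_fn = lambda i: (n - i) / n
    seen = set()
    for i, move in enumerate(playout_moves):
        if move in seen:  # AMAF credits each move at most once
            continue
        seen.add(move)
        w = weight_fn(i)
        entry = stats.setdefault(move, [0.0, 0.0])
        entry[0] += w if win else 0.0
        entry[1] += w
```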
I'm trying to answer the following question in general (which leads to
several more specific questions):
- Does improving the playouts for the ref-bot improve play in a similar
fashion as it would for a program using search, like UCT? With this I
have in mind only using the playout-improvements during simulation, not
for exploration, so that we have a better apples-to-apples comparison.
In a UCT-search program I have used several methods to improve playouts
but I have yet to determine which part of the gains I'm seeing for each
improvement is due to better exploration and which part due to better
simulation. Instead of trying things willy-nilly, I think it might be
interesting to take a somewhat more scientific approach: start with a
hypothesis, test it, and then accept or refute it. (Or possibly we
find the result is inconclusive.)
So I'd be interested to hear opinions on a few 'playout improvements' in
the context given above. I'd like to hear hypotheses about why each
would improve play (or not), and possibly an estimate of how much it
would gain, say as a winning percentage against the original ref-bot.
This is before I complete the testing and publish the results, so that
we don't expose ourselves too much to confirmation bias. All the
testing will be done with an equal number of playouts (2,000), so
performance is not taken into consideration. Here is the list of
'playout improvements' I'm thinking of testing:
1- Capture a stone in atari with a certain probability (like David
Fotland says he's doing).
2- Forbid playing on the 1st or 2nd line unless there's a stone within
Manhattan distance 2.
3- Forbid putting yourself into atari with a large number (>6) of stones.
4- Forbid putting yourself into atari in a dead-end that's not a false
eye (more than one diagonal occupied) and doesn't also reduce the
liberties of a surrounding chain to two or fewer.
_O_
X.O
_O_ (underscore is don't-care)
5- Defend a stone put into atari by the last move, provided it's not
caught in a ladder.
6- Capture the last stone played in a ladder.
7- Play a straightforward cut near the last move.
_OX
X.O
8- Defend a straightforward cut near the last move (same pattern as
above, O to move).
Given the nature of a pure MC program, move selection should never be
totally deterministic. Since the last four rules would otherwise be
deterministic, I'll apply them with a 50% probability. I'd also be
interested to hear whether people think other probabilities are more
suitable. I'm going to use about 1,000 games for each test.
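The scheme described above can be sketched as a small rule pipeline; this is an illustrative sketch, not Mark's actual implementation, and the `board.legal_moves()` interface is an assumption.

```python
import random

def choose_playout_move(board, rules, rng=random):
    """Pick a playout move by trying heuristic rules in order.

    rules: list of (suggest_fn, probability) pairs, where
    suggest_fn(board) returns a move or None if the rule doesn't
    apply. Each applicable rule fires only with its given
    probability, so even 'deterministic' rules (like the cut
    patterns) keep move selection stochastic.
    """
    for suggest, probability in rules:
        move = suggest(board)
        if move is not None and rng.random() < probability:
            return move
    # No rule fired: fall back to a uniformly random legal move.
    return random_legal_move(board, rng)

def random_legal_move(board, rng=random):
    moves = board.legal_moves()
    return rng.choice(moves) if moves else None
```

Rules 1-4 above would appear here as suggest functions that either veto or never propose the forbidden moves, while rules 5-8 would be (suggest, 0.5) pairs.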
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/