Darren Cook wrote:
> I've been toying with the idea of having a set of playout algorithms and
> allowing black and white to choose different algorithms in that playout.
> (The idea came from trying to think how I could apply genetic
> algorithms to UCT playouts.)
>
> Here's how it would work. Assume you have 4 algorithms, A/B/C/D, some
> aggressive, some defensive, etc. All with a random element. For the
> first 16 playouts you try all combinations:
> Black uses A, White uses A;
> Black uses A, White uses B;
> ...
> Black uses D, White uses D;
>
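For what it's worth, the enumeration step you describe is tiny to write down. A minimal sketch, assuming four named policies and a stub for whatever playout function you'd plug in (the names `POLICIES`, `first_round_pairings`, and `record_result` are all illustrative, not from your mail):

```python
from itertools import product

POLICIES = ["A", "B", "C", "D"]  # e.g. aggressive, defensive, etc.

def first_round_pairings():
    """Every (black, white) policy combination, tried once each:
    16 pairings for 4 policies."""
    return list(product(POLICIES, repeat=2))

# One result list per pairing; after the first 16 playouts you'd
# keep sampling and appending outcomes here.
results = {pair: [] for pair in first_round_pairings()}

def record_result(black, white, black_won):
    results[(black, white)].append(1 if black_won else 0)
```

After the first sweep, each side could then favor the pairings where its own policy is scoring well.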
I've experimented along these lines a fair amount. It's a fun co-evolution problem, and intransitivity plays a critical role.

To picture it, assume you start from an arbitrary board position and begin by evolving against a pure random playout strategy. With luck, the GA quickly learns a playout strategy that plays good moves more frequently. Now the GA begins to evolve against the new strategy. The new strategy is stronger, in a sense, but it's also more predictable, so the GA will find a counter that exploits its weaknesses. It might learn a good general rule (in the context of the particular board state), but it's usually easier to find a stupid gimmick. That counter-strategy tends to be more predictable still. So the natural trajectory leads to ever more predictable, more brittle solutions, until it resembles a game of scissors, paper, stone.

You can try to control the level of determinism in the playouts externally. But beware: co-evolution is uncannily good at finding tricky ways to sidestep constraints.

When you stop the GA, none of the playout strategies in the final population is likely to be suitable for use in UCT. But some properties of the playouts, averaged over time, can carry good information. Or not: there are a lot of ways for the GA to get stuck and forget how to counter a stupid gambit. (A hall of champions helps.)

There has been some recent discussion on the list of "stronger" playout strategies that perform worse in the context of UCT. I think that sounds natural when you look at the problem from this perspective.

Another aspect of this approach (and I'm certain others have thought of it as well) is to use a GA to produce the same weights that all-as-first does.

I did these experiments before CrazyStone or Mogo hit the scene. I had a tree-search/Monte-Carlo hybrid program, but it was bad by any standard. Now that I've learned more, I may try the GA again some day.
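To make the hall-of-champions idea concrete: the trick is to score each candidate against the whole archive of past champions rather than only the current best, so the population can't quietly forget how to beat an old gimmick. A toy sketch, where strategies are just numbers and `beats` is whatever comparison you supply (all names here are illustrative, not from my actual program):

```python
import random

def evolve(beats, initial, generations=50, pop_size=20, seed=0):
    """Minimal hall-of-champions loop.

    beats(a, b) -> True if strategy a beats strategy b.
    Each generation's champion is the candidate that wins
    against the most archived champions, and it joins the hall.
    """
    rng = random.Random(seed)
    hall = [initial]  # archive of every past champion
    for _ in range(generations):
        # Stand-in for real mutation/crossover: fresh random candidates.
        population = [rng.random() for _ in range(pop_size)]
        # Fitness = wins against every champion so far, not just the latest,
        # which is what discourages brittle counter-strategies.
        champion = max(population,
                       key=lambda s: sum(beats(s, c) for c in hall))
        hall.append(champion)
    return hall
```

With an intransitive `beats` (scissors, paper, stone), plain self-play chases its tail; the archive at least forces each new champion to stay decent against the whole history.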
- Dave Hillis
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/