Darren Cook wrote:
> I've been toying with the idea of having a set of playout algorithms and
> allowing black and white to choose different algorithms in that playout.
> (The idea came from trying to think how I could apply genetic
> algorithms to UCT playouts.)
>
> Here's how it would work. Assume you have 4 algorithms, A/B/C/D, some
> aggressive, some defensive, etc. All with a random element. For the
> first 16 playouts you try all combinations:
> Black uses A, White uses A;
> Black uses A, White uses B;
> ...
> Black uses D, White uses D;
>
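For concreteness, the pairing scheme quoted above can be sketched as follows. The algorithm names and the `playout` stub are placeholders; with 4 algorithms there are 4 x 4 = 16 (black, white) combinations, so the first 16 playouts cover each pairing once.

```python
import itertools
import random

# Hypothetical playout algorithms (e.g. aggressive, defensive, ...).
ALGORITHMS = ["A", "B", "C", "D"]

def playout(black_algo, white_algo):
    """Placeholder playout: +1 for a black win, -1 for a white win."""
    return random.choice([+1, -1])

# Try every (black, white) pairing once: 4 * 4 = 16 playouts.
results = {}
for black, white in itertools.product(ALGORITHMS, repeat=2):
    results[(black, white)] = playout(black, white)

assert len(results) == 16  # one result per pairing
```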

I've experimented along these lines a fair amount. It's a fun co-evolution 
problem. Intransitivity plays a critical role.

To picture it, assume you start from an arbitrary board position and begin 
by evolving against a pure random playout strategy. Hopefully, the GA will 
quickly learn a playout strategy that plays good moves more frequently. Now the 
GA begins to evolve against the new strategy. The new strategy is stronger, in 
a sense, but it's also more predictable, so the GA will find a counter that 
exploits its weaknesses. It might learn a good general rule (in the context of 
the particular board state), but it's usually easier to find a stupid gimmick. 
This counter-strategy will tend to be more predictable still. So the natural 
trajectory leads to ever more predictable, more brittle solutions, until it 
resembles a game of scissors, paper, stone.
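That end state can be illustrated with a toy scissors-paper-stone payoff (illustrative code, not from the original post): each strategy beats one opponent and loses to another, so "stronger" is not a total order and co-evolution can cycle forever instead of converging.

```python
# Cyclic dominance: each key beats its value.
BEATS = {"scissors": "paper", "paper": "stone", "stone": "scissors"}

def score(a, b):
    """+1 if a beats b, -1 if b beats a, 0 on a mirror match."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

# scissors > paper and paper > stone, yet stone > scissors:
assert score("scissors", "paper") == 1
assert score("paper", "stone") == 1
assert score("stone", "scissors") == 1
```

No strategy in the cycle dominates the others, which is exactly why a GA chasing the current champion keeps finding a new gimmick rather than a generally strong playout policy.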

You can try to control the level of determinism in the playouts 
externally. But beware: co-evolution is uncannily good at finding tricky ways 
to sidestep constraints.

When one stops the GA, none of the playout strategies in the final 
population is likely to be suitable for use in UCT. But some properties of the 
playouts, averaged over time, can carry good information. Or not: there are 
a lot of ways for the GA to get stuck and forget how to counter a stupid 
gambit. (A hall of champions helps.)
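A minimal sketch of the hall-of-champions idea mentioned above (all names are illustrative, not from the original post): each candidate is scored not only against the current population but also against an archive of past champions, which makes it harder for the GA to forget how to counter an old gimmick.

```python
def evaluate(candidate, opponents, games_fn):
    """Mean game result of candidate over all opponents (+1 win, -1 loss)."""
    return sum(games_fn(candidate, opp) for opp in opponents) / len(opponents)

def generation_step(population, hall, games_fn):
    """One generation: score against rivals AND archived champions."""
    scored = [(evaluate(c, population + hall, games_fn), c) for c in population]
    scored.sort(reverse=True, key=lambda t: t[0])
    champion = scored[0][1]
    hall.append(champion)  # archive this generation's best
    return champion, hall
```

A quick check with a transitive stand-in for `games_fn` (candidates are just numbers, bigger wins):

```python
games = lambda a, b: 1 if a > b else (-1 if a < b else 0)
champ, hall = generation_step([1, 2, 3], [], games)
assert champ == 3 and hall == [3]
```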

There has been some recent discussion on the list of "stronger" playout 
strategies that perform worse in the context of UCT. I think that sounds 
natural when you look at the problem from this perspective.

Another aspect of this approach (and I'm certain others have thought of it 
as well) is to use a GA to produce the same weights that all-as-first does.

I did these experiments before CrazyStone or Mogo hit the scene. I had a 
tree-search / Monte Carlo hybrid program, but it was bad by any standard. Now 
that I've learned more, I may try the GA again some day.

- Dave Hillis


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
