On Wed, Feb 07, 2007 at 12:06:40PM +0200, Tapani Raiko wrote:
> Let my try again using the handicap example. Let's say MC player is given
> a huge handicap. In the simulations, it is winning all of its games, so
> there is no information helping to select the next move.

This situation happens in normal games too, once one player is so far ahead that it wins almost no matter what it plays. It leads to really stupid-looking endgames, where live groups are allowed to die and dead ones are allowed to be rescued.

All this could be avoided by a simple rule: instead of using +1 and -1 as the results, use +1000 for a win and -1000 for a loss, and add the final score to this. The purpose of the large constant (1000) is to make sure that the program prefers any win to any loss, so that when results are averaged, large_win + small_loss < small_win + small_win.

One could even add another term to the result, favouring games that end early (for the winner) or that last longer (for the loser), in the hope of giving the opponent more chances to make mistakes.

As far as I can see, this ought to fit straight into any MC or UCT program. It may not improve the winning chances, but it should make the program's play look more reasonable.

Just my humble idea. Feel free to shoot it down (with serious arguments), and/or use it wherever you like. I would like to hear whether this makes any practical difference, if anyone tries it.

- Heikki

--
Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
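
For anyone who wants to try the rule above, here is a minimal sketch of the playout result in Python. It is only an illustration: the function name playout_result, its parameters, the handling of a drawn game, and the length_weight term are assumptions made for this example, not taken from any existing MC or UCT program.

```python
WIN_BONUS = 1000.0  # the large constant from the text; assumed larger than any margin


def playout_result(margin, moves=0, win_bonus=WIN_BONUS, length_weight=0.0):
    """Value to back up for one finished playout (hypothetical helper).

    margin:        final score from the MC player's point of view
                   (positive means the MC player won).
    moves:         length of the playout; only used if length_weight > 0.
    length_weight: optional weight for the game-length term (an assumption;
                   it must stay small enough not to swamp win_bonus).
    """
    if margin > 0:
        # Win: large bonus plus the margin, minus a penalty for long games.
        return win_bonus + margin - length_weight * moves
    if margin < 0:
        # Loss: large penalty plus the (negative) margin, plus a small
        # reward for dragging the game out.
        return -win_bonus + margin + length_weight * moves
    # Drawn game: treated as neutral here, which is an assumption.
    return 0.0


if __name__ == "__main__":
    # One large win plus one small loss scores below two small wins.
    print(playout_result(80) + playout_result(-1))
    print(playout_result(1) + playout_result(1))
```

With length_weight left at 0 this is exactly the "+1000 / -1000 plus final score" rule; the value can be averaged in the tree the same way the +1 / -1 results were.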