At the end of a playout there is probably some code that says something like

    reward = (score > komi) ? 1.0 : 0.0;
You can just replace it with

    reward = 1 / (1 + exp(-K * (score - komi)));

A huge value of K will reproduce the old behaviour, a tiny value will result in a program that tries to maximize expected score (the curve is nearly linear in score there), and values in between will blend the two nicely. Of course you would precompute this in a table.

This seems elegant and simple to me. Now we only need to know how it affects performance. I bet there are values of K that would make everyone happy (no measurable loss in strength, while still playing good-looking moves even when the game is decided).

Álvaro.

On Dec 13, 2007 3:42 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
> On Dec 13, 2007 3:33 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
> > Seems like the final solution to this would need to build out the
> > search tree to the end of the game, finding a winning line. And then
> > search again with a different evaluation function (one based on
> > points). If the second search cannot find a line that wins bigger
> > than the first search did, just play the move returned by the first
> > search. And you could get more clever by allowing the second search
> > to start with some information from the first search. Note that when
> > I say "winning line", I mean all the way to the end. No MC here.
> >
>
> Actually, I suppose it need not be to the absolute end of the game.
> As long as all MC sims that finish out the game prior to scoring lead
> to a win, then you can consider the tree portion a guaranteed winning
> line and try the second search to maximize points.
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/