I have several reasons for suggesting some form of the "rich men don't pick fights, but they don't give away points either" philosophy.
The major one is that the MCTS scoring function is imperfect; historically, programs have snatched defeat from the jaws of victory by letting points be nibbled away in yose. Second, it is unsatisfying to play against a program which becomes indifferent in the yose stage. My reaction is "what, are you phoning in your moves now?" - this might be annoying but tolerable if the program actually had reason to be so sure of itself, but experience has shown that it does not; see above. Third, the "only wins matter" approach seems to discard a great deal of useful information.

Terry McIntyre <[email protected]>

Unix/Linux Systems Administration
Taking time to do it right saves having to do it twice.

________________________________
From: Álvaro Begué <[email protected]>
To: [email protected]
Sent: Sun, July 3, 2011 10:50:50 PM
Subject: Re: [Computer-go] MCTS and perfect endgame

On Sun, Jul 3, 2011 at 10:14 PM, terry mcintyre <[email protected]> wrote:
> From: Jean-loup Gailly <[email protected]>
> To: [email protected]
> Sent: Sun, July 3, 2011 9:12:59 AM
> Subject: Re: [Computer-go] MCTS and perfect endgame
>
> Leon,
>> One of problems (which I tested with gogui, thankyou very much)
>> was losing points in endgame when program is winning.
> This is by design. Pachi maximises the chance of winning, not the number
> of points. But if you want Pachi to win by more points while increasing
> the risk of losing, you can simply increase the parameter val_scale. See the
> description in uct/uct.c: "How much of the game result value should be
> influenced by win size. Zero means it isn't". The default value is 0.04,
> which is the result of tuning. (If you increase val_scale above this it
> starts losing more.)
>
> Why should this value be static? Shouldn't the behavior change when there is
> a certain win?

It should be static for a reason that is perhaps more philosophical than practical.
I view MCTS as a procedure to maximize the expected value of a utility function (i.e., how happy I am with the result), which is in some important sense the only rational way to make decisions. If the utility of any win is the same, it makes sense to simply maximize the probability of winning. If we are not happy with the program wasting points in a favorable endgame, it must be the case that we are happier with a win by a large margin than with a win by a small margin, so it makes sense to build that into the reward function, which is what val_scale does. Perhaps a sigmoid of some sort would be a better shape, but it should not be something that changes dynamically.

Álvaro.

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
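
To make the idea of building win size into the reward concrete, here is a minimal sketch of two reward shapes a playout result could be mapped to: a win/loss value blended with a small margin term, and a sigmoid of the margin. This is not taken from Pachi's source. The function names, the max_margin normalisation and the sigmoid width are assumptions made for illustration; only the name val_scale and its default of 0.04 come from the thread.

```python
import math


def blended_reward(margin, val_scale=0.04, max_margin=361.0):
    """Win/loss reward with a small share given to the size of the result.

    margin is the final score from the program's point of view
    (positive means a win). max_margin is a hypothetical normaliser,
    not a Pachi parameter.
    """
    # Pure win/loss component: 1 for a win, 0 for a loss, 0.5 for a tie.
    win = 1.0 if margin > 0 else 0.0 if margin < 0 else 0.5
    # Map the margin from [-max_margin, max_margin] onto [0, 1], clamped.
    size = min(max(margin / max_margin, -1.0), 1.0)
    size = 0.5 * (size + 1.0)
    # val_scale = 0 means the size of the win has no influence at all.
    return (1.0 - val_scale) * win + val_scale * size


def sigmoid_reward(margin, width=10.0):
    """Alternative shape: a smooth sigmoid of the margin.

    width is a hypothetical parameter controlling how quickly
    the reward saturates as the margin grows.
    """
    return 1.0 / (1.0 + math.exp(-margin / width))


if __name__ == "__main__":
    for m in (-20.5, -0.5, 0.5, 20.5):
        print(m, blended_reward(m), sigmoid_reward(m))
```

In both shapes the mapping from margin to reward is fixed for the whole game: a bigger win is always worth slightly more than a smaller one, and any win is worth far more than any loss under the blended form.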
