On Sun, Jul 3, 2011 at 11:17 PM, terry mcintyre <[email protected]> wrote:

> I have several reasons for suggesting some form of the "rich men don't pick
> fights, but they don't give away points either" philosophy.
>
> The major one is that the MCTS scoring function is imperfect; historically,
> programs have snatched defeat from the jaws of victory by letting points be
> nibbled away in yose.
>

I don't think this anecdotal evidence means much, except that computers do whatever they do imperfectly. If you build a program that just maximizes points, it plays much worse, and you will find MORE examples of how playing greedy costs it game after game.

Instead of presenting your intuition on this, I think you need to propose a solution and explain why winning the game is not the most appropriate goal.

I think there have basically been two solutions in general use, neither of which is provably better than the best programs that do not use them:

1. Use a more classical move generator to impose some sanity on otherwise random moves.
2. Use one of the various pseudo komi schemes to "trick" the program into setting more ambitious goals.

Don

> Second, it is unsatisfying to play against a program which becomes
> indifferent in the yose stage. My reaction is "what, are you phoning in your
> moves now?" - this might be annoying but tolerable if the program actually
> had reason to be so sure of itself, but experience has shown that it does
> not; see above.
>

I don't think that is true. Seeing bad moves does not prove a cause; do you have something more scientific? I am SURE you can find examples of anything you want, but that doesn't mean they back up your intuition.

> Third, the "only wins matter" approach seems to discard a great deal of
> useful information.
>

I think the issue here is that the win/loss result, which is all the "only wins matter" approach uses, is 95% of the useful information. I think experience DOES back this up. A program that only counts points is convincingly dominated by the "only wins matter" approach, so I think it's a stretch to say "a great deal of useful information."
There should be a way to use this small amount of additional information, but I don't know what it is. Do you? People argue this but never propose anything. So propose something for people to try!

>
> Terry McIntyre <[email protected]>
>
> Unix/Linux Systems Administration
> Taking time to do it right saves having to do it twice.
>
>
> ------------------------------
> *From:* Álvaro Begué <[email protected]>
>
> *To:* [email protected]
> *Sent:* Sun, July 3, 2011 10:50:50 PM
>
> *Subject:* Re: [Computer-go] MCTS and perfect endgame
>
> On Sun, Jul 3, 2011 at 10:14 PM, terry mcintyre <[email protected]> wrote:
> > From: Jean-loup Gailly <[email protected]>
> > To: [email protected]
> > Sent: Sun, July 3, 2011 9:12:59 AM
> > Subject: Re: [Computer-go] MCTS and perfect endgame
> >
> > Leon,
> >> One of the problems (which I tested with gogui, thank you very much)
> >> was losing points in the endgame when the program is winning.
> > This is by design. Pachi maximises the chance of winning, not the number
> > of points. But if you want Pachi to win by more points while increasing
> > the risk of losing, you can simply increase the parameter val_scale. See
> > the description in uct/uct.c: "How much of the game result value should be
> > influenced by win size. Zero means it isn't". The default value is 0.04,
> > which is the result of tuning. (If you increase val_scale above this, it
> > starts losing more.)
> >
> > Why should this value be static? Shouldn't the behavior change when there
> > is a certain win?
>
> It should be static for a reason that is perhaps more philosophical
> than practical. I view MCTS as a procedure to maximize the expected
> value of a utility function (i.e., how happy I am with the result),
> which is in some important sense the only rational way to make
> decisions. If the utility of any win is the same, it makes sense to
> simply maximize the probability of winning.
> If we are not happy with
> the program wasting points in a favorable endgame, it must be the case
> that we are happier with a win by a large margin than with a win by a
> small margin, so it makes sense to build that into the reward
> function, which is what val_scale does. Perhaps a sigmoid of some sort
> would be a better shape, but it should not be something that changes
> dynamically.
>
> Álvaro.
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
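Jean-loup's description of val_scale and Álvaro's suggestion of a sigmoid can be made concrete with a small playout reward function. This is a minimal sketch, not Pachi's code: the function names, the linear clamp at `max_margin`, and the sigmoid `scale` are hypothetical choices for illustration. It blends the binary win/loss result with a bounded function of the final margin, weighted by a fixed `val_scale`, so the blend stays static, as Álvaro argues it should.

```python
import math
from typing import Callable


def margin_term_linear(margin: float, max_margin: float = 40.0) -> float:
    """Map a final score margin (positive = we won) linearly into [0, 1].

    Hypothetical shape: margins beyond +/- max_margin are clamped.
    """
    clamped = max(-max_margin, min(max_margin, margin))
    return 0.5 + 0.5 * clamped / max_margin


def margin_term_sigmoid(margin: float, scale: float = 10.0) -> float:
    """Map a final score margin into (0, 1) with a logistic curve.

    Hypothetical `scale`: larger values flatten the curve.
    """
    return 1.0 / (1.0 + math.exp(-margin / scale))


def playout_reward(
    margin: float,
    val_scale: float = 0.04,
    shape: Callable[[float], float] = margin_term_sigmoid,
) -> float:
    """Reward for one finished playout, backed up through the MCTS tree.

    val_scale = 0 gives pure win/loss (maximize the probability of winning);
    larger values make the search care more about the size of the win.
    """
    win = 1.0 if margin > 0 else 0.0
    return (1.0 - val_scale) * win + val_scale * shape(margin)
```

With both shapes, `shape(margin)` is at most 0.5 for a loss, so as long as `val_scale < 0.5` every individual win scores higher than every individual loss. Because MCTS averages rewards over many playouts, though, a line with a slightly lower win rate but larger margins can still come out ahead, which is the extra risk of losing that Jean-loup mentions when val_scale is raised.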
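Don's second suggestion, a pseudo komi scheme, can be sketched as a feedback rule run between searches. When the root win rate is very high, the program pretends it must give away extra points, so it keeps aiming for a larger margin. When the win rate drops, it relaxes that self-imposed handicap. This is one simple variant among several. The band `low`/`high`, the `step`, the cap `max_offset`, and both function names are hypothetical and not taken from any particular program. The sketch only applies the offset in favorable positions, which is the case discussed in this thread.

```python
def adjust_pseudo_komi(
    komi_offset: float,
    root_win_rate: float,
    low: float = 0.45,
    high: float = 0.55,
    step: float = 1.0,
    max_offset: float = 30.0,
) -> float:
    """Return an updated extra-komi handicap the program imposes on itself.

    komi_offset > 0 means playouts are judged as if we had to give that many
    extra points; the real komi of the actual game is unchanged.
    All thresholds here are hypothetical tuning values.
    """
    if root_win_rate > high:
        komi_offset += step  # winning comfortably: demand a bigger margin
    elif root_win_rate < low:
        komi_offset -= step  # struggling: relax the self-imposed handicap
    # Keep the offset within [0, max_offset]: never grant ourselves points,
    # and never pretend to be further behind than max_offset.
    return max(0.0, min(max_offset, komi_offset))


def playout_is_win(score_margin: float, komi_offset: float) -> bool:
    """Judge a playout against the pseudo komi instead of the real result.

    score_margin is from the program's point of view under the real komi
    (positive = the program won the playout).
    """
    return score_margin - komi_offset > 0
```

In use, the program would call `adjust_pseudo_komi` after each search with the root's win rate, then count a playout as a win only when `playout_is_win` holds. Don's caveat applies here: nothing in this scheme guarantees stronger play, and an offset that grows too large makes the program chase margin at the cost of the game.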
