Working through these ideas about wins, score, perfect play, etc... it is clear that maximizing wins is the correct basic strategy. However, I still feel that incorporating the score *somehow* should improve the winning estimates and overall strength of MCTS.
See below for more thoughts.

>From: Don Dailey <[email protected]>
>
>> 1) According to the rules of Go, the winner is the player with the highest
>> score, but a win is equivalent to any other win--winning by 0.5 points is
>> enough. So perfect play would maximize wins but not necessarily points.
>
>I think you are right. In fact you say not necessarily but I say, "definitely
>not", you won't maximize points by playing to win.

Let me put that another way--for any given game position, perfect play includes 0 to m moves which all lead to a win. These wins would fall in the range from 0.5 (the lowest possible point total for a win) to n (where n is the maximum number of points possible for a given board size).

If the strategy of the perfect player is only to win, then the winning score will be distributed (randomly? bell curve?) from 0.5 to n. If the strategy of the perfect player is to maximize points, then the winning scores will tend to be closer to n, but no higher than the maximum points possible from that given game position. I suppose both distributions would also be altered by the ability of the opponent (weaker opponent = higher scores, stronger opponent = lower scores).

>>With a perfect evaluation function, the "play to maximize points" strategy
>>should also lead to perfect play.
>
>Another way to see this is that if you win maximally (in the point sense) you
>also win. So winning the most points is a more difficult goal and a superset
>of just winning.

I think you mean maximizing points is a _subset_ of just winning. This makes sense. Of all the winning plays, only a few would lead to maximum points. Human intuition tells us that playing aggressively (maximizing points) is risky (low probability of success) and only pays off against a far weaker opponent.
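To make the difference between "maximize wins" and "maximize points" concrete, here is a minimal sketch with made-up playout margins (positive means a win for the side to move). The move names and all numbers are hypothetical, chosen only to show that the two criteria can rank the same pair of moves in opposite order.

```python
# Hypothetical final margins from 8 playouts per candidate move.
# These numbers are invented for illustration, not measured data.
playouts = {
    "safe":       [1.5, 0.5, 2.5, 0.5, -0.5, 1.5, 0.5, 3.5],
    "aggressive": [30.5, -10.5, 25.5, -8.5, 40.5, -12.5, 20.5, -5.5],
}

def win_rate(margins):
    """Fraction of playouts won (margin > 0), ignoring the size of the win."""
    return sum(1 for m in margins if m > 0) / len(margins)

def mean_score(margins):
    """Average margin over all playouts, wins and losses alike."""
    return sum(margins) / len(margins)

best_by_win_rate = max(playouts, key=lambda mv: win_rate(playouts[mv]))
best_by_mean_score = max(playouts, key=lambda mv: mean_score(playouts[mv]))

for move, margins in playouts.items():
    print(move, "win rate:", win_rate(margins), "mean score:", mean_score(margins))
print("best by win rate:", best_by_win_rate)
print("best by mean score:", best_by_mean_score)
```

With these invented numbers the aggressive move has the higher average score but wins only half of its playouts, so a score-maximizing selector would pick the move that loses more often.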
>Playing to win is the only strategy, the only issue at question is how to
>improve our estimate of winning chances and it's certainly possible that
>figuring out how to factor in other things (such as consolation or "yose")
>could improve our estimate.

Playing to win is certainly the best strategy. I guess the question is: with MCTS, to evaluate the winning chance of a move, do you use the win rate of playouts, the scores of playouts, some combination of the two, or perhaps some other information?

>Playing to maximize wins is never the wrong strategy...
>...counting points is misguided, it does not improve on the estimate but
>something else might.
>The point count by itself just doesn't tell you if you are being smart or
>stupid.

Choosing the move with the highest score (maximizing points) is misguided, but by the same token, not all (estimated) "winning" moves are equivalent--for programs to get smarter we need better ways to distinguish between the probabilities (risk) of winning moves. Programs that currently treat all wins as equivalent are losing some close games they might otherwise win with a better understanding of risk. (And perhaps opponent modeling would help.)

The concept of dynamic komi involves adjusting the score with an offset to differentiate otherwise equivalent winning moves. This is one way of combining the "maximize wins" strategy and score information. The problem I see is that some "higher score" moves are also "higher risk" moves and lead to more volatile positions--and more losses. For handicap games (where the opponent is weaker) this may work okay, but there must be a better way to make use of scores against stronger opponents.

Thanks to everyone for taking the time to explain their ideas. I really appreciate the in-depth and open dialogue on this list. I hope this discussion may have clarified some things for others (or even sparked an idea for further research).

Ben.
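P.S. A toy sketch of the dynamic komi idea and of the risk concern, using invented playout margins. Real implementations adjust komi inside the tree search as the game progresses; here the offset is simply applied to recorded margins, and the move names, margins, and offsets are all hypothetical.

```python
# Hypothetical final margins from 10 playouts per move (positive = win).
# Invented numbers for illustration only.
playouts = {
    "a": [0.5, 1.5, 0.5, 2.5, -1.5, 0.5, 1.5, 0.5, 0.5, -0.5],           # narrow wins
    "b": [6.5, 8.5, 7.5, 5.5, -1.5, 9.5, 6.5, 7.5, 10.5, -0.5],          # solid wins
    "c": [25.5, -15.5, 30.5, -20.5, 18.5, -9.5, 22.5, 28.5, -12.5, 19.5],  # volatile
}

def win_rate(margins, komi_offset=0.0):
    """Win rate when a playout only counts as won if it beats the offset."""
    wins = sum(1 for m in margins if m - komi_offset > 0)
    return wins / len(margins)

def best_move(playouts, komi_offset=0.0):
    """Move with the highest offset-adjusted win rate (first one wins ties)."""
    return max(playouts, key=lambda mv: win_rate(playouts[mv], komi_offset))

for offset in (0.0, 5.0, 10.0):
    rates = {mv: win_rate(m, offset) for mv, m in playouts.items()}
    print("offset", offset, rates, "-> best:", best_move(playouts, offset))
```

In this toy data, "a" and "b" are indistinguishable at offset 0, a modest offset separates them in favor of "b", and a larger offset starts to prefer the volatile move "c" even though it loses more playouts at the true komi. That last case is the risk I was describing above.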
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
