Hello everyone,

in the wake of AlphaGo using a DCNN to predict the expected winrate of a move, I've been wondering whether one could train a DCNN to predict expected territory or points well enough to be of some use (leaving the issue of wins by resignation for a more in-depth discussion). I am also wondering whether winrate and expected territory (or points) always run in parallel, or whether there are moments where they diverge.
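To make the divergence question concrete, here is a minimal Python sketch of how one might check it for a set of candidate moves. The `Candidate` type, the function names, and the numbers are all hypothetical and chosen only for illustration; they do not come from any existing program or network.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """A candidate move with two (hypothetical) network predictions."""
    name: str
    winrate: float          # predicted probability of winning, in [0, 1]
    expected_points: float  # predicted final score margin, in points


def best_by_winrate(cands: Sequence[Candidate]) -> Candidate:
    # Move that a pure winrate maximizer would choose.
    return max(cands, key=lambda c: c.winrate)


def best_by_points(cands: Sequence[Candidate]) -> Candidate:
    # Move that a pure expected-score maximizer would choose.
    return max(cands, key=lambda c: c.expected_points)


def signals_diverge(cands: Sequence[Candidate]) -> bool:
    # The two signals "diverge" here if they disagree on the top move.
    return best_by_winrate(cands).name != best_by_points(cands).name


# Illustrative numbers only, not measured data.
candidates = [
    Candidate("A", winrate=0.55, expected_points=5.0),
    Candidate("B", winrate=0.53, expected_points=15.0),
]

print(best_by_winrate(candidates).name)
print(best_by_points(candidates).name)
print(signals_diverge(candidates))
```

Counting how often `signals_diverge` is true over positions from real games would be one rough way to see whether the two predictions mostly agree or not.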
Computer Go programs tend to play what are considered slack or slow moves when ahead, sometimes being too conservative and giving away too much of their potential advantage. If expected points and expected winrate diverge, this could be a way to make the programs play in a more natural way, even if there were no strength increase to be gained. Then again, there might be a parameter configuration that yields some advantage, and perhaps this configuration would need to be dynamic, favoring winrate the further the game progresses.

As a general example of the idea, let's assume our program generates the following candidate moves:

#1: Winrate 55%, +5 expected final points
#2: Winrate 53%, +15 expected final points

Is the move with the higher winrate always better? Or would there be some benefit to choosing #2? Would this differ depending on how far along the game is?

If we knew the winrate prediction to be perfect, then going by that alone would probably result in the best overall performance. But given some uncertainty there, expected value could be interesting.

Any takers for some experiments?

-Michael
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go