I have experimented with a CNN that predicts ownership, but I found it to be too weak to be useful. The main difference between what Google did and what I did is in the dataset used for training: I had tens of thousands of games (I did several different experiments), and I used all the positions from each game, which is known to be problematic because positions from the same game are strongly correlated. They used 30M positions from independent games. I expect you can learn a lot about ownership and the expected number of points from a dataset like that. Unfortunately, generating such a dataset is infeasible with the resources most of us have.
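In case it helps anyone who wants to try this, here is a rough sketch of what such an ownership network can look like (PyTorch; the 3-plane input encoding, layer sizes and training loop are placeholder assumptions on my part, not a description of exactly what I ran):

import torch
import torch.nn as nn

BOARD = 19

class OwnershipNet(nn.Module):
    def __init__(self, planes=3, width=64, blocks=4):
        super().__init__()
        layers = [nn.Conv2d(planes, width, 3, padding=1), nn.ReLU()]
        for _ in range(blocks):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        # 1x1 convolution down to one ownership logit per intersection.
        layers += [nn.Conv2d(width, 1, 1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, planes, 19, 19) -> ownership logits of shape (batch, 19, 19)
        return self.net(x).squeeze(1)

def train_step(model, opt, positions, ownership):
    # positions: (N, 3, 19, 19) float planes; ownership: (N, 19, 19) in {0, 1},
    # where 1 means the point is owned by Black in the finished game.
    opt.zero_grad()
    logits = model(positions)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, ownership)
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    model = OwnershipNet()
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # Random placeholder batch, just to show the tensor shapes involved.
    x = torch.rand(8, 3, BOARD, BOARD)
    y = (torch.rand(8, BOARD, BOARD) > 0.5).float()
    print(train_step(model, opt, x, y))

A nice side effect of the per-point sigmoid is that summing the predicted ownership probabilities over the board gives you an expected point count essentially for free.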
Here's an idea: Google could make the dataset publicly available for download, ideally with the final board configurations as well. There is a tradition of making interesting machine-learning datasets available, so I have some hope this may happen.

The one experiment I would like to run along the lines of your post is to train a CNN to compute both the expected number of points and its standard deviation. If you assume the distribution of final scores is well approximated by a normal distribution, maximizing winning probability is equivalent to maximizing (expected score) / (standard deviation of the score); see the sketch after the quoted message below. I wonder whether that would result in stronger or more natural play than modeling winning probability directly, because you get to learn more about each position.

Álvaro.

On Tue, Feb 23, 2016 at 5:36 AM, Michael Markefka <michael.marke...@gmail.com> wrote:
> Hello everyone,
>
> in the wake of AlphaGo using a DCNN to predict the expected winrate of a
> move, I've been wondering whether one could train a DCNN for expected
> territory or points successfully enough to be of some use (leaving the
> issue of win by resignation for a more in-depth discussion). And
> whether winrate and expected territory (or points) always run in
> parallel or whether there are diverging moments.
>
> Computer Go programs play what are considered slack or slow moves when
> ahead, sometimes being too conservative and giving away too much of
> their potential advantage. If expected points and expected winrate
> diverge, this could be a way to make the programs play in a more
> natural way, even if there were no strength increase to be gained.
> Then again, there might be a parameter configuration that yields
> some advantage, and perhaps this configuration would need to be
> dynamic, favoring winrate the further the game progresses.
>
>
> As a general example of the idea, let's assume we have the following
> potential moves generated by our program:
>
> #1: Winrate 55%, +5 expected final points
> #2: Winrate 53%, +15 expected final points
>
> Is the move with the higher winrate always better? Or would there be some
> benefit to choosing #2? Would this differ depending on how far along
> the game is?
>
> If we knew the winrate prediction to be perfect, then going by that
> alone would probably result in the best overall performance. But given
> some uncertainty there, expected value could be interesting.
>
>
> Any takers for some experiments?
>
>
> -Michael
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
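P.S. To make the mean/standard-deviation idea concrete: if the final score is roughly Normal(mu, sigma), then P(win) = Phi(mu / sigma), so ranking moves by mu / sigma is the same as ranking them by winning probability. The small Python sketch below (the model and the numbers are only illustrative; a real program would get mu and sigma from the network) also inverts Michael's #1/#2 example to show what score spreads those winrate/point pairs would imply:

from statistics import NormalDist

phi = NormalDist()  # standard normal distribution

def win_probability(mu, sigma):
    # P(final score > 0) under a Normal(mu, sigma) model of the score
    # (ignoring the possibility of a drawn game).
    return phi.cdf(mu / sigma)

# Inverting the #1/#2 example: what score spread would make a move come out
# at 55% / +5 and 53% / +15? (Numbers are purely illustrative.)
for winrate, mu in [(0.55, 5.0), (0.53, 15.0)]:
    sigma = mu / phi.inv_cdf(winrate)
    print(f"winrate {winrate:.0%}, mean {mu:+.0f} points -> "
          f"implied sigma ~ {sigma:.0f}, mu/sigma = {mu/sigma:.3f}")

With these numbers move #1 still wins on mu/sigma (about 0.126 vs 0.075), which just restates that its winrate is higher; the interesting cases are where a network trained to output (mu, sigma) disagrees with a directly trained winrate network.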