I have experimented with a CNN that predicts ownership, but I found it to
be too weak to be useful. The main difference between what Google did and
what I did is in the dataset used for training: I had tens of thousands of
games (I did several different experiments) and I used all the positions
from each game (which is known to be problematic); they used 30M positions
from independent games. I expect you can learn a lot about ownership and
expected number of points from a dataset like that. Unfortunately,
generating such a dataset is infeasible with the resources most of us have.
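
For concreteness, the kind of ownership net I mean looks roughly like the
sketch below (Keras-style; the input planes, depth and filter counts are
purely illustrative, not the ones I actually used). The output is, for each
of the 19x19 intersections, the probability that Black owns that point at
the end of the game, trained with binary cross-entropy against the final
board configuration.

    from keras.models import Model
    from keras.layers import Input, Conv2D

    def ownership_net(num_planes=8, filters=64, depth=6):
        # Input: 19x19 board encoded as a few binary feature planes.
        board = Input(shape=(19, 19, num_planes))
        x = board
        for _ in range(depth):
            x = Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        # 1x1 convolution down to one plane; sigmoid gives per-point ownership.
        ownership = Conv2D(1, (1, 1), activation='sigmoid')(x)
        return Model(inputs=board, outputs=ownership)

    model = ownership_net()
    model.compile(optimizer='adam', loss='binary_crossentropy')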

Here's an idea: Google could make the dataset publicly available for
download, ideally with the final configurations of the board as well. There
is a tradition of making interesting datasets for machine learning
available, so I have some hope this may happen.

The one experiment I would like to run along the lines of your post is to
train a CNN to predict both the expected number of points and its standard
deviation. If you assume the distribution of scores is well approximated by
a normal distribution, maximizing the winning probability is equivalent to
maximizing (expected score) / (standard deviation of the score). I wonder
whether that results in stronger or more natural play than modeling the
winning probability directly, because you get to learn more about each
position.
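
To spell out that step: if the final score margin is modeled as
N(mu, sigma^2), then the probability of winning is P(score > 0) =
Phi(mu / sigma), and since Phi is increasing, maximizing mu / sigma
maximizes the winning probability. A minimal sketch in plain Python
(the numbers are made up, just to show how a small safe lead can be
worth more than a bigger but noisier one):

    from math import erf, sqrt

    def win_probability(mu, sigma):
        """P(score > 0) when the final score margin is modeled as N(mu, sigma^2)."""
        # Phi(mu / sigma): standard normal CDF evaluated at mu / sigma
        return 0.5 * (1.0 + erf(mu / (sigma * sqrt(2.0))))

    # A small but safe lead vs. a larger but much noisier one (made-up numbers):
    print(win_probability(5.0, 10.0))    # ~0.69
    print(win_probability(15.0, 40.0))   # ~0.65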

Álvaro.



On Tue, Feb 23, 2016 at 5:36 AM, Michael Markefka <
michael.marke...@gmail.com> wrote:

> Hello everyone,
>
> In the wake of AlphaGo using a DCNN to predict the expected winrate of
> a move, I've been wondering whether one could train a DCNN for expected
> territory or points successfully enough to be of some use (leaving the
> issue of wins by resignation for a more in-depth discussion), and
> whether winrate and expected territory (or points) always run in
> parallel or whether there are moments where they diverge.
>
> Computer Go programs play what are considered slack or slow moves when
> ahead, sometimes being too conservative and giving away too much of
> their potential advantage. If expected points and expected winrate
> diverge, this could be a way to make the programs play in a more
> natural way, even if there were no strength increase to be gained.
> Then again, there might be a parameter configuration that yields some
> advantage, and perhaps that configuration would need to be dynamic,
> favoring winrate the further the game progresses.
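>
> To make that concrete, here is a toy sketch of such a dynamic blend
> (the weighting scheme and the points scale are made up, purely for
> illustration):
>
>     def move_value(winrate, expected_points, progress, scale=20.0):
>         """Blend winrate and expected points. 'progress' is the fraction
>         of the game played so far, so the blend shifts toward pure
>         winrate as the game nears its end."""
>         # Squash the point estimate onto a winrate-like 0..1 range.
>         points_term = expected_points / scale
>         return progress * winrate + (1.0 - progress) * points_term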
>
>
> As a general example of the idea, let's assume we have the following
> candidate moves generated by our program:
>
> #1: Winrate 55%, +5 expected final points
> #2: Winrate 53%, +15 expected final points
>
> Is the move with higher winrate always better? Or would there be some
> benefit to choosing #2? Would this differ depending on how far along
> the game is?
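>
> With the toy blend sketched above, the answer depends on how far along
> the game is (and, of course, on the arbitrary scale chosen for the
> points term):
>
>     # Early in the game the larger expected lead of #2 dominates; very
>     # late, the higher winrate of #1 takes over.
>     for progress in (0.3, 0.99):
>         print(progress,
>               move_value(0.55, 5.0, progress),    # move #1
>               move_value(0.53, 15.0, progress))   # move #2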
>
> If we knew the winrate prediction to be perfect, then going by that
> alone would probably result in the best overall performance. But given
> some uncertainty there, expected value could be interesting.
>
>
> Any takers for some experiments?
>
>
> -Michael
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
