I agree with your main point that the first batch of games will consist of totally random moves. I just want to make a small point: even from totally random play, the network should be able to learn something about mid-game positions as well. At move 100, a position with 50 white stones and 40 black stones on the board is likely a win for white even with completely random play from there: each side has placed 50 stones by then, so black must have lost 10 stones to capture.
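To put a rough number on that, here is a toy model of my own (not anything from the engines discussed here): treat each remaining move of the random game as an independent one-point swing in the score, and see how often a 10-point lead at move 100 survives to the end.

import random

def win_rate(lead=10, remaining_moves=250, trials=20_000):
    """Estimate how often the side `lead` points ahead still wins
    if every remaining move is a random +/-1 point swing."""
    wins = 0
    for _ in range(trials):
        swing = sum(random.choice((-1, 1)) for _ in range(remaining_moves))
        wins += (lead + swing) > 0
    return wins / trials

print(f"estimated white win rate: {win_rate():.3f}")  # ~0.74 under this toy model

The normal approximation agrees: the swing has standard deviation sqrt(250) ~ 15.8, so white wins whenever it exceeds -10, i.e. with probability Phi(10/15.8) ~ 0.74. So even this crude signal is well above 50%, which is exactly the kind of regularity the value head can pick up from random games.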
2017-10-26 8:17 GMT-05:00 Gian-Carlo Pascutto <g...@sjeng.org>:

> On 25-10-17 16:00, Petr Baudis wrote:
> > That makes sense. I still hope that with a much more aggressive
> > training schedule we could train a reasonable Go player, perhaps at
> > the expense of worse scaling at very high elos... (At least I feel
> > optimistic after discovering a stupid bug in my code.)
>
> By the way, a trivial observation: the initial network is random, so
> there's no point in using it for playing the first batch of games. It
> won't do anything useful until it has run a learning pass on a bunch of
> "win/loss" scored games and it can at least tell who is the likely
> winner in the final position (even if it mostly won't be able to make
> territory at first).
>
> This suggests that bootstrapping probably wants 500k starting games with
> just random moves.
>
> FWIW, it does not seem easy to get the value part of the network to
> converge in the dual-res architecture, even when taking the appropriate
> steps (1% weighting on error, strong regularizer).
>
> --
> GCP
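For anyone wanting to see what the loss weighting GCP describes looks like in practice, here is a minimal PyTorch-style sketch of the AlphaGo-Zero-style combined loss with the value term down-weighted to 1%. This is not the actual Leela Zero code; the names and the exact regularizer strength are my assumptions.

import torch
import torch.nn.functional as F

VALUE_WEIGHT = 0.01  # the "1% weighting on error" GCP mentions
L2_STRENGTH = 1e-4   # "strong regularizer"; the exact value is a guess

def combined_loss(policy_logits, value_pred, target_pi, target_z):
    # Policy term: cross-entropy against the search-visit distribution.
    policy_loss = -(target_pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    # Value term: squared error against the game outcome z in [-1, 1],
    # down-weighted so it does not swamp the policy gradient early on.
    value_loss = F.mse_loss(value_pred.squeeze(-1), target_z)
    return policy_loss + VALUE_WEIGHT * value_loss

# The L2 regularizer can ride along as weight decay in the optimizer:
# optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9,
#                             weight_decay=L2_STRENGTH)

The point of the 1% weighting is that a shared dual-headed (dual-res) network otherwise lets the value head's squared error dominate and overfit the trunk, which fits GCP's observation that the value part is the hard one to get to converge.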