I agree with your main point that the first batch of games will consist of totally random moves. I just want to make a small point: even from totally random play, the network should be able to learn something about mid-game positions as well. At move 100, a position with 50 white stones and 40 black stones on the board is likely a win for white even with completely random play from there: each side has placed 50 stones by then, so black must have lost 10 stones to capture.
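To put a rough number on that, here is a toy model of my own (not anything from the engines discussed here): treat each remaining move of the random game as an independent one-point swing in the score, and see how often a 10-point lead at move 100 survives to the end.

import random

def win_rate(lead=10, remaining_moves=250, trials=20_000):
    """Estimate how often the side `lead` points ahead still wins
    if every remaining move is a random +/-1 point swing."""
    wins = 0
    for _ in range(trials):
        swing = sum(random.choice((-1, 1)) for _ in range(remaining_moves))
        wins += (lead + swing) > 0
    return wins / trials

print(f"estimated white win rate: {win_rate():.3f}")  # ~0.74 under this toy model

The normal approximation agrees: the swing has standard deviation sqrt(250) ~ 15.8, so white wins whenever it exceeds -10, i.e. with probability Phi(10/15.8) ~ 0.74. So even this crude signal is well above 50%, which is exactly the kind of regularity the value head can pick up from random games.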
2017-10-26 8:17 GMT-05:00 Gian-Carlo Pascutto <g...@sjeng.org>:

> On 25-10-17 16:00, Petr Baudis wrote:
> > That makes sense. I still hope that with a much more aggressive
> > training schedule we could train a reasonable Go player, perhaps at
> > the expense of worse scaling at very high elos... (At least I feel
> > optimistic after discovering a stupid bug in my code.)
>
> By the way, a trivial observation: the initial network is random, so
> there's no point in using it for playing the first batch of games. It
> won't do anything useful until it has run a learning pass on a bunch of
> "win/loss" scored games and it can at least tell who is the likely
> winner in the final position (even if it mostly won't be able to make
> territory at first).
>
> This suggests that bootstrapping probably wants 500k starting games with
> just random moves.
>
> FWIW, it does not seem easy to get the value part of the network to
> converge in the dual-res architecture, even when taking the appropriate
> steps (1% weighting on error, strong regularizer).
>
> --
> GCP
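For anyone wanting to see what the loss weighting GCP describes looks like in practice, here is a minimal PyTorch-style sketch of the AlphaGo-Zero-style combined loss with the value term down-weighted to 1%. This is not the actual Leela Zero code; the names and the exact regularizer strength are my assumptions.

import torch
import torch.nn.functional as F

VALUE_WEIGHT = 0.01  # the "1% weighting on error" GCP mentions
L2_STRENGTH = 1e-4   # "strong regularizer"; the exact value is a guess

def combined_loss(policy_logits, value_pred, target_pi, target_z):
    # Policy term: cross-entropy against the search-visit distribution.
    policy_loss = -(target_pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    # Value term: squared error against the game outcome z in [-1, 1],
    # down-weighted so it does not swamp the policy gradient early on.
    value_loss = F.mse_loss(value_pred.squeeze(-1), target_z)
    return policy_loss + VALUE_WEIGHT * value_loss

# The L2 regularizer can ride along as weight decay in the optimizer:
# optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9,
#                             weight_decay=L2_STRENGTH)

The point of the 1% weighting is that a shared dual-headed (dual-res) network otherwise lets the value head's squared error dominate and overfit the trunk, which fits GCP's observation that the value part is the hard one to get to converge.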