I think it uses the champion network. That is, training periodically produces a candidate network, and there is a playoff match against the current champion. If the candidate wins more than 55% of the games, it is declared the new champion.
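A rough sketch of that gating step, in case it helps (the 400-game / 55% numbers are from the AlphaGo Zero paper; play_game is an assumed helper that plays one game and returns the winning network, not code from any actual implementation):

def evaluate_candidate(candidate, champion, num_games=400, win_threshold=0.55):
    """Play an evaluation match; promote the candidate only if it clearly wins."""
    candidate_wins = 0
    for i in range(num_games):
        # Alternate colors so neither network always gets the first move.
        if i % 2 == 0:
            winner = play_game(black=candidate, white=champion)  # hypothetical helper
        else:
            winner = play_game(black=champion, white=candidate)
        if winner is candidate:
            candidate_wins += 1
    # Declare a new champion only on a >55% score; otherwise keep the old one.
    return candidate if candidate_wins / num_games > win_threshold else champion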
Keeping a champion is an important mechanism, I believe. It creates the competitive coevolution dynamic, where the network is evolving to learn how to beat the best network so far, not just the most recent one. Without that dynamic, the training process can oscillate up and down.

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of uurtamo .
Sent: Wednesday, October 25, 2017 6:07 PM
To: computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

Does the self-play step use the most recent network for each move?

On Oct 25, 2017 2:23 PM, "Gian-Carlo Pascutto" <g...@sjeng.org> wrote:

On 25-10-17 17:57, Xavier Combelle wrote:
> Is there some way to distribute learning of a neural network ?

Learning as in training the DCNN: not really, unless there are high-bandwidth links between the machines (AFAIK - unless the state of the art has changed?).

Learning as in generating self-play games: yes. Especially if you update the network only every 25 000 games. My understanding is that this task is much more bottlenecked on game generation than on DCNN training, until you have quite a few machines generating games.

--
GCP
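On GCP's point above about distributing self-play: a minimal sketch of how that split might look (all names here are hypothetical, not from any actual codebase). The expensive, embarrassingly parallel part is playing whole games; the central trainer only retrains and republishes weights every ~25,000 games, so the workers need no fast interconnect.

GAMES_PER_UPDATE = 25_000  # figure mentioned by GCP; exact value is a training choice

def self_play_worker(get_latest_weights, submit_game):
    # Runs on each game-generation machine.
    while True:
        net = get_latest_weights()       # cheap: weights only change every ~25k games
        game = play_self_play_game(net)  # hypothetical helper; the slow, parallel part
        submit_game(game)                # ship the game record to the trainer

def trainer(game_queue, publish_weights, net):
    # Runs on the single training machine.
    buffer = []
    while True:
        buffer.append(game_queue.get())
        if len(buffer) >= GAMES_PER_UPDATE:
            net = train_on_games(net, buffer)  # the DCNN training step, done centrally
            publish_weights(net)
            buffer.clear()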
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go