On Fri, Mar 8, 2019 at 8:19 AM Darren Cook <dar...@dcook.org> wrote:
> > Blog post:
> > https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> > Paper: https://arxiv.org/abs/1902.10565
>
> I read the paper, and really enjoyed it: lots of different ideas being
> tried. I was especially satisfied to see figure 12 and the big
> difference giving some go features made.

Thanks, glad you enjoyed the paper.
> Though it would be good to see figure 8 shown in terms of wall clock
> time, on equivalent hardware. How much extra computation do all the
> extra ideas add? (Maybe it is in the paper, and I missed it?)

I suspect Leela Zero would come off far *less* favorably if one tried to do such a comparison using its actual existing code rather than abstracting down to counting neural net evals, because as far as I know Leela Zero has no cross-game batching of neural net evaluations, which makes a huge difference in how efficiently you can use a strong GPU. Only in the last couple of months or so, based on what I've been seeing in chat and pull requests, has Leela Zero implemented within-search batching of neural net evals, and clients still only play one game at a time. (A rough sketch of what I mean by cross-game batching is below, after the cost rundown.)

But maybe this is a distraction from your actual question, which is: how much do these extra things slow the process down in computational time, given both equivalent hardware *and* an equivalently good architecture? Mostly they cost almost nothing, which is why the paper doesn't really focus on that question. The thing to keep in mind is that almost all of the cost is the GPU-bound evaluation of the convolutional layers in the neural net during self-play.

* Ownership and score distribution are not used during self-play (except optionally ownership at the root node for a minor heuristic), so they don't contribute to self-play cost. Even on the training side they are only a slight cost (at most a few percent), since they are just some computations at the output head of the neural net, needing vastly fewer floating-point ops than the convolutions in the main trunk of the net.

* Global pooling costs nothing, since in my implementation it does not add new convolutions, only re-purposes existing channels. It actually *reduces* the number of parameters in the model and (I believe) the nominal number of floating-point ops, since re-purposing some channels to be pooled reduces the number of channels feeding into the convolution of the next layer. This is offset by the cost of doing the pooling and the additional GPU calls, netting out to about zero cost. (See the second sketch below.)

* Multi-board-size masking is also very cheap if your GPU implementation fuses the mask with the adjacent batch-norm bias+scale operations. (Third sketch below.)

* Go-specific input features add about a 10% cost on my hardware when using the 10b128c net, due to a combination of the ladder search being not cheap and the extra IO you have to do to the GPU to communicate the features. It's presumably closer to 5% for 15b192c, and it continues to shrink for larger nets, as the CPU and IO cost becomes more and more irrelevant relative to the growing proportion of GPU work.

* Playout/visit cap oscillation is just a change of some root-level search parameters, and target pruning is just some cheap CPU postprocessing. Writing down the various additional targets somewhat expands the size of the training data on disk, but that is pretty cheap with a good SSD.

I think nothing else adds any cost.
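For concreteness, here is a rough illustrative sketch (plain Python, not the actual code of either program) of what I mean by cross-game batching; `net_fn` here is just a stand-in for whatever actually runs the net on the GPU:

    import queue
    import threading

    BATCH_SIZE = 16  # whatever batch size the GPU handles well

    class BatchedEvaluator:
        """Pools neural net eval requests from many concurrent self-play games
        so the GPU sees real batches instead of batch size 1."""

        def __init__(self, net_fn):
            self.net_fn = net_fn           # net_fn(list_of_positions) -> list_of_outputs
            self.requests = queue.Queue()  # pending (position, result_holder, done_event)

        def evaluate(self, position):
            """Called from a game/search thread; blocks until the batched result arrives."""
            done = threading.Event()
            holder = {}
            self.requests.put((position, holder, done))
            done.wait()
            return holder["out"]

        def serve_forever(self):
            """GPU worker loop: drain up to BATCH_SIZE pending requests, run them in one call."""
            while True:
                batch = [self.requests.get()]  # block for at least one request
                while len(batch) < BATCH_SIZE and not self.requests.empty():
                    batch.append(self.requests.get_nowait())
                outputs = self.net_fn([pos for pos, _, _ in batch])
                for (_, holder, done), out in zip(batch, outputs):
                    holder["out"] = out
                    done.set()

Each self-play game runs its own search and calls evaluate() at its leaves; without something pooling requests like this, every game ends up making its own batch-size-1 GPU calls.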
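To make the global pooling point concrete, a minimal numpy sketch of the general shape of the idea (simplified; among other things the real pooling also includes a max component and a board-size-scaled mean, and this is not the actual code):

    import numpy as np

    def global_pooling_bias(x, w_pool):
        """x: activations of shape (channels, board, board).
        The first c_pool channels are re-purposed: pooled over the whole board
        and mapped through a small matrix w_pool of shape (c_pool, c_regular)
        into per-channel biases added to the remaining channels. Only those
        remaining channels feed the next convolution, which is why the
        parameter count goes down rather than up."""
        c_pool = w_pool.shape[0]
        pooled_part, regular_part = x[:c_pool], x[c_pool:]
        pooled = pooled_part.mean(axis=(1, 2))         # (c_pool,)
        biases = w_pool.T @ pooled                     # (c_regular,)
        return regular_part + biases[:, None, None]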
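Similarly for masking, a minimal numpy sketch of what fusing the mask into the batch-norm scale+bias means (again illustrative, not the actual GPU kernel):

    import numpy as np

    def fused_bn_and_mask(x, mean, var, gamma, beta, mask, eps=1e-5):
        """x: (channels, max_board, max_board); mask: 0/1 over spatial positions,
        zero outside the actual board. Batch-norm at inference time is just a
        per-channel scale and bias, so the mask multiply folds into the same
        elementwise pass and costs essentially nothing extra."""
        scale = gamma / np.sqrt(var + eps)   # per-channel
        bias = beta - mean * scale           # per-channel
        return (x * scale[:, None, None] + bias[:, None, None]) * mask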
> > I found some other interesting results, too - for example contrary to
> > intuition built up from earlier-generation MCTS programs in Go,
> > putting significant weight on score maximization rather than only
> > win/loss seems to help.
>
> Score maximization in self-play means it is encouraged to play more
> aggressively/dangerously, by creating life/death problems on the board.
> A player of similar strength doesn't know how to exploit the weaknesses
> left behind. (One of the asymmetries of go?)

Note that testing games were between random pairs of players chosen with probability proportional to p(1-p), where p is the predicted win probability from the estimated Elos. Even for a 200 Elo difference, p = 0.76 and p(1-p) = 0.18, which is not that much smaller than the p(1-p) = 0.25 you get at p = 0.5. So quite a lot of the testing games were between players of fairly different strengths.
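If you want to check the arithmetic, assuming the standard logistic Elo model (which is what gives the p = 0.76 figure above):

    # Pairing weight p*(1-p) under the standard logistic Elo model.
    def win_prob(elo_diff):
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    for diff in (0, 100, 200, 400):
        p = win_prob(diff)
        print(f"Elo diff {diff:>3}: p = {p:.2f}, pairing weight p*(1-p) = {p*(1-p):.2f}")
    # Elo diff   0: p = 0.50, pairing weight p*(1-p) = 0.25
    # Elo diff 100: p = 0.64, pairing weight p*(1-p) = 0.23
    # Elo diff 200: p = 0.76, pairing weight p*(1-p) = 0.18
    # Elo diff 400: p = 0.91, pairing weight p*(1-p) = 0.08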