On Fri, Nov 10, 2017 at 03:40:27PM +0100, Gian-Carlo Pascutto wrote:
> On 10/11/2017 1:47, Petr Baudis wrote:
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
>
> How many filters per layer?
256 like AlphaGo.

> FWIW 7 layer resnet (14 + 2 layers) is still pretty huge - larger than
> the initial AlphaGo.  Given the amount of games you have, and the size of
> the board, I would not be surprised if your neural net program is
> "outbooking" the opponent by remembering the sequences rather than
> learning more generic things.
>
> (But hey, outbooking is learning too!)

I couldn't exclude this, yes.  It would be interesting to try to use the
same convolutions on a bigger board to see if they play shapes and can do
basic tactics.

> >   * The neural network is updated after _every_ game, _twice_, on _all_
> >     positions plus 64 randomly sampled positions from the entire
> >     history, this all done four times - on the original position and the
> >     three symmetry flips (but I was too lazy to implement 90-degree
> >     rotation).
>
> The reasoning being to give a stronger and faster reinforcement with the
> latest data?

Yes.

> >   * The value function is trained with cross-entropy rather than MSE,
> >     no L2 regularization, and plain Adam rather than hand-tuned SGD
> >     (but the annealing is reset from time to time due to manual
> >     restarts of the script from a checkpoint).
>
> I never really had good results with Adam and friends compared to SGD
> (even momentum does not always help - but of course it's much faster
> early on).

It has worked great on all my neural models in other tasks - but this is
actually my first neural model for Go. :)

> >   * There is no resign auto-threshold, but it is important to play 25%
> >     of the games without resigning, to escape local "optima".
>
> This makes sense because both sides will miscount in exactly the same way.

Without this, producing value 1.0 for one color and 0.0 for the other is
a super-strong attractor.

> >   * 1/Temperature is 2 for the first three moves.
> >   * Initially I used 1000 "simulations" per move, but by mistake the
> >     last 1500 games, when the network improved significantly (see
> >     below), were run with 2000 simulations per move.  So that might
> >     matter.
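To be concrete about the symmetry augmentation: each position is trained
on four times - the original plus the three flips, with no 90-degree
rotation.  A minimal numpy sketch of just the board transforms (purely
illustrative, not the actual training code):

```python
import numpy as np

def symmetry_variants(board):
    """The four variants each position is trained on: the original,
    the two axis flips, and both flips together (which equals a
    180-degree rotation).  90-degree rotations are deliberately
    left out, as described above."""
    return [
        board,                        # original position
        np.flipud(board),             # flip top-to-bottom
        np.fliplr(board),             # flip left-to-right
        np.flipud(np.fliplr(board)),  # both flips = 180-degree rotation
    ]
```

(The corresponding policy targets would of course have to be transformed
the same way.)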
> > This has been running for two weeks, self-playing 8500 games.  A week
> > ago its moves already looked a bit natural, but it was stuck in
> > various local optima.  Three days ago it beat GNUGo once across 20
> > games.  Now five times across 20 games - so I'll let it self-play a
> > little longer, as it might surpass GNUGo quickly at this point?  Also,
> > this late improvement coincides with the increased simulation number.
>
> The simulation number is one of the big black boxes in this setup, I
> think.  If the policy network does not have a strong opinion yet, it
> seems that one has to make it sufficiently bigger than the number of
> legal moves.  If not, first-play-urgency will cause every successor
> position to be evaluated and there's no look-ahead, which means MCTS
> can't discover anything.

I don't see how first-play-urgency comes into play.  Initially it'll
typically be noise, but that still means growing the tree pretty
asymmetrically.  I saw uniform sampling only in some cases where the
number of simulations was << the number of children.

> So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
> not 1200 etc.

My feeling now is that raising the count really helps, especially
slightly later on.  I think the moment is when you stop seeing regular,
large discrepancies between network predictions and scoring output in
the very late endgame.  But it could be an illusion.

> With only 50-ish moves to consider on 7x7, it's interesting that you see
> a big improvement by making it (relatively) much larger than DeepMind did.
>
> But uh, you're not simply matching it against GNUGo with more
> simulations, are you?  I mean it would be quite normal to win more when
> searching deeper.

All playtests should have been with 2000 simulations.

-- 
				Petr Baudis, Rossum
	Run before you walk!  Fly before you crawl!  Keep moving forward!
	If we fail, I'd rather fail really hugely.
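P.S. - to make the first-play-urgency point above concrete: with an
optimistic value for unvisited children, PUCT-style selection will expand
every legal move once before revisiting any, so with fewer simulations
than children there is effectively no look-ahead.  A hedged sketch - the
c_puct and FPU constants here are arbitrary illustrations, not anything
either of us actually uses:

```python
import math

def puct_select(children, c_puct=1.5, fpu_value=1.0):
    """Pick the child maximizing Q + U, AlphaGo-style.  Each child is a
    dict with 'visits', 'value' (sum of backed-up values) and 'prior'.
    Unvisited children get an optimistic first-play-urgency value, so
    they are tried before any visited child is revisited."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        # Q: mean value, or the optimistic FPU constant if unvisited.
        q = ch["value"] / ch["visits"] if ch["visits"] else fpu_value
        # U: exploration bonus; +1 keeps it nonzero on the first pass.
        u = c_puct * ch["prior"] * math.sqrt(total + 1) / (1 + ch["visits"])
        return q + u
    return max(children, key=score)
```

With e.g. five children and five simulations, each selection lands on a
previously unvisited child, i.e. the tree is sampled uniformly rather
than grown asymmetrically.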
				-- Moist von Lipwig

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go