And a guess for posterity... I am using what I believe to be a correct implementation of the SL (supervised-learning) network from AlphaGo... with 12 layers, 128 filters instead of 192 for the inner layers, 46 features instead of 48 (the 2 ladder features are missing), and all 8 symmetries on the input data. The Adam runs had to decrease the LR to .0001... but I think that makes sense... the Adam paper used batch sizes of 128 and an LR of .001. I was using a minibatch of 16 at the time, and the LR generally needs to decrease because the gradients are less accurate. For example... a 256 minibatch with an LR of .05 in DarkForest is roughly proportional to a 16 minibatch with an LR of .003. In the end I have been staying mainly with vanilla SGD, because all of the papers seem to, and the training curves seemed pretty bouncy to me otherwise.
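The proportionality argument above is the common linear-scaling rule of thumb (keep LR / batch size roughly constant). A minimal sketch — the helper name is mine, and the reference points are just the numbers quoted above, not anything from the AG or DarkForest code:

```python
# Linear learning-rate scaling heuristic: keep lr / batch_size roughly
# constant when changing the minibatch size.
def scaled_lr(lr_ref, batch_ref, batch_new):
    """Scale a reference learning rate linearly with minibatch size."""
    return lr_ref * (batch_new / batch_ref)

# Adam paper reference point: batch 128 at lr 0.001.
print(scaled_lr(0.001, 128, 16))   # 0.000125 -- near the 0.0001 used here
# DarkForest reference point: batch 256 at lr 0.05.
print(scaled_lr(0.05, 256, 16))    # 0.003125 -- the ~0.003 proportionality
```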
My guess is that if you run a 51%-accuracy GoGoD network, or perhaps even a 56% KGS network... and don't worry too much about features or filter counts... you could get 3d on KGS vs. humans. With just a network evaluation... that is pretty dang impressive. I guess there might be weaknesses that humans could figure out vs. just the networks without search... but still... I'd be pretty happy.

On Wed, Aug 24, 2016 at 12:30 AM, Robert Waite <winstonwa...@gmail.com> wrote:

> @Detlef It is comforting to hear that GoGoD data seemed to converge towards 51% in your testing. When I ran KGS data... it definitely converged more quickly, but I stopped those runs short. I think it all makes sense if figure 5 of the DarkForest paper is the convergence of KGS data... it doesn't seem clear... but looking at the paper now... they are comparing with Maddison, so it makes sense they would show the numbers for the same dataset.
>
> @GCP The three move-strength graphs looked shaky to me... it doesn't seem like a clear change in strength. For the ladder issue... I think MCTS and a value or fast-rollout network are how AG overcame weaknesses like that. The fast-rollout network is actually the vaguest part to me... I have read some of the ancestor papers... and can see that people in the field mostly know what they are describing... but I don't know where to begin to get the pattern counts listed in the AG tables at the end of the paper.
>
> @David Have you matched your network vs. GnuGo? I think accuracy and loss are indicators of model health... but playing strength seems different. The AG paper only mentions beating Pachi at 100k rollouts with the RL network... not the SL one... at an 85% win rate. The DarkForest paper shows more data with win rates... their KGS network vs. Pachi 10k won ~23% of games... but the GoGoD-trained one won ~59%. They also tacked on extended features and 3-step prediction... so who knows.
>
> I am actually feeling a million times better about 51% being the heavy zone for GoGoD data.
> That makes my graphs make more sense.
>
> Graphs now:
>
> https://drive.google.com/file/d/0B0BbrXeL6VyCZEJuMG5nVG9NYkU/view
>
> https://drive.google.com/file/d/0B0BbrXeL6VyCR3ZxaUVGNU5pVDQ/view
>
> Going to keep going with the magenta and black line... I figure I can get to 48 percent. I can run 10 million pairs in a day... so the graph width is 1 week. Lol... I'd feel so happy if 57% isn't expected on GoGoD. 51% looks fine and approachable on my graphs.
>
> For the game-phase batched data... the DarkForest paper explicitly calls out that they got stuck in poor minima without it. I figured that random sampling was fine... but you could definitely get some skews... like no opening moves in a minibatch of size 16 like AG's. Their paper didn't elaborate... but did mention 16 threads. To generate a pair... I select one random game from all of the available SGF files... and split the game into 16 sections. I am using threading too... so there is more to it... but basically 16 sets of 16 makes for a 256 minibatch like the DarkForest team's.
>
> I think the only way to beat Zen or CrazyStone is to get the value network or fast-rollout with MCTS. Of course... CrazyStone is evolving too... so maybe not a goal.

On Tue, Aug 23, 2016 at 11:17 PM, David Fotland <fotl...@smart-games.com> wrote:

>> I train using approximately the same training set as AlphaGo, but so far without the augmentation with rotations and reflections. My target is about 55.5%, since that's what AlphaGo got on their training set without reinforcement learning.
>>
>> I find I need 5x5 in the first layer, at least 12 layers, and at least 96 filters to get over 50%. My best net is 55.3%, 18 layers by 96 filters. I use simple SGD with a 64 minibatch, no momentum, and a 0.01 learning rate until it flattens out, then 0.001. I have two 980 Tis, and the best nets take about 5 days to train (about 20 epochs on about 30M positions). The last few percent is just trial and error.
>> Sometimes making the net wider or deeper makes it weaker. Perhaps it's just variation from one training run to another. I haven't tried training the same net more than once.
>>
>> David
>>
>> > -----Original Message-----
>> > From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Gian-Carlo Pascutto
>> > Sent: Tuesday, August 23, 2016 12:42 AM
>> > To: computer-go@computer-go.org
>> > Subject: Re: [Computer-go] Converging to 57%
>> >
>> > On 23-08-16 08:57, Detlef Schmicker wrote:
>> >
>> > > So, if somebody is sure it is measured against GoGoD, I think a number of other Go programmers have to think again. I heard of them reaching 51% (e.g. posts by Hiroshi in this list).
>> >
>> > I trained a 128 x 14 network for Leela 0.7.0 and this gets 51.1% on GoGoD.
>> >
>> > Something I noticed from the papers is that the prediction percentage keeps going upwards with more epochs, even if slowly, but still clearly up.
>> >
>> > In my experience my networks converge rather quickly (like >0.5% per epoch after the first), get stuck, get one more 0.5% gain if I lower the learning rate (by a factor of 5 or 10), and don't gain any more regardless of what I do thereafter.
>> >
>> > I do use momentum. IIRC I tested without momentum once and it was worse, and much slower.
>> >
>> > I did not find any improvement in playing strength from doing Facebook's 3-move prediction. Perhaps it needs much bigger networks than 128 x 12.
>> >
>> > Adding ladder features also isn't good enough to (consistently) keep the network from playing into them.
>> > (And once it's played the first move, you're totally SOL, because the resulting positions aren't in the training set and you'll get 99% confidence for continuing the losing ladder moves.)
>> >
>> > I'm currently doing a more systematic comparison of all methods (and GoGoD vs. KGS+GoGoD) on 128 x 12, and testing the resulting strength (rather than looking at prediction %). I'll post the results here if anything definite comes out of it.
>> >
>> > --
>> > GCP
>> > _______________________________________________
>> > Computer-go mailing list
>> > Computer-go@computer-go.org
>> > http://computer-go.org/mailman/listinfo/computer-go
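Robert's phase-batched data generation (pick a random game, split it into 16 sections, draw one position per section; 16 such sets make a 256 minibatch) could be sketched roughly as below. This is a minimal sketch under assumptions of mine, not the actual code: the function name is a placeholder, and each game is assumed to already be a plain list of positions parsed from SGF.

```python
import random

def phase_stratified_minibatch(games, phases=16, games_per_batch=16):
    """Build a minibatch stratified by game phase: for each of
    `games_per_batch` draws, pick a random game, split it into `phases`
    sections, and sample one random position from each section.

    `games` is a list of games, each a list of positions (placeholder
    representation). Returns phases * games_per_batch positions,
    e.g. 16 * 16 = 256."""
    batch = []
    for _ in range(games_per_batch):
        game = random.choice(games)
        n = len(game)
        for p in range(phases):
            lo = p * n // phases
            hi = max(lo + 1, (p + 1) * n // phases)  # at least one position
            batch.append(game[random.randrange(lo, hi)])
    return batch
```

Stratifying this way guarantees every phase of the game appears in each minibatch, which is the skew (e.g. no opening moves in a small minibatch) that pure random sampling can produce.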