Sorry to spam... this'll be the last one for now. My bet is that if anyone who trained on KGS data rescaled their curve onto my graph (accuracy vs. pairs processed), they would hit ~50% after around 25M pairs processed. That is what the graph in the DarkForest paper implies, which is why I was tearing my hair out trying to see how they converged so quickly on GoGoD (or whether the implementation I am using is horribly flawed). Playing strength suggests the network is working, so I doubt the implementation is completely out of whack.
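To be concrete about what I mean by rescaling someone's data onto my graph: here is a throwaway sketch of the conversion from accuracy-per-epoch to accuracy-per-pairs-processed. The dataset size is just the ~30M-position ballpark David mentions below; the symmetry factor, the example points, and the to_pairs_processed helper name are all placeholders of mine, not anything from the papers.

KGS_POSITIONS = 30_000_000   # ballpark KGS training-set size, in positions
SYMMETRY_FACTOR = 1          # set to 8 if each position is counted once per symmetry

def to_pairs_processed(curve):
    """Map [(epoch, accuracy), ...] onto [(pairs_processed, accuracy), ...]."""
    return [(epoch * KGS_POSITIONS * SYMMETRY_FACTOR, acc) for epoch, acc in curve]

# e.g. a net that reports 50% accuracy after 0.8 epochs would land near
# 24M pairs processed on my x-axis (numbers made up for illustration)
print(to_pairs_processed([(0.8, 0.50), (2.0, 0.54)]))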
On Wed, Aug 24, 2016 at 1:01 AM, Robert Waite <winstonwa...@gmail.com> wrote:
> And a guess for posterity... I am using what I believe to be a correct implementation of the SL network from AG, with 12 layers, 128 filters instead of 192 for the inner layers, 46 features instead of 48 (the 2 ladder features are missing), and all 8 symmetries on the input data. For the Adam runs I had to decrease the LR to .0001, but I think that makes sense: the Adam paper used batch sizes of 128 and an LR of .001, and I was using a minibatch of 16 at the time, so the LR generally needs to decrease because the gradients are less accurate. For example, a 256 minibatch with an LR of .05 in DarkForest is roughly proportional to a 16 minibatch with an LR of .003 (sketch appended below, after the quoted thread). In the end I've been staying mainly with vanilla SGD, because all of the papers seem to, and the training curves looked pretty bouncy to me.
>
> My guess is that if you run a 51%-accuracy GoGoD network, or perhaps even a 56% KGS network, and don't worry too much about features or filter counts, you could get to 3d on KGS vs. humans. With just a network evaluation, that is pretty dang impressive. I guess there might be weaknesses that humans could figure out against a network without search... but still, I'd be pretty happy.
>
> On Wed, Aug 24, 2016 at 12:30 AM, Robert Waite <winstonwa...@gmail.com> wrote:
>
>> @Detlef It is comforting to hear that GoGoD data seemed to converge towards 51% in your testing. When I ran KGS data it definitely converged more quickly, but I stopped those runs short. I think it all makes sense if figure 5 of the DarkForest paper shows convergence on KGS data. That isn't entirely clear, but looking at the paper now, they are comparing with Maddison et al., so it makes sense that they would show numbers for the same dataset.
>>
>> @GCP The three-move-prediction strength graphs looked shaky to me... it doesn't seem like a clear change in strength. For the ladder issue, I think MCTS plus a value or fast rollout network is how AG overcame weaknesses like that. The fast rollout network is actually the vaguest part to me. I have read some of the ancestor papers and can see that people in the field mostly know what is being described, but I don't know where to begin to get the pattern counts listed in the AG tables at the end of the paper.
>>
>> @David Have you matched your network against GnuGo? I think accuracy and loss are indicators of model health, but playing strength seems to be a different question. The AG paper only mentions beating Pachi at 100k rollouts with the RL network, not the SL one, at an 85% win rate. The DarkForest paper shows more win-rate data: the KGS-trained network won ~23% of games vs. Pachi 10k, but the GoGoD-trained one won ~59%. They also tacked on extended features and 3-step prediction, so who knows.
>>
>> I am actually feeling a million times better about 51% being the heavy zone for GoGoD data. It makes my graphs make more sense.
>>
>> Graphs now:
>>
>> https://drive.google.com/file/d/0B0BbrXeL6VyCZEJuMG5nVG9NYkU/view
>>
>> https://drive.google.com/file/d/0B0BbrXeL6VyCR3ZxaUVGNU5pVDQ/view
>>
>> Gonna keep going with the magenta and black lines... I figure I can get to 48 percent. I can run 10 million pairs in a day, so the graph width is one week. Lol... I'd be so happy if 57% isn't the expectation on GoGoD. 51% looks fine and approachable on my graphs.
>>
>> For the game-phase-batched data, the DarkForest paper explicitly calls out that they got stuck in poor minima without it. I figured that purely random sampling was fine, but you could definitely get some skews, like having no opening moves at all in a minibatch of size 16 as in AG.
>> Their paper didn't elaborate, but it did mention 16 threads. To generate training pairs I select one random game from all of the available SGF files and split the game into 16 sections. I am using threading too, so there is more to it than that, but basically 16 sets of 16 makes for a 256 minibatch like the DarkForest team's (rough sketch appended below, after the quoted thread).
>>
>> I think the only way to beat Zen or CrazyStone is to get the value network or the fast rollouts working with MCTS. Of course CrazyStone is evolving too... so maybe that's not the right goal.
>>
>> On Tue, Aug 23, 2016 at 11:17 PM, David Fotland <fotl...@smart-games.com> wrote:
>>
>>> I train using approximately the same training set as AlphaGo, but so far without the augmentation with rotations and reflections. My target is about 55.5%, since that's what AlphaGo got on their training set without reinforcement learning.
>>>
>>> I find I need 5x5 in the first layer, at least 12 layers, and at least 96 filters to get over 50%. My best net is 55.3%, 18 layers by 96 filters. I use simple SGD with a 64 minibatch, no momentum, and a 0.01 learning rate until it flattens out, then 0.001. I have two 980 Tis, and the best nets take about 5 days to train (about 20 epochs on about 30M positions). The last few percent is just trial and error. Sometimes making the net wider or deeper makes it weaker; perhaps that's just variation from one training run to another. I haven't tried training the same net more than once.
>>>
>>> David
>>>
>>> > -----Original Message-----
>>> > From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Gian-Carlo Pascutto
>>> > Sent: Tuesday, August 23, 2016 12:42 AM
>>> > To: computer-go@computer-go.org
>>> > Subject: Re: [Computer-go] Converging to 57%
>>> >
>>> > On 23-08-16 08:57, Detlef Schmicker wrote:
>>> >
>>> > > So, if somebody is sure it is measured against GoGoD, I think a number of other Go programmers have to think again. I heard them reaching 51% (e.g. posts by Hiroshi in this list).
>>> >
>>> > I trained a 128 x 14 network for Leela 0.7.0 and this gets 51.1% on GoGoD.
>>> >
>>> > Something I noticed from the papers is that the prediction percentage keeps going upward with more epochs, even if slowly, but still clearly up.
>>> >
>>> > In my experience my networks converge rather quickly (like >0.5% per epoch after the first), get stuck, get one more 0.5% gain if I lower the learning rate (by a factor of 5 or 10), and don't gain any more regardless of what I do thereafter.
>>> >
>>> > I do use momentum. IIRC I tested without momentum once and it was worse, and much slower.
>>> >
>>> > I did not find any improvement in playing strength from doing Facebook's 3-move prediction. Perhaps it needs much bigger networks than 128 x 12.
>>> >
>>> > Adding ladder features also isn't good enough to (consistently) keep the network from playing into them. (And once it has played the first move, you're totally SOL, because the resulting positions aren't in the training set and you'll get 99% confidence for continuing the losing ladder moves.)
>>> >
>>> > I'm currently doing a more systematic comparison of all methods (and GoGoD vs. KGS+GoGoD) on 128 x 12, and testing the resulting strength (rather than looking at prediction %). I'll post the results here if anything definite comes out of it.
>>> >
>>> > --
>>> > GCP
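A quick sketch of the learning-rate scaling I mentioned above (the .05/.003 and .001/.0001 numbers). This is just the linear back-of-the-envelope rule I have been using, LR proportional to minibatch size; it isn't something the AG or DarkForest papers prescribe, and scale_lr is a made-up helper name.

def scale_lr(base_lr, base_batch, new_batch):
    """Rough linear heuristic: scale the learning rate with minibatch size."""
    return base_lr * new_batch / base_batch

# DarkForest trains with a 256 minibatch at LR 0.05; shrinking to 16:
print(scale_lr(0.05, 256, 16))    # 0.003125 -> about the .003 I used

# The Adam paper uses a 128 batch at LR 0.001; shrinking to 16:
print(scale_lr(0.001, 128, 16))   # 0.000125 -> about the .0001 I dropped to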
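And roughly what I mean by the game-phase-batched sampling: 16 games, each split into 16 phase sections, one (position, move) pair drawn per section, giving a 256 minibatch with every phase of the game represented. This is only a sketch of my own setup (the DarkForest paper doesn't give code), and load_random_game is a stub that fakes the SGF parsing.

import random

NUM_GAMES = 16    # one game per worker thread in my setup
NUM_PHASES = 16   # split each game into 16 phase sections -> 16 x 16 = 256 pairs

def load_random_game(sgf_files):
    """Stub: real code would pick random.choice(sgf_files) and parse it into
    (position, move) pairs; here we just fake a game of random length."""
    n_moves = random.randint(50, 300)
    return [(f"position-{i}", f"move-{i}") for i in range(n_moves)]

def sample_minibatch(sgf_files):
    batch = []
    for _ in range(NUM_GAMES):
        pairs = load_random_game(sgf_files)
        section = max(1, len(pairs) // NUM_PHASES)
        for phase in range(NUM_PHASES):
            lo = phase * section
            hi = min(len(pairs), lo + section)
            if lo < len(pairs):
                batch.append(random.choice(pairs[lo:hi]))  # one pair per phase section
    random.shuffle(batch)  # don't feed the net the phases in order
    return batch           # ~256 pairs covering opening, middle game, and endgame

print(len(sample_minibatch(["dummy.sgf"])))  # -> 256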
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go