On 23/08/2016 11:26, Brian Sheppard wrote:
> The learning rate seems much too high. My experience (which is from
> backgammon rather than Go, among other caveats) is that you need tiny
> learning rates. Tiny, as in 1/TrainingSetSize.

I think that's overkill: with a rate that small you effectively end up doing full-batch gradient descent instead of mini-batch SGD. But yes, 0.01 is rather high with momentum. Try 0.001 for methods with momentum, and with the default Adam parameters you have to go even lower, around 0.0001.
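To make the scale argument concrete, here is a back-of-the-envelope NumPy sketch of the two update rules (purely illustrative, not anyone's actual training code; w, grad and the optimizer state variables are just stand-ins):

    import numpy as np

    def sgd_momentum_step(w, grad, vel, lr=0.001, mu=0.9):
        # Classical momentum: the step still scales with the raw gradient size.
        vel = mu * vel - lr * grad
        return w + vel, vel

    def adam_step(w, grad, m, v, t, lr=0.0001, b1=0.9, b2=0.999, eps=1e-8):
        # Adam with default betas normalises by the running gradient scale, so the
        # per-weight step is roughly lr regardless of how small the gradients are.
        # That is why a rate that is merely "high" for SGD is far too big here.
        m = b1 * m + (1.0 - b1) * grad
        v = b2 * v + (1.0 - b2) * grad ** 2
        m_hat = m / (1.0 - b1 ** t)   # bias correction; t is the step count from 1
        v_hat = v / (1.0 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v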
> Neural networks are dark magic. Be prepared to spend many weeks just
> trying to figure things out. You can bet that the Google & FB results
> are just their final runs.

As always, it's sad that nobody publishes what didn't work, which would save us the time of trying it all over again :-)

> Changing batching to match DarkForest style (making sure that a
> minibatch contains samples from game phases... for example
> beginning, middle and end-game).

This sounds a bit suspicious. The entries in a minibatch should be randomly selected from your entire training set, so statistically you are already guaranteed positions from all phases. (Alternatively, shuffle the entire training set before each epoch instead of picking randomly during it.) Don't feed the positions in order, or all from the same game; there's a rough sketch of both approaches at the end of this mail.

-- 
GCP
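PS: roughly what I mean, again as a purely illustrative NumPy sketch (positions and labels stand in for however you actually store your encoded examples):

    import numpy as np

    def minibatches(positions, labels, batch_size=128, rng=np.random):
        # Shuffle the whole training set once per epoch, then walk through it;
        # every minibatch is then a random mix of games and game phases for free.
        order = rng.permutation(len(positions))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield positions[idx], labels[idx]

    # Or sample indices uniformly at random for every batch:
    #     idx = rng.choice(len(positions), size=batch_size, replace=False)
    # Either way, never iterate game by game in move order.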