Till Rohrmann created FLINK-2162:
------------------------------------

             Summary: Implement adaptive learning rate strategies for SGD
                 Key: FLINK-2162
                 URL: https://issues.apache.org/jira/browse/FLINK-2162
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library
            Reporter: Till Rohrmann
            Priority: Minor
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate / sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, then the SGD might become unstable. There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization algorithms which require less hyperparameter tweaking. It might be worthwhile to investigate these approaches.

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
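To illustrate the idea, here is a minimal sketch of an Adagrad-style update [3]: each weight keeps a running sum of its squared gradients and gets its own effective learning rate that shrinks as that sum grows, which reduces sensitivity to {{initialLearningRate}}. All names here ({{AdagradSketch}}, {{adagradStep}}) are hypothetical and not part of the Flink ML API.

```scala
// Hypothetical sketch of an Adagrad-style per-coordinate learning rate;
// not the Flink ML implementation.
object AdagradSketch {

  /** Performs one Adagrad update in place.
    *
    * @param weights         current weight vector (mutated)
    * @param gradient        gradient at the current weights
    * @param gradSquaredSum  running per-coordinate sum of squared gradients (mutated)
    * @param initialLearningRate  the base rate eta
    * @param epsilon         small constant to avoid division by zero
    */
  def adagradStep(
      weights: Array[Double],
      gradient: Array[Double],
      gradSquaredSum: Array[Double],
      initialLearningRate: Double,
      epsilon: Double = 1e-8): Unit = {
    var i = 0
    while (i < weights.length) {
      // accumulate squared gradient for coordinate i
      gradSquaredSum(i) += gradient(i) * gradient(i)
      // per-coordinate rate: eta / (sqrt(sum of squared gradients) + epsilon)
      val rate = initialLearningRate / (math.sqrt(gradSquaredSum(i)) + epsilon)
      weights(i) -= rate * gradient(i)
      i += 1
    }
  }
}
```

Note that on the very first step every coordinate moves by roughly {{initialLearningRate}} regardless of the gradient's magnitude, since the gradient is normalized by its own accumulated norm; this is what makes the method less sensitive to the initial rate than the plain {{1 / sqrt(iterationNumber)}} decay.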