Till Rohrmann created FLINK-2162:
------------------------------------

             Summary: Implement adaptive learning rate strategies for SGD
                 Key: FLINK-2162
                 URL: https://issues.apache.org/jira/browse/FLINK-2162
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library
            Reporter: Till Rohrmann
            Priority: Minor


At the moment, the SGD implementation uses a simple adaptive learning rate 
strategy, {{adaptedLearningRate = initialLearningRate / sqrt(iterationNumber)}}, 
which makes the optimization algorithm sensitive to the choice of the 
{{initialLearningRate}}. If this value is chosen poorly, the SGD can become 
unstable.
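For illustration, the current decay schedule can be sketched as follows (plain Python, not Flink's actual Scala code; the function name is made up here):

```python
import math

def adapted_learning_rate(initial_learning_rate, iteration_number):
    # Current Flink SGD schedule: decay the initial rate by 1/sqrt(t).
    # A too-large initial_learning_rate makes early steps overshoot;
    # a too-small one makes convergence very slow.
    return initial_learning_rate / math.sqrt(iteration_number)
```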

There are better ways to calculate the learning rate [1], such as Adagrad [3], 
Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable 
optimization algorithms that require less hyperparameter tweaking. It might be 
worthwhile to investigate these approaches.
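As a concrete example of such a strategy, a minimal per-coordinate Adagrad [3] update could look like the sketch below (plain Python, not Flink code; function name and defaults are made up for illustration):

```python
import math

def adagrad_step(weights, gradient, grad_sq_sum, learning_rate=0.1, eps=1e-8):
    # Adagrad [3]: accumulate the squared gradient per coordinate and
    # divide the base learning rate by the root of that accumulator,
    # so coordinates with large past gradients take smaller steps.
    for i in range(len(weights)):
        grad_sq_sum[i] += gradient[i] ** 2
        weights[i] -= learning_rate * gradient[i] / (math.sqrt(grad_sq_sum[i]) + eps)
    return weights, grad_sq_sum
```

Because the per-coordinate accumulator normalizes the step size, the result is far less sensitive to the choice of the base learning rate than the plain 1/sqrt(t) schedule.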

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
