Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3569#discussion_r21224582
  
    --- Diff: docs/mllib-optimization.md ---
    @@ -138,6 +138,15 @@ vertical scalability issue (the number of training 
features) when computing the
    explicitly in Newton's method. As a result, L-BFGS often achieves faster 
convergence compared with 
     other first-order optimization methods. 
     
    +### Choosing an Optimization Method
    +
    +[Linear methods](mllib-linear-methods.html) use optimization internally, 
and some linear methods in MLlib support both SGD and L-BFGS.
    +We give a few guidelines for choosing between methods.
    +However, different optimization methods can have different convergence 
guarantees depending on the properties of the objective function, and we cannot 
cover the literature here.
    +
    +* L-BFGS is recommended since it generally converges faster (in fewer 
iterations) than SGD.
    +* SGD can be faster for datasets with a very large number of instances 
(rows), especially when using a small `miniBatchFraction`.
    --- End diff --
    
    This part might not be true. We implemented mini-batch SGD, but obtaining a 
mini-batch from an RDD is expensive: it requires one pass over the data, while 
computing the gradient itself is not super expensive. Maybe we can also mention 
this trade-off.
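
    The trade-off can be sketched with a back-of-envelope per-iteration cost model. This is purely illustrative and not from MLlib: the functions and the per-row cost constants below are made up for the sketch. The point it shows is that sampling a mini-batch from an RDD still scans every row, so a small `miniBatchFraction` only shrinks the gradient part of the cost, not the sampling pass.

    ```python
    # Hypothetical per-iteration cost model for the RDD mini-batch trade-off.
    # All constants are made up; they are not measurements from MLlib.

    def sgd_iteration_cost(n_rows, mini_batch_fraction,
                           sample_cost_per_row=1.0, grad_cost_per_row=3.0):
        """Cost of one mini-batch SGD iteration on an RDD.

        Drawing the mini-batch still touches every row of the RDD,
        so the sampling pass is paid on the full dataset; only the
        gradient computation shrinks with miniBatchFraction.
        """
        sampling_pass = n_rows * sample_cost_per_row
        gradient = n_rows * mini_batch_fraction * grad_cost_per_row
        return sampling_pass + gradient

    def full_gradient_iteration_cost(n_rows, grad_cost_per_row=3.0):
        """Cost of one full-gradient iteration (as in L-BFGS):
        a single pass that computes the gradient as it goes."""
        return n_rows * grad_cost_per_row

    n = 1_000_000
    for frac in (1.0, 0.1, 0.01):
        print(f"miniBatchFraction={frac}: "
              f"SGD iteration cost {sgd_iteration_cost(n, frac):,.0f} vs "
              f"full-gradient cost {full_gradient_iteration_cost(n):,.0f}")
    ```

    With these made-up constants, shrinking `miniBatchFraction` from 1.0 toward 0 never drops the SGD iteration cost below the fixed sampling pass, so a tiny mini-batch buys far less than the fraction suggests; that floor is the trade-off the comment proposes documenting.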


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
