Hi everyone,

Sorry I'm late to the thread here, but I want to point out a few things.
This is, of course, a most welcome contribution and it will be immediately
useful to everything currently using the stochastic gradient optimizers!

1) I'm all for refactoring the optimization methods to make them a little
more general. Perhaps there should be a "FirstOrderUpdater" that
subclasses Updater, takes things like step size as parameters, and still
has a "compute" method (rough sketch below). While the updater APIs are
public, I'd be surprised if anyone is using them directly; I'd expect most
people to be using the APIs that rely on them (namely the SVM, logistic,
and linear regression classes). It should *definitely* be possible to keep
the loss function we're minimizing separate from the optimization method.
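
Something like the following is the kind of shape I'm imagining. The names
(FirstOrderUpdater, SimpleSGDUpdater) and the Array[Double]-based signature
are purely hypothetical, loosely mirroring the existing Updater.compute, and
not a proposal for the final API:

// Hypothetical sketch only -- not the current MLlib API.
trait FirstOrderUpdater extends Serializable {
  // Takes the old weights and the already-aggregated gradient, and returns
  // the new weights plus the regularization value of the objective.
  def compute(
      weightsOld: Array[Double],
      gradient: Array[Double],
      stepSize: Double,
      iter: Int,
      regParam: Double): (Array[Double], Double)
}

// Plain SGD step: w_new = w_old - (stepSize / sqrt(iter)) * gradient.
class SimpleSGDUpdater extends FirstOrderUpdater {
  override def compute(
      weightsOld: Array[Double],
      gradient: Array[Double],
      stepSize: Double,
      iter: Int,
      regParam: Double): (Array[Double], Double) = {
    val thisStep = stepSize / math.sqrt(iter)
    val weightsNew = weightsOld.zip(gradient).map { case (w, g) => w - thisStep * g }
    (weightsNew, 0.0) // no regularization in this toy updater
  }
}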

2) That said, L-BFGS *does* rely on being able to take gradients of the
loss function - it is a gradient-based method.

3) On that note, in your branch the access pattern (and code pattern) for
L-BFGS is basically identical to the code for miniBatchSGD. I may be
missing something, but we really should try to factor out the parts that
are the same and avoid duplicating this logic. I *think* coming up with an
LBFGSUpdater with an appropriate compute method is all we need
(particularly since we're already keeping track of the loss history), but I
might be wrong here - a rough sketch of what I mean follows.
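
Again with hypothetical names (and reusing the FirstOrderUpdater trait from
the sketch above), the shared driver loop could look roughly like this; the
real version would do the distributed gradientSum/lossSum aggregation over
the RDD and a proper convergence check rather than a plain function call:

// Hypothetical sketch of a shared optimization loop. The only
// method-specific piece is the updater (e.g. a future LBFGSUpdater holding
// its curvature history internally); the gradient/loss aggregation and the
// loss history are common to miniBatchSGD and L-BFGS.
object GenericOptimizer {
  def optimize(
      // Stand-in for the distributed gradientSum/lossSum computation in Spark.
      computeGradientAndLoss: Array[Double] => (Array[Double], Double),
      updater: FirstOrderUpdater,
      initialWeights: Array[Double],
      stepSize: Double,
      regParam: Double,
      numIterations: Int): (Array[Double], Array[Double]) = {
    val lossHistory = scala.collection.mutable.ArrayBuffer.empty[Double]
    var weights = initialWeights
    for (iter <- 1 to numIterations) {
      val (gradient, loss) = computeGradientAndLoss(weights)
      val (newWeights, regVal) =
        updater.compute(weights, gradient, stepSize, iter, regParam)
      weights = newWeights
      lossHistory += loss + regVal
    }
    (weights, lossHistory.toArray)
  }
}

With something along these lines, miniBatchSGD and the L-BFGS path would
only differ in which updater they pass in.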

4) In general, I think we should think carefully before incurring technical
debt through functionally duplicated code in the codebase (e.g. yet another
Vector sum/multiply class) and through code written in languages other than
Scala in MLlib. The Fortran/C++ implementations of L-BFGS aren't doing
anything magical, and as long as we can get similar performance, everything
will be much easier to maintain if it's written in Scala (with some critical
bits in other languages where warranted - but I don't think this falls into
that case). Additionally, we should think about whether we really need these
additional dependencies. While I'm sure Mallet is great, I'm a little worried
that adding it as a dependency for one or two functions we could pretty
easily reimplement might be a little heavy and present problems in the future.

Anyway, you should submit a PR and we can work on it!

- Evan




On Tue, Feb 25, 2014 at 5:52 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> Hi DB,
>
> Could you please point me to your spark PR ?
>
> Thanks.
> Deb
>
>
> On Tue, Feb 25, 2014 at 5:03 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>
> > Hi Deb, Xiangrui
> >
> > I just moved the LBFGS code to maven central, and cleaned up the code
> > a little bit.
> >
> > https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS
> >
> > After looking at Mallet, the API is pretty simple, and it can probably
> > be easily tested based on my PR.
> >
> > It will be tricky to benchmark just the time spent in the optimizers,
> > excluding the parallel gradientSum and lossSum computation, and I don't
> > have a good approach yet. Let's compare accuracy for the time being.
> >
> > Thanks.
> >
> > Sincerely,
> >
> > DB Tsai
> > Machine Learning Engineer
> > Alpine Data Labs
> > --------------------------------------
> > Web: http://alpinenow.com/
> >
> >
> > On Tue, Feb 25, 2014 at 12:07 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> > > Hi DB,
> > >
> > > I am considering building on your PR and adding Mallet as the dependency
> > > so that we can run some basic comparison tests on the large-scale sparse
> > > datasets that I have.
> > >
> > > In the meantime, let's discuss if there are other optimization packages
> > > that we should try.
> > >
> > > My wishlist includes bounded BFGS as well, and I will add it to the PR.
> > >
> > > As for getting the PR merged into MLlib, we can plan that later.
> > >
> > > Thanks.
> > > Deb
> > >
> > >
> > >
> > > On Tue, Feb 25, 2014 at 11:36 AM, DB Tsai <dbt...@alpinenow.com> wrote:
> > >
> > >> I found a comparison between the Mallet and Fortran versions. The
> > >> results are close but not the same.
> > >>
> > >> http://t3827.ai-mallet-development.aitalk.info/help-with-l-bfgs-t3827.html
> > >>
> > >> Here is L-BFGS-B:
> > >> Cost: 0.6902411220175793
> > >> Gradient: -5.453609E-007, -2.858372E-008, -1.369706E-007
> > >> Theta: -0.014186210102171406, -0.303521206706629, -0.018132348904129902
> > >>
> > >> And Mallet LBFGS (tolerance .000000000000001):
> > >> Cost: 0.6902412268833071
> > >> Gradient: 0.000117, -4.615523E-005, 0.000114
> > >> Theta: -0.013914961040040107, -0.30419883021414335, -0.016838481937958744
> > >>
> > >> So this shows me that Mallet is close, but plain old gradient descent
> > >> and L-BFGS-B are really close to each other. I see that Mallet also
> > >> has a "LineOptimizer" and an "Evaluator" that I have yet to explore...
> > >>
> > >> Sincerely,
> > >>
> > >> DB Tsai
> > >> Machine Learning Engineer
> > >> Alpine Data Labs
> > >> --------------------------------------
> > >> Web: http://alpinenow.com/
> > >>
> > >>
> > >> On Tue, Feb 25, 2014 at 11:16 AM, DB Tsai <dbt...@alpinenow.com> wrote:
> > >> > Hi Deb,
> > >> >
> > >> > On Tue, Feb 25, 2014 at 7:07 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> > >> >> Continuing from my last email, which was sent by mistake:
> > >> >>
> > >> >> Is the CPL license compatible with Apache?
> > >> >>
> > >> >> http://opensource.org/licenses/cpl1.0.php
> > >> >
> > >> > Based on what I read here, there is no problem including CPL code in
> > >> > an Apache project as long as the code isn't modified and we include
> > >> > the Maven binary:
> > >> > https://www.apache.org/legal/3party.html
> > >> >
> > >> >> Mallet jars are available on Maven. They have Hessian-based solvers
> > >> >> which looked interesting, along with BFGS and CG.
> > >> >
> > >> > We found that Hessian-based solvers don't scale as the number of
> > >> > features grows, and we have lots of customers trying to train on
> > >> > sparse input. That's our motivation to work on L-BFGS, which
> > >> > approximates the Hessian using just a few vectors.
> > >> >
> > >> > I just took a look at MALLET, and it does have L-BFGS and its variant
> > >> > OWL-QN, which can tackle L1 problems. Since implementing L-BFGS is
> > >> > very subtle, I don't know the quality of the MALLET implementation.
> > >> > Personally, I implemented one based on the textbook, and it is not
> > >> > very stable. If MALLET is robust, I'll go for it since it has more
> > >> > features and is already in Maven.
> > >> >
> > >> >> Note that right now the version is not BLAS-optimized. With the
> > >> >> jblas / netlib-java discussions that are going on, it can be
> > >> >> improved. Also, it runs on a single thread, which can be
> > >> >> improved...so there is scope for further improvements in the code.
> > >> >
> > >> > I think it will not impact performance even if it's not
> > >> > BLAS-optimized or multi-threaded, since most of the parallelization
> > >> > is in computing gradientSum and lossSum in Spark, and the optimizer
> > >> > just takes gradientSum, lossSum, and weights to get the newWeights.
> > >> >
> > >> > As a result, 99.9% of the time is spent computing gradientSum and
> > >> > lossSum; only a small amount of time is spent in the optimizer itself.
> > >> >
> > >> >>
> > >> >> Basically, Xiangrui, is there any pushback on making optimizers part
> > >> >> of Spark MLlib? I am exploring CG and QP solvers for Spark MLlib as
> > >> >> well, and I am developing these as part of MLlib optimization. I was
> > >> >> hoping we would be able to publish MLlib as a Maven artifact later.
> > >> >>
> > >> >> Thanks.
> > >> >> Deb
> > >> >
> > >> > Thanks.
> > >> >
> > >> > Sincerely,
> > >> >
> > >> > DB Tsai
> > >> > Machine Learning Engineer
> > >> > Alpine Data Labs
> > >> > --------------------------------------
> > >> > Web: http://alpinenow.com/
> > >>
> >
>
