Hi Deb, Xiangrui,

I just moved the LBFGS code to Maven Central and cleaned up the code a
little bit.
https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS

After looking at Mallet, the API is pretty simple, and it can probably be
tested easily based on my PR. It will be tricky to benchmark just the
optimizers' time while excluding the parallel gradientSum and lossSum
computation, and I don't have a good approach yet. Let's compare the
accuracy for the time being.

Thanks.

Sincerely,

DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/


On Tue, Feb 25, 2014 at 12:07 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi DB,
>
> I am considering building on your PR and adding Mallet as a dependency
> so that we can run some basic comparison tests on the large-scale sparse
> datasets that I have.
>
> In the meantime, let's discuss whether there are other optimization
> packages that we should try.
>
> My wishlist has bounded BFGS as well, and I will add it to the PR.
>
> About the PR getting merged into MLlib, we can plan that later.
>
> Thanks.
> Deb
>
>
> On Tue, Feb 25, 2014 at 11:36 AM, DB Tsai <dbt...@alpinenow.com> wrote:
>
>> I found a comparison between the Mallet and Fortran versions. The
>> results are close but not the same.
>>
>> http://t3827.ai-mallet-development.aitalk.info/help-with-l-bfgs-t3827.html
>>
>> Here is LBFGS-B:
>> Cost: 0.6902411220175793
>> Gradient: -5.453609E-007, -2.858372E-008, -1.369706E-007
>> Theta: -0.014186210102171406, -0.303521206706629, -0.018132348904129902
>>
>> And Mallet LBFGS (tolerance .000000000000001):
>> Cost: 0.6902412268833071
>> Gradient: 0.000117, -4.615523E-005, 0.000114
>> Theta: -0.013914961040040107, -0.30419883021414335, -0.016838481937958744
>>
>> So this shows me that Mallet is close, but plain old gradient descent
>> and LBFGS-B are really close.
>> I see that Mallet also has a "LineOptimizer" and "Evaluator" that I
>> have yet to explore...
>>
>> Sincerely,
>>
>> DB Tsai
>> Machine Learning Engineer
>> Alpine Data Labs
>> --------------------------------------
>> Web: http://alpinenow.com/
>>
>>
>> On Tue, Feb 25, 2014 at 11:16 AM, DB Tsai <dbt...@alpinenow.com> wrote:
>> > Hi Deb,
>> >
>> > On Tue, Feb 25, 2014 at 7:07 AM, Debasish Das
>> > <debasish.da...@gmail.com> wrote:
>> >> Continuation of the last email, sent by mistake:
>> >>
>> >> Is the CPL license compatible with Apache?
>> >>
>> >> http://opensource.org/licenses/cpl1.0.php
>> >
>> > Based on what I read here, there is no problem including CPL code in
>> > an Apache project as long as the code isn't modified and we include
>> > the Maven binary.
>> > https://www.apache.org/legal/3party.html
>> >
>> >> Mallet jars are available on Maven. They have Hessian-based solvers,
>> >> which looked interesting, along with BFGS and CG.
>> >
>> > We found that Hessian-based solvers don't scale as the number of
>> > features grows, and we have lots of customers trying to train on
>> > sparse input. That's our motivation to work on L-BFGS, which
>> > approximates the Hessian using just a few vectors.
>> >
>> > I just took a look at Mallet, and it does have L-BFGS and its variant
>> > OWL-QN, which can tackle L1 problems. Since implementing L-BFGS is
>> > very subtle, I don't know the quality of the Mallet implementation.
>> > Personally, I implemented one based on a textbook, and it was not
>> > very stable. If Mallet is robust, I'll go for it, since it has more
>> > features and is already in Maven.
>> >
>> >> Note that right now the version is not BLAS-optimized. With the jblas
>> >> or netlib-java discussions that are going on, it can be improved.
>> >> Also, it runs on a single thread, which can be improved... so there
>> >> is scope for further improvements in the code.
>> >
>> > I think it will not impact performance even if it's not BLAS-optimized
>> > or multi-threaded, since most of the parallelization is in computing
>> > gradientSum and lossSum in Spark, and the optimizer just takes
>> > gradientSum, lossSum, and the weights to get the newWeights.
>> >
>> > As a result, 99.9% of the time is spent computing gradientSum and
>> > lossSum; only a small amount of time is spent in the optimizer.
>> >
>> >>
>> >> Basically, Xiangrui, is there any pushback on making optimizers part
>> >> of Spark MLlib? I am exploring CG and QP solvers for Spark MLlib as
>> >> well, and I am developing these as part of MLlib optimization. I was
>> >> hoping we would be able to publish MLlib as a Maven artifact later.
>> >>
>> >> Thanks.
>> >> Deb
>> >
>> > Thanks.
>> >
>> > Sincerely,
>> >
>> > DB Tsai
>> > Machine Learning Engineer
>> > Alpine Data Labs
>> > --------------------------------------
>> > Web: http://alpinenow.com/
>>
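
To make the division of labor described above concrete, here is a rough
Scala sketch of the per-iteration pattern (this is not the actual PR code;
Example, gradientAndLoss, and lbfgsStep are placeholder names): Spark
aggregates gradientSum and lossSum across the partitions in parallel, and
the single-threaded optimizer step on the driver only consumes those sums
plus the current weights, which is why BLAS or multi-threading inside the
optimizer should barely matter.

import org.apache.spark.rdd.RDD

// Placeholder training example; stands in for MLlib's labeled data type.
case class Example(label: Double, features: Array[Double])

object LbfgsSketch {

  // Placeholder per-example gradient and loss for some differentiable
  // loss; the real math would come from the Gradient used in the PR.
  def gradientAndLoss(p: Example, w: Array[Double]): (Array[Double], Double) =
    (new Array[Double](w.length), 0.0)

  // The expensive, parallel part: one pass over the data per iteration
  // that sums per-example gradients and losses across all partitions.
  def computeSums(data: RDD[Example], w: Array[Double]): (Array[Double], Double) =
    data.aggregate((new Array[Double](w.length), 0.0))(
      seqOp = { case ((gradSum, lossSum), p) =>
        val (g, l) = gradientAndLoss(p, w)
        var i = 0
        while (i < gradSum.length) { gradSum(i) += g(i); i += 1 }
        (gradSum, lossSum + l)
      },
      combOp = { case ((g1, l1), (g2, l2)) =>
        var i = 0
        while (i < g1.length) { g1(i) += g2(i); i += 1 }
        (g1, l1 + l2)
      })

  // The cheap, single-threaded part: whatever L-BFGS implementation we
  // pick (Mallet, a Fortran wrapper, hand-rolled) only sees these inputs.
  def lbfgsStep(w: Array[Double], gradSum: Array[Double], lossSum: Double,
                numExamples: Long): Array[Double] = {
    // ... two-loop recursion / line search would go here ...
    w
  }

  def run(data: RDD[Example], w0: Array[Double], iterations: Int): Array[Double] = {
    val n = data.count()
    var w = w0
    for (_ <- 1 to iterations) {
      val (gradSum, lossSum) = computeSums(data, w) // ~99.9% of the time
      w = lbfgsStep(w, gradSum, lossSum, n)         // tiny driver-side step
    }
    w
  }
}

In this shape, swapping a different optimizer into lbfgsStep doesn't touch
the Spark side at all, which is also why comparing the optimizers on
accuracy first (rather than isolating their wall-clock time) is the easier
experiment.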