Hi Professor Lin,

On our internal datasets, liblinear gives me accuracy on par with glmnet in R for sparse feature selection. The default MLlib gradient descent was way off: I did not tune the learning rate, but I ran it with varying lambda, and the feature selection was weak.
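To make the lambda sweep concrete, here is a minimal sketch of L1-regularized logistic regression and how the number of selected features changes with lambda. It is not the liblinear, MLlib, or Breeze code: it swaps in plain proximal gradient descent (ISTA with soft-thresholding) in pure Python. The synthetic data, the function names, the step size, and the iteration count are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, iters=200):
    """Proximal gradient descent (ISTA) on mean log-loss + lam * ||w||_1.

    step and iters are illustrative defaults, not tuned values.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the mean logistic loss at the current weights.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


def make_data(n=200, d=10, seed=0):
    """Hypothetical synthetic data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1.0 if 2.0 * x[0] - 2.0 * x[1] + rng.gauss(0.0, 1.0) > 0 else 0.0
         for x in X]
    return X, y


X, y = make_data()
for lam in (0.001, 0.01, 0.1, 10.0):
    w = fit_l1_logistic(X, y, lam)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    print(f"lambda={lam}: {len(selected)} nonzero weights at indices {selected}")
```

The soft-thresholding step is what produces exact zeros, so a large enough lambda drives every weight to zero while a small one keeps the informative features. Solvers such as OWL-QN or liblinear's L1 solvers target the same objective with different update rules, so this sketch says nothing about their relative accuracy or speed.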
I used liblinear code. Next I will explore the distributed liblinear. Adding the code on github will definitely help for collaboration.

I am experimenting to see whether a BFGS / OWL-QN based sparse logistic regression in Spark MLlib gives us accuracy on par with liblinear. If the liblinear solver outperforms them (in either accuracy or performance), we will have to bring TRON to MLlib and let other algorithms benefit from it as well.

We are using the BFGS and OWLQN solvers from breeze opt.

Thanks.
Deb

On May 12, 2014 9:07 PM, "DB Tsai" <dbt...@stanford.edu> wrote:

> It seems that the code isn't managed in github. Can be downloaded from
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/distributed-liblinear/spark/spark-liblinear-1.94.zip
>
> It will be easier to track the changes in github.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
> On Mon, May 12, 2014 at 7:53 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Hi Chieh-Yen,
>>
>> Great to see the Spark implementation of LIBLINEAR! We will definitely
>> consider adding a wrapper in MLlib to support it. Is the source code
>> on github?
>>
>> Deb, Spark LIBLINEAR uses BSD license, which is compatible with Apache.
>>
>> Best,
>> Xiangrui
>>
>> On Sun, May 11, 2014 at 10:29 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>> > Hello Prof. Lin,
>> >
>> > Awesome news ! I am curious if you have any benchmarks comparing C++ MPI
>> > with Scala Spark liblinear implementations...
>> >
>> > Is Spark Liblinear apache licensed or there are any specific restrictions on
>> > using it ?
>> >
>> > Except using native blas libraries (which each user has to manage by pulling
>> > in their best proprietary BLAS package), all Spark code is Apache licensed.
>> >
>> > Thanks.
>> > Deb
>> >
>> > On Sun, May 11, 2014 at 3:01 AM, DB Tsai <dbt...@stanford.edu> wrote:
>> >>
>> >> Dear Prof. Lin,
>> >>
>> >> Interesting! We had an implementation of L-BFGS in Spark and already
>> >> merged in the upstream now.
>> >>
>> >> We read your paper comparing TRON and OWL-QN for logistic regression with
>> >> L1 (http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf), but it seems that it's
>> >> not in the distributed setup.
>> >>
>> >> Will be very interesting to know the L2 logistic regression benchmark
>> >> result in Spark with your TRON optimizer and the L-BFGS optimizer against
>> >> different datasets (sparse, dense, and wide, etc).
>> >>
>> >> I'll try your TRON out soon.
>> >>
>> >> Sincerely,
>> >>
>> >> DB Tsai
>> >> -------------------------------------------------------
>> >> My Blog: https://www.dbtsai.com
>> >> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >>
>> >> On Sun, May 11, 2014 at 1:49 AM, Chieh-Yen <r01944...@csie.ntu.edu.tw> wrote:
>> >>>
>> >>> Dear all,
>> >>>
>> >>> Recently we released a distributed extension of LIBLINEAR at
>> >>>
>> >>> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/distributed-liblinear/
>> >>>
>> >>> Currently, TRON for logistic regression and L2-loss SVM is supported.
>> >>> We provided both MPI and Spark implementations.
>> >>> This is very preliminary so your comments are very welcome.
>> >>>
>> >>> Thanks,
>> >>> Chieh-Yen