Thanks DB Tsai, this is very helpful.

Cheers,
Wei
2015-06-23 16:00 GMT-07:00 DB Tsai <dbt...@dbtsai.com>:
> Please see the current version of the code for better documentation:
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
> On Tue, Jun 23, 2015 at 3:58 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> The regularization is handled after the objective function of the data
>> is computed. See
>> https://github.com/apache/spark/blob/6a827d5d1ec520f129e42c3818fe7d0d870dcbef/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
>> line 348 for L2.
>>
>> For L1, it's handled by OWLQN, so you don't see it explicitly, but the
>> code is at line 128.
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Blog: https://www.dbtsai.com
>> PGP Key ID: 0xAF08DF8D
>>
>> On Tue, Jun 23, 2015 at 3:14 PM, Wei Zhou <zhweisop...@gmail.com> wrote:
>>> Hi DB Tsai,
>>>
>>> Thanks for your reply. I went through the source code of
>>> LinearRegression.scala. The algorithm minimizes the squared error
>>> L = 1/(2n) * ||A * weights - y||^2. I cannot match this with the
>>> elastic-net loss function found here
>>> http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html, which is
>>> the sum of squared errors plus the L1 and L2 penalties.
>>>
>>> I am able to follow the rest of the mathematical derivation in the
>>> code comments. I am hoping you could point me to any references that
>>> can fill this knowledge gap.
>>>
>>> Best,
>>> Wei
>>>
>>> 2015-06-19 12:35 GMT-07:00 DB Tsai <dbt...@dbtsai.com>:
>>>> Hi Wei,
>>>>
>>>> I don't think ML is meant for single-node computation; the
>>>> algorithms in ML are designed for the pipeline framework.
>>>>
>>>> In short, the lasso regression in ML is a new algorithm implemented
>>>> from scratch. It is faster and converges to the same solution as
>>>> R's glmnet, but with scalability. Here is the talk I gave at Spark
>>>> Summit about the new elastic-net feature in ML. I encourage you to
>>>> try the one in ML:
>>>>
>>>> http://www.slideshare.net/dbtsai/2015-06-largescale-lasso-and-elasticnet-regularized-generalized-linear-models-at-spark-summit
>>>>
>>>> Sincerely,
>>>>
>>>> DB Tsai
>>>> ----------------------------------------------------------
>>>> Blog: https://www.dbtsai.com
>>>> PGP Key ID: 0xAF08DF8D
>>>>
>>>> On Fri, Jun 19, 2015 at 11:38 AM, Wei Zhou <zhweisop...@gmail.com> wrote:
>>>>> Hi Spark experts,
>>>>>
>>>>> I see lasso regression / elastic-net implementations under both
>>>>> MLlib and ML; does anyone know what the difference is between the
>>>>> two implementations?
>>>>>
>>>>> At Spark Summit, one of the keynote speakers mentioned that ML is
>>>>> meant for single-node computation; could anyone elaborate on this?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Wei
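To spell out the connection Wei was after: the code comment documents only the data term L = 1/(2n) * ||A * weights - y||^2, while the full glmnet-style elastic-net objective adds the penalties on top of it. In LaTeX:

\min_{w} \; \frac{1}{2n} \lVert A w - y \rVert_2^2
  + \lambda \left( \alpha \lVert w \rVert_1
  + \frac{1 - \alpha}{2} \lVert w \rVert_2^2 \right)

Here lambda corresponds to Spark ML's regParam and alpha to elasticNetParam (alpha = 1 gives the lasso, alpha = 0 gives ridge). As DB describes above, the L2 term is added to the differentiable objective after the data loss is computed (line 348 in the linked commit), while the L1 term is applied implicitly inside the OWLQN optimizer, which is why neither penalty appears in the squared-error formula in the code comments.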
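Below is a minimal, self-contained Scala sketch of that division of labor for dense arrays. The names (ElasticNetSketch, lossAndGradient) are hypothetical illustrations, not Spark's actual internals:

object ElasticNetSketch {
  // Sketch of "regularization handled after the data objective is
  // computed": squared-error loss plus an explicit L2 term. The L1 term
  // is deliberately absent; an OWLQN-style optimizer applies it
  // implicitly, outside the differentiable objective.
  def lossAndGradient(
      a: Array[Array[Double]],   // n x d design matrix
      y: Array[Double],          // n labels
      w: Array[Double],          // d weights
      lambda: Double,            // overall regularization strength
      alpha: Double              // elastic-net mixing (1 = lasso, 0 = ridge)
  ): (Double, Array[Double]) = {
    val n = a.length
    val d = w.length
    val grad = Array.fill(d)(0.0)
    var dataLoss = 0.0
    // Data term: (1/2n) * ||A w - y||^2, accumulated row by row;
    // its gradient w.r.t. w(j) is (1/n) * sum_i(residual_i * a(i)(j)).
    for (i <- 0 until n) {
      var margin = 0.0
      for (j <- 0 until d) margin += a(i)(j) * w(j)
      val residual = margin - y(i)
      dataLoss += residual * residual / (2.0 * n)
      for (j <- 0 until d) grad(j) += residual * a(i)(j) / n
    }
    // L2 term added after the data objective, as in the thread:
    // lambda * (1 - alpha) / 2 * ||w||^2, gradient lambda * (1 - alpha) * w.
    val l2 = lambda * (1.0 - alpha)
    val l2Loss = 0.5 * l2 * w.map(x => x * x).sum
    for (j <- 0 until d) grad(j) += l2 * w(j)
    (dataLoss + l2Loss, grad)
  }
}

Handing this function, together with the remaining lambda * alpha * ||w||_1 strength, to an OWLQN-style optimizer would complete the elastic-net objective, which mirrors why the L1 term never shows up in LinearRegression.scala's loss code.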