Thanks DB Tsai, this is very helpful.

Cheers,
Wei
2015-06-23 16:00 GMT-07:00 DB Tsai <dbt...@dbtsai.com>:
> Please see the current version of the code for better documentation:
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
> On Tue, Jun 23, 2015 at 3:58 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> The regularization is handled after the objective function of the data
>> is computed. See
>> https://github.com/apache/spark/blob/6a827d5d1ec520f129e42c3818fe7d0d870dcbef/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
>> line 348 for L2.
>>
>> For L1, it's handled by OWLQN, so you don't see it explicitly, but the
>> code is at line 128.
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Blog: https://www.dbtsai.com
>> PGP Key ID: 0xAF08DF8D
>>
>> On Tue, Jun 23, 2015 at 3:14 PM, Wei Zhou <zhweisop...@gmail.com> wrote:
>>> Hi DB Tsai,
>>>
>>> Thanks for your reply. I went through the source code of
>>> LinearRegression.scala. The algorithm minimizes the squared error
>>> L = 1/(2n) * ||A * weights - y||^2. I cannot match this with the
>>> elastic-net loss function found here
>>> http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html, which is
>>> the sum of squared errors plus the L1 and L2 penalties.
>>>
>>> I am able to follow the rest of the mathematical derivation in the
>>> code comments. I am hoping you could point me to any references that
>>> can fill this knowledge gap.
>>>
>>> Best,
>>> Wei
>>>
>>> 2015-06-19 12:35 GMT-07:00 DB Tsai <dbt...@dbtsai.com>:
>>>> Hi Wei,
>>>>
>>>> I don't think ML is meant for single-node computation; the
>>>> algorithms in ML are designed for the pipeline framework.
>>>>
>>>> In short, the lasso regression in ML is a new algorithm implemented
>>>> from scratch. It is faster and converges to the same solution as
>>>> R's glmnet, but with scalability. Here is the talk I gave at Spark
>>>> Summit about the new elastic-net feature in ML. I encourage you to
>>>> try the one in ML:
>>>>
>>>> http://www.slideshare.net/dbtsai/2015-06-largescale-lasso-and-elasticnet-regularized-generalized-linear-models-at-spark-summit
>>>>
>>>> Sincerely,
>>>>
>>>> DB Tsai
>>>> ----------------------------------------------------------
>>>> Blog: https://www.dbtsai.com
>>>> PGP Key ID: 0xAF08DF8D
>>>>
>>>> On Fri, Jun 19, 2015 at 11:38 AM, Wei Zhou <zhweisop...@gmail.com> wrote:
>>>>> Hi Spark experts,
>>>>>
>>>>> I see lasso regression / elastic-net implementations under both
>>>>> MLlib and ML; does anyone know what the difference is between the
>>>>> two implementations?
>>>>>
>>>>> At Spark Summit, one of the keynote speakers mentioned that ML is
>>>>> meant for single-node computation; could anyone elaborate on this?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Wei
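To spell out the connection Wei was after: the code comment documents only the data term L = 1/(2n) * ||A * weights - y||^2, while the full glmnet-style elastic-net objective adds the penalties on top of it. In LaTeX:

\min_{w} \; \frac{1}{2n} \lVert A w - y \rVert_2^2
  + \lambda \left( \alpha \lVert w \rVert_1
  + \frac{1 - \alpha}{2} \lVert w \rVert_2^2 \right)

Here lambda corresponds to Spark ML's regParam and alpha to elasticNetParam (alpha = 1 gives the lasso, alpha = 0 gives ridge). As DB describes above, the L2 term is added to the differentiable objective after the data loss is computed (line 348 in the linked commit), while the L1 term is applied implicitly inside the OWLQN optimizer, which is why neither penalty appears in the squared-error formula in the code comments.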
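Below is a minimal, self-contained Scala sketch of that division of labor for dense arrays. The names (ElasticNetSketch, lossAndGradient) are hypothetical illustrations, not Spark's actual internals:

object ElasticNetSketch {
  // Sketch of "regularization handled after the data objective is
  // computed": squared-error loss plus an explicit L2 term. The L1 term
  // is deliberately absent; an OWLQN-style optimizer applies it
  // implicitly, outside the differentiable objective.
  def lossAndGradient(
      a: Array[Array[Double]],   // n x d design matrix
      y: Array[Double],          // n labels
      w: Array[Double],          // d weights
      lambda: Double,            // overall regularization strength
      alpha: Double              // elastic-net mixing (1 = lasso, 0 = ridge)
  ): (Double, Array[Double]) = {
    val n = a.length
    val d = w.length
    val grad = Array.fill(d)(0.0)
    var dataLoss = 0.0
    // Data term: (1/2n) * ||A w - y||^2, accumulated row by row;
    // its gradient w.r.t. w(j) is (1/n) * sum_i(residual_i * a(i)(j)).
    for (i <- 0 until n) {
      var margin = 0.0
      for (j <- 0 until d) margin += a(i)(j) * w(j)
      val residual = margin - y(i)
      dataLoss += residual * residual / (2.0 * n)
      for (j <- 0 until d) grad(j) += residual * a(i)(j) / n
    }
    // L2 term added after the data objective, as in the thread:
    // lambda * (1 - alpha) / 2 * ||w||^2, gradient lambda * (1 - alpha) * w.
    val l2 = lambda * (1.0 - alpha)
    val l2Loss = 0.5 * l2 * w.map(x => x * x).sum
    for (j <- 0 until d) grad(j) += l2 * w(j)
    (dataLoss + l2Loss, grad)
  }
}

Handing this function, together with the remaining lambda * alpha * ||w||_1 strength, to an OWLQN-style optimizer would complete the elastic-net objective, which mirrors why the L1 term never shows up in LinearRegression.scala's loss code.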