Re: Difference between Lasso regression in MLlib package and ML package

DB Tsai Tue, 23 Jun 2015 16:01:44 -0700

Please see the current version of code for better documentation.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala


Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D


On Tue, Jun 23, 2015 at 3:58 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> The regularization is applied after the data part of the objective
> (the squared-error loss) is computed. See
> https://github.com/apache/spark/blob/6a827d5d1ec520f129e42c3818fe7d0d870dcbef/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
>  line 348 for the L2 term.
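For reference, the glmnet elastic-net objective for squared-error loss can be written as below. The intercept is omitted, w stands for the weights, and A, y are as in Wei's message. The first term is the data loss that the code computes first; the two penalty terms are the ones added on top of it. Reading λ as Spark ML's regParam and α as its elasticNetParam is an assumption based on the parameter documentation, not something checked against the commit linked above.

```latex
\min_{w}\;
\underbrace{\frac{1}{2n}\,\lVert A w - y\rVert_2^2}_{\text{data loss}}
\;+\;\underbrace{\lambda\,\frac{1-\alpha}{2}\,\lVert w\rVert_2^2}_{\text{L2 part}}
\;+\;\underbrace{\lambda\,\alpha\,\lVert w\rVert_1}_{\text{L1 part}}
```

With α = 1 this reduces to the lasso, and with α = 0 to ridge regression.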
>
> The L1 term is handled by the OWLQN optimizer, so you don't see it
> explicitly in the objective; the relevant code is at line 128.
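Below is a minimal sketch of that split, using Python with plain lists for readability. The smooth part (data loss plus the L2 term) is computed explicitly. The L1 term enters only through the orthant-wise pseudo-gradient that an OWL-QN optimizer uses. The function names, the absence of an intercept, and the lack of feature standardization are assumptions for illustration. This is not Spark's code.

```python
# Hypothetical sketch, not Spark's implementation: no intercept, no
# feature standardization, dense Python lists instead of distributed data.

def smooth_loss_and_grad(A, y, w, reg_param, alpha):
    """Return (loss, grad) for (1/2n)||A w - y||^2 + reg_param*(1-alpha)/2 * ||w||^2.

    The L2 part of the elastic-net penalty is added only after the data
    loss and its gradient have been computed.
    """
    n, d = len(y), len(w)
    loss = 0.0
    grad = [0.0] * d
    for row, target in zip(A, y):
        residual = sum(a * wj for a, wj in zip(row, w)) - target
        loss += residual * residual
        for j in range(d):
            grad[j] += residual * row[j]
    loss /= 2.0 * n
    grad = [g / n for g in grad]

    # L2 part of the penalty, added after the data term.
    l2 = reg_param * (1.0 - alpha)
    loss += 0.5 * l2 * sum(wj * wj for wj in w)
    grad = [g + l2 * wj for g, wj in zip(grad, w)]
    return loss, grad


def pseudo_gradient(w, grad, l1):
    """OWL-QN pseudo-gradient of f(w) + l1 * ||w||_1, given grad = f'(w).

    With the sketch above, l1 would be reg_param * alpha.
    """
    pg = []
    for wj, gj in zip(w, grad):
        if wj > 0:
            pg.append(gj + l1)
        elif wj < 0:
            pg.append(gj - l1)
        elif gj + l1 < 0:   # right derivative is negative: move right
            pg.append(gj + l1)
        elif gj - l1 > 0:   # left derivative is positive: move left
            pg.append(gj - l1)
        else:               # zero lies in the subdifferential: stay at 0
            pg.append(0.0)
    return pg


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    y = [1.0, 2.0]
    loss, grad = smooth_loss_and_grad(A, y, [0.0, 0.0], reg_param=0.0, alpha=0.0)
    print(loss, grad)
    print(pseudo_gradient([0.0, 0.0], grad, l1=0.75))
```

An OWL-QN optimizer uses the pseudo-gradient in place of the ordinary gradient and keeps each line-search step within the current orthant. That is why the L1 term does not need to appear in the objective function handed to it.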
>
> Sincerely,
>
> DB Tsai
>
>
> On Tue, Jun 23, 2015 at 3:14 PM, Wei Zhou <zhweisop...@gmail.com> wrote:
>> Hi DB Tsai,
>>
>> Thanks for your reply. I went through the source code of
>> LinearRegression.scala. The algorithm minimizes the squared error
>> $L = \frac{1}{2n}\lVert A\,\mathrm{weights} - y\rVert_2^2$. I cannot match
>> this with the elastic-net loss function found here
>> http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html, which is the sum
>> of the squared error plus the L1 and L2 penalties.
>>
>> I am able to follow the rest of the mathematical derivation in the code
>> comments. I was hoping you could point me to any references that would
>> fill this knowledge gap.
>>
>> Best,
>> Wei
>>
>>
>>
>> 2015-06-19 12:35 GMT-07:00 DB Tsai <dbt...@dbtsai.com>:
>>>
>>> Hi Wei,
>>>
>>> I don't think ML is meant for single-node computation; the
>>> algorithms in ML are designed for the pipeline framework.
>>>
>>> In short, the lasso regression in ML is a new algorithm implemented from
>>> scratch. It's faster, and it converges to the same solution as R's
>>> glmnet while scaling to large data. Here is the talk I gave at Spark
>>> Summit about the new elastic-net feature in ML. I'd encourage you to try
>>> the one in ML.
>>>
>>>
>>> http://www.slideshare.net/dbtsai/2015-06-largescale-lasso-and-elasticnet-regularized-generalized-linear-models-at-spark-summit
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>>
>>>
>>> On Fri, Jun 19, 2015 at 11:38 AM, Wei Zhou <zhweisop...@gmail.com> wrote:
>>> > Hi Spark experts,
>>> >
>>> > I see lasso regression / elastic-net implementations under both MLlib
>>> > and ML. Does anyone know what the difference between the two
>>> > implementations is?
>>> >
>>> > At Spark Summit, one of the keynote speakers mentioned that ML is meant
>>> > for single-node computation. Could anyone elaborate on this?
>>> >
>>> > Thanks.
>>> >
>>> > Wei
>>
>>
