Hi Robin,
You can try this PR out. It has built-in feature scaling and ElasticNet
regularization (an L1/L2 mix), and it converges stably to the same model
produced by R's glmnet package.
https://github.com/apache/spark/pull/4259
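For reference, a rough sketch of how the estimator from that PR might be
used once it lands (setter names such as setRegParam/setElasticNetParam
and the "label"/"features" DataFrame layout are assumptions based on the
PR, not a finalized API):

import org.apache.spark.ml.regression.LinearRegression

// regParam sets the overall regularization strength; elasticNetParam mixes
// L1 and L2 (0.0 = pure L2/ridge, 1.0 = pure L1/lasso). Feature scaling is
// handled internally by the estimator.
val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.1)
  .setElasticNetParam(0.5)

// `training` is a placeholder DataFrame with "label" and "features" columns.
val model = lr.fit(training)
println(s"weights: ${model.weights} intercept: ${model.intercept}")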
Sincerely,
DB Tsai
--
I'm working on LinearRegressionWithElasticNet using OWLQN now. It will do
the data standardization internally, so it's transparent to users. With
OWLQN, you don't have to manually choose a step size. I will send out a PR
next week.
Sincerely,
DB Tsai
---
It was a bug in the code; adding the step parameter got the results to
work: Mean Squared Error = 2.610379825794694E-5
I've also opened a JIRA to add the step parameter to the examples so that
people new to MLlib have a way to improve the MSE.
https://issues.apache.org/jira/browse/SPARK-
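For anyone following along, a minimal sketch of what the example change
amounts to (the train(input, numIterations, stepSize) overload exists in
MLlib; `parsedData` is a placeholder RDD[LabeledPoint]):

import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// Pass the step size explicitly instead of relying on the default of 1.0.
val model = LinearRegressionWithSGD.train(parsedData, 10000, 0.0001)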
It looks like you're training on the non-scaled data but testing on the
scaled data. Have you tried training and testing on only the scaled data?
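A minimal sketch of scaling once and then training and evaluating on the
same scaled data (the `rawData` name is a placeholder for your
RDD[LabeledPoint]):

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// Fit the scaler on the feature vectors; withMean = true requires dense features.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(rawData.map(_.features))
val scaled = rawData.map(p => LabeledPoint(p.label, scaler.transform(p.features)))

// Train and evaluate on the same scaled RDD.
val model = LinearRegressionWithSGD.train(scaled, 100)
val mse = scaled.map { p =>
  val err = model.predict(p.features) - p.label
  err * err
}.mean()
println(s"training Mean Squared Error = $mse")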
On Thu, Jan 15, 2015 at 10:42 AM, Devl Devel wrote:
Thanks, that helps a bit, at least with the NaN, but the MSE is still very
high even with that step size and 10k iterations:
training Mean Squared Error = 3.3322561285919316E7
Does this method need, say, 100k iterations?
On Thu, Jan 15, 2015 at 5:42 PM, Robin East wrote:
-dev, +user
You’ll need to set the gradient descent step size to something small - a bit of
trial and error shows that 0.0001 works.
You’ll need to create a LinearRegressionWithSGD instance and set the step size
explicitly:
val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.0001)
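To round that out, the rest of the run might look like this (the
`training` name is a placeholder for an RDD[LabeledPoint]):

// Optionally raise the iteration count as well, then train on the RDD.
lr.optimizer.setNumIterations(10000)
val model = lr.run(training)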