Thanks Burak. A step size of 0.01 worked for b), and step=0.00000001 worked for c)!
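
For anyone who hits the same divergence, here is a minimal pure-Python sketch of why the step size matters so much on these datasets. It is not MLlib's code. It assumes full-batch updates, a squared-error gradient of 2 * mean((w*x - y) * x), a step that decays as step / sqrt(t), 100 iterations, and no intercept term. The helper name sgd_no_intercept is made up for illustration.

```python
import math

def sgd_no_intercept(points, step, iterations=100, w0=1.0):
    """Fit y ~ w * x by full-batch gradient descent with a step / sqrt(t) decay.

    Hypothetical helper for illustration only; the update rule is an
    assumption about how the optimizer behaves, not MLlib's actual code.
    """
    w = w0
    n = len(points)
    for t in range(1, iterations + 1):
        # Gradient of the mean squared error with respect to w
        grad = 2.0 * sum((w * x - y) * x for y, x in points) / n
        w -= (step / math.sqrt(t)) * grad
    return w

# Datasets b) and c) from the thread, as (label, feature) pairs
data_b = [(0.0, 0.0), (9.0, 10.0), (22.0, 20.0), (32.0, 30.0)]
data_c = [(18.9, 3910.0), (17.0, 3860.0), (20.0, 4200.0), (16.6, 3660.0)]

# Closed-form least-squares slope when the intercept is fixed at 0
w_exact_b = sum(y * x for y, x in data_b) / sum(x * x for y, x in data_b)

w_b_large = sgd_no_intercept(data_b, step=1.0)    # blows up
w_b_small = sgd_no_intercept(data_b, step=0.01)   # settles near w_exact_b
w_c_large = sgd_no_intercept(data_c, step=1.0)    # overflows to nan
w_c_small = sgd_no_intercept(data_c, step=1e-8)   # stays finite

print(w_b_large, w_b_small, w_exact_b)
print(w_c_large, w_c_small)
```

Under these assumptions each update multiplies the error in w by roughly 1 - 2 * (step / sqrt(t)) * mean(x^2). With x around 30 that factor has magnitude well above 1 unless the step is small, and with x around 4000 the step has to be several orders of magnitude smaller still. Because the intercept stays at 0.0, the slope that b) converges to is the no-intercept fit (about 1.064), not the 1.09 from the fit with an intercept.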
Cheers
<k/>

On Wed, Oct 1, 2014 at 3:00 PM, Burak Yavuz <bya...@stanford.edu> wrote:

> Hi,
>
> It appears that the step size is so high that the model diverges once the
> noise is added.
> Could you try setting the step size to 0.1 or 0.01?
>
> Best,
> Burak
>
> ----- Original Message -----
> From: "Krishna Sankar" <ksanka...@gmail.com>
> To: user@spark.apache.org
> Sent: Wednesday, October 1, 2014 12:43:20 PM
> Subject: MLlib Linear Regression Mismatch
>
> Guys,
>    Obviously I am doing something wrong. Maybe 4 points are too small a
> dataset.
> Can you help me figure out why the following doesn't work?
> a) This works :
>
> data = [
>    LabeledPoint(0.0, [0.0]),
>    LabeledPoint(10.0, [10.0]),
>    LabeledPoint(20.0, [20.0]),
>    LabeledPoint(30.0, [30.0])
> ]
> lrm = LinearRegressionWithSGD.train(sc.parallelize(data),
> initialWeights=array([1.0]))
> print lrm
> print lrm.weights
> print lrm.intercept
> lrm.predict([40])
>
> output:
> <pyspark.mllib.regression.LinearRegressionModel object at 0x109813d50>
>
> [ 1.]
> 0.0
>
> 40.0
>
> b) By perturbing the y a little bit, the model gives wrong results:
>
> data = [
>    LabeledPoint(0.0, [0.0]),
>    LabeledPoint(9.0, [10.0]),
>    LabeledPoint(22.0, [20.0]),
>    LabeledPoint(32.0, [30.0])
> ]
> lrm = LinearRegressionWithSGD.train(sc.parallelize(data),
> initialWeights=array([1.0])) # should be 1.09x -0.60
> print lrm
> print lrm.weights
> print lrm.intercept
> lrm.predict([40])
>
> Output:
> <pyspark.mllib.regression.LinearRegressionModel object at 0x109666590>
>
> [ -8.20487463e+203]
> 0.0
>
> -3.2819498532740317e+205
>
> c) Same story here - wrong results. Actually nan:
>
> data = [
>    LabeledPoint(18.9, [3910.0]),
>    LabeledPoint(17.0, [3860.0]),
>    LabeledPoint(20.0, [4200.0]),
>    LabeledPoint(16.6, [3660.0])
> ]
> lrm = LinearRegressionWithSGD.train(sc.parallelize(data),
> initialWeights=array([1.0])) # should be ~ 0.006582x -7.595170
> print lrm
> print lrm.weights
> print lrm.intercept
> lrm.predict([4000])
>
> Output:
> <pyspark.mllib.regression.LinearRegressionModel object at 0x109666b90>
>
> [ nan]
> 0.0
>
> nan
>
> Cheers & Thanks
> <k/>
>
>
