Thanks Sean. Our training MSE is really large. We definitely need better
predictor variables.

Training Mean Squared Error = 7.72E8
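
For a sense of scale: MSE is in squared label units, so its square root (RMSE) is easier to compare against typical label values. Below is a minimal sketch in plain Scala, with no Spark dependency. The object name and the two sample (label, prediction) pairs are made up for illustration and are not from our data.

```scala
object RmseSketch {
  // Mean squared error over paired (label, prediction) values.
  def mse(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (y, p) => (y - p) * (y - p) }.sum / pairs.size

  // Root mean squared error: same units as the label.
  def rmse(mseValue: Double): Double = math.sqrt(mseValue)

  def main(args: Array[String]): Unit = {
    // The MSE reported above.
    println(f"RMSE for MSE 7.72E8 = ${rmse(7.72e8)}%.0f")

    // Hypothetical pairs, only to show how mse is computed.
    val toy = Seq((1.0, 2.0), (3.0, 1.0))
    println(s"toy MSE = ${mse(toy)}")
  }
}
```

An MSE of 7.72E8 corresponds to an RMSE of roughly 27,785 in label units, so whether that is "really large" depends on the typical magnitude of the label.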

Thanks,
Manish


On Mon, Mar 6, 2017 at 4:45 PM, Sean Owen <so...@cloudera.com> wrote:

> There's nothing unusual about negative values from a linear regression.
> If, generally, your predicted values are far from your actual values, then
> your model hasn't fit well. You may have a bug somewhere in your pipeline
> or you may have data without much linear relationship. Most of this isn't a
> Spark problem.
>
> On Mon, Mar 6, 2017 at 8:05 AM Manish Maheshwari <mylogi...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> We are using a LinearRegressionModel in Scala, with a StandardScaler to
>> normalize the data before modelling. The code snippet looks like this:
>>
>> *Modelling - *
>> // sc, tableRecords, the output paths and the save helper are defined elsewhere.
>> import org.apache.spark.mllib.feature.StandardScaler
>> import org.apache.spark.mllib.linalg.Vectors
>> import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
>>
>> val labeledPointsRDD = tableRecords.map { row =>
>>   // Drop string columns, then convert the rest to Double (other types become 0.0).
>>   val filtered = row.toSeq.filterNot(_.isInstanceOf[String])
>>   val converted = filtered.map {
>>     case i: Int => i.toDouble
>>     case l: Long => l.toDouble
>>     case d: Double => d
>>     case _ => 0.0
>>   }
>>   // Column 0 is the label; the remaining columns are the features.
>>   val features = Vectors.dense(converted.drop(1).toArray)
>>   LabeledPoint(converted(0), features)
>> }
>> // Fit the scaler on the features only and persist it for scoring.
>> val scaler1 = new StandardScaler().fit(labeledPointsRDD.map(_.features))
>> save(sc, scalarModelOutputPath, scaler1)
>> val normalizedData = labeledPointsRDD.map(lp => LabeledPoint(lp.label, scaler1.transform(lp.features)))
>> val Array(trainingData, testingData) = normalizedData.randomSplit(Array(0.8, 0.2))
>> trainingData.cache()
>> val regression = new LinearRegressionWithSGD().setIntercept(true)
>> regression.optimizer.setStepSize(0.01)
>> val model = regression.run(trainingData)
>> model.save(sc, modelOutputPath)
>>
>> After that, we score the model on the same data it was trained on, using the
>> snippet below:
>>
>> *Scoring - *
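>> // sc, tableRecords, the paths and the read helper are defined elsewhere.
>> import org.apache.spark.mllib.linalg.Vectors
>> import org.apache.spark.mllib.regression.LinearRegressionModel
>>
>> val labeledPointsRDD = tableRecords.map { row =>
>>   // Same filtering and conversion as in the modelling snippet.
>>   val filtered = row.toSeq.filterNot(_.isInstanceOf[String])
>>   val converted = filtered.map {
>>     case i: Int => i.toDouble
>>     case l: Long => l.toDouble
>>     case d: Double => d
>>     case _ => 0.0
>>   }
>>   // Unlike the modelling snippet, no column is dropped here:
>>   // every numeric column becomes a feature.
>>   val features = Vectors.dense(converted.toArray)
>>   (row(0), features)
>> }
>> val scaler1 = read(sc, scalarModelOutputPath)
>> val normalizedData = labeledPointsRDD.map(p => (p._1, scaler1.transform(p._2)))
>> normalizedData.cache()
>> val model = LinearRegressionModel.load(sc, modelOutputPath)
>> val valuesAndPreds = normalizedData.map(p => (p._1.toString, model.predict(p._2)))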
>>
>> However, a lot of predicted values are negative. The input data has no
>> negative values, so we are unable to understand this behaviour.
>> Further, the order and sequence of all the variables remain the same in
>> the modelling and testing data frames.
>>
>> Any ideas?
>>
>> Thanks,
>> Manish
>>
>>
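
To illustrate the point made in the quoted reply that negative predictions are not unusual for a linear regression, here is a minimal sketch in plain Scala. It uses closed-form ordinary least squares on a single feature, not LinearRegressionWithSGD, and the object name and data points are made up for illustration. Every x and y is non-negative, yet the fitted line has a negative intercept.

```scala
object NegativePredictionSketch {
  // Ordinary least squares for one feature: returns (intercept, slope).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = sxy / sxx
    (meanY - slope * meanX, slope)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: all values are non-negative.
    val xs = Seq(0.0, 1.0, 2.0, 3.0, 4.0)
    val ys = Seq(0.0, 0.0, 0.0, 5.0, 10.0)
    val (intercept, slope) = fit(xs, ys)
    // The fitted line is y = -2.0 + 2.5 * x, so the prediction at x = 0 is negative.
    xs.foreach(x => println(s"x=$x prediction=${intercept + slope * x}"))
  }
}
```

Nothing in a linear model constrains its output to the range of the training labels, so a poor fit on non-negative data can still produce negative predictions.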
