There's nothing unusual about negative values from a linear regression: the
model's output is unbounded, so it can go negative even when every input and
label is non-negative. If, generally, your predicted values are far from your
actual values, then your model hasn't fit well. You may have a bug somewhere
in your pipeline, or your data may not have much of a linear relationship.
Most of this isn't a Spark problem.
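
To illustrate, here is a minimal sketch in plain Scala (no Spark) that fits
a one-feature least-squares line to made-up, non-negative data and still
gets a negative prediction. It also compares the model's RMSE with that of
a baseline that always predicts the mean label, which is one rough way to
judge whether a model "has fit well". The data and the names `fit` and
`rmse` are illustrative assumptions, not part of your pipeline or of MLlib.

```scala
// Minimal sketch: ordinary least squares on one feature, no Spark required.
// The data points are made up for illustration.
object NegativePredictionSketch {
  // Closed-form fit of y = intercept + slope * x; returns (intercept, slope).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = sxy / sxx
    (meanY - slope * meanX, slope)
  }

  // Root mean squared error between predictions and actual labels.
  def rmse(preds: Seq[Double], actual: Seq[Double]): Double =
    math.sqrt(preds.zip(actual).map { case (p, a) => (p - a) * (p - a) }.sum / actual.size)

  def main(args: Array[String]): Unit = {
    // All features and labels are >= 0.
    val xs = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
    val ys = Seq(0.0, 0.0, 1.0, 3.0, 6.0)

    val (intercept, slope) = fit(xs, ys)
    val preds = xs.map(x => intercept + slope * x)

    // The fitted line is y = -2.5 + 1.5 * x, so the prediction at x = 1.0 is -1.0.
    println(s"intercept=$intercept slope=$slope prediction at x=1.0: ${preds.head}")

    // Baseline: always predict the mean label. A useful model should beat it.
    val meanLabel = ys.sum / ys.size
    val baseline = ys.map(_ => meanLabel)
    println(s"model RMSE=${rmse(preds, ys)} baseline RMSE=${rmse(baseline, ys)}")
  }
}
```

If the model's RMSE on your data is not clearly better than the mean
baseline, the negative predictions are a symptom of a poor fit rather than
a problem in themselves.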

On Mon, Mar 6, 2017 at 8:05 AM Manish Maheshwari <mylogi...@gmail.com>
wrote:

> Hi All,
>
> We are using a LinearRegressionModel in Scala. We are using a
> StandardScaler to normalize the data before modelling. The code snippet
> looks like this -
>
> *Modelling - *
> val labeledPointsRDD = tableRecords.map(row =>
> {
>   val filtered = row.toSeq.filter({
>     case s: String => false
>     case _ => true
>   })
>   val converted = filtered.map({
>     case i: Int => i.toDouble
>     case l: Long => l.toDouble
>     case d: Double => d
>     case _ => 0.0
>   })
>   val features = Vectors.dense(converted.slice(1, converted.length).toArray)
>   LabeledPoint(converted(0), features)
> })
> val scaler1 = new StandardScaler().fit(labeledPointsRDD.map(x => x.features))
> save(sc, scalarModelOutputPath, scaler1)
> val normalizedData = labeledPointsRDD.map(lp =>
>   LabeledPoint(lp.label, scaler1.transform(lp.features)))
> val splits = normalizedData.randomSplit(Array(0.8, 0.2))
> val trainingData = splits(0)
> val testingData = splits(1)
> trainingData.cache()
> var regression = new LinearRegressionWithSGD().setIntercept(true)
> regression.optimizer.setStepSize(0.01)
> val model = regression.run(trainingData)
> model.save(sc, modelOutputPath)
>
> Post that when we score the model on the same data that it was trained on
> using the below snippet we see this -
>
> *Scoring - *
> val labeledPointsRDD = tableRecords.map(row =>
> {
>   val filtered = row.toSeq.filter({
>     case s: String => false
>     case _ => true
>   })
>   val converted = filtered.map({
>     case i: Int => i.toDouble
>     case l: Long => l.toDouble
>     case d: Double => d
>     case _ => 0.0
>   })
>   val features = Vectors.dense(converted.toArray)
>   (row(0), features)
> })
> val scaler1 = read(sc, scalarModelOutputPath)
> val normalizedData = labeledPointsRDD.map(p => (p._1, scaler1.transform(p._2)))
> normalizedData.cache()
> val model = LinearRegressionModel.load(sc, modelOutputPath)
> val valuesAndPreds = normalizedData.map(p =>
>   (p._1.toString(), model.predict(p._2)))
>
> However, a lot of predicted values are negative. The input data has no
> negative values and we are unable to understand this behaviour.
> Further, the order and sequence of all the variables remains the same in
> the modelling and testing data frames.
>
> Any ideas?
>
> Thanks,
> Manish
>
>
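
On the "bug somewhere in your pipeline" possibility: in the quoted
snippets, training builds features from `converted.slice(1, converted.length)`
while scoring uses `converted.toArray`. Whether that matters depends on
whether the scoring table still contains the label column, which the
message does not say. One way to rule this class of problem out is to
route both paths through a single feature-extraction function. The sketch
below models rows as `Seq[Any]` (what `Row.toSeq` returns) so it runs
without Spark; `numericColumns` and `toFeatures` are hypothetical helper
names, and the sample row is made up.

```scala
// Sketch only: one shared definition of "features" for training and scoring.
object SharedFeatureExtraction {
  // Drop String columns and convert the rest to Double, as in the quoted code.
  def numericColumns(row: Seq[Any]): Seq[Double] =
    row.filter {
      case _: String => false
      case _ => true
    }.map {
      case i: Int => i.toDouble
      case l: Long => l.toDouble
      case d: Double => d
      case _ => 0.0
    }

  // Assumes the first numeric column is the label and everything after it is a feature.
  def toFeatures(row: Seq[Any]): Array[Double] =
    numericColumns(row).drop(1).toArray

  def main(args: Array[String]): Unit = {
    // Made-up row: label, a String column, then three numeric features.
    val row: Seq[Any] = Seq(10.0, "id-1", 3, 4L, 2.5)

    val features = toFeatures(row)               // label excluded
    val allNumeric = numericColumns(row).toArray // label included

    // If the label column is present at scoring time, the two differ in length.
    println(s"features: ${features.length} values, all numeric columns: ${allNumeric.length} values")
  }
}
```

Calling the same `toFeatures` from both the training and scoring jobs keeps
the vector layout identical, so the scaler and the model always see the
columns they were fitted on.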
