Hi,

I am trying to use MultipleLinearRegression, but I get really strange results.
To check, I tested it with the housing price dataset:
http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data

Here I get negative house prices, even when I use the training set as the
test dataset:
LabeledVector(-1.1901998613214253E78, DenseVector(1500.0, 2197.0, 2978.0, 1369.0, 1451.0))
LabeledVector(-2.7411218018254747E78, DenseVector(4445.0, 4522.0, 4038.0, 4223.0, 4868.0))
LabeledVector(-2.688526857613956E78, DenseVector(4522.0, 4038.0, 4351.0, 4129.0, 4617.0))
LabeledVector(-1.3075960386971714E78, DenseVector(2001.0, 2059.0, 1992.0, 2008.0, 2504.0))
LabeledVector(-1.476238770814297E78, DenseVector(1992.0, 1965.0, 1983.0, 2300.0, 3811.0))
LabeledVector(-1.4298128754759792E78, DenseVector(2059.0, 1992.0, 1965.0, 2425.0, 3178.0))
...

I also get a huge squared error:
Squared error: 4.799184832395361E159

You can find my code here:
https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/test/io/sanfran/wikiTrends/extraction/flink/Regression.scala

Can you help me? What did I do wrong?

Thank you for your help,
Felix
