Hi Trevor, the multiple linear regression implementation is quite sensitive to the initial learning rate. If it is set too large, each gradient step overshoots the minimum, so the algorithm alternates between ever-increasing values to the left and right of it and the error grows with every iteration. Could you try setting a smaller initial learning rate? If the error still persists, we should file a JIRA issue for it.
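To illustrate the overshooting, here is a minimal sketch in plain Scala. It is not the FlinkML code: `descend` is a hypothetical helper, and the objective f(w) = w^2 (gradient 2w, minimum at w = 0) is assumed purely for illustration.

```scala
// Plain gradient descent on the assumed toy objective f(w) = w^2.
// `descend` is a hypothetical helper, not part of FlinkML.
def descend(start: Double, stepsize: Double, iterations: Int): List[Double] =
  Iterator
    .iterate(start)(w => w - stepsize * 2.0 * w) // w - stepsize * gradient
    .take(iterations + 1)                        // include the starting point
    .toList

// Step size too large: every update multiplies w by (1 - 2 * 1.5) = -2,
// so w flips sign and doubles in magnitude each iteration.
val diverging = descend(start = 1.0, stepsize = 1.5, iterations = 4)

// Smaller step size: every update multiplies w by (1 - 2 * 0.25) = 0.5,
// so w shrinks towards the minimum at 0.
val converging = descend(start = 1.0, stepsize = 0.25, iterations = 4)

println(diverging)  // List(1.0, -2.0, 4.0, -8.0, 16.0)
println(converging) // List(1.0, 0.5, 0.25, 0.125, 0.0625)
```

In FlinkML the corresponding knob should be `setStepsize` on `MultipleLinearRegression`, chained next to `setIterations`. Which value works for the haberman data is something I have not checked, so it would need some experimenting.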
Cheers,
Till

On Tue, May 3, 2016 at 4:58 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

> Is it just me or does MSE tend to increase with more iterations of Linear
> Regression?
>
> Using 1.0.2 (or 1.1)
>
> %flink
> import org.apache.flink.ml.optimization.SimpleGradientDescent
> import org.apache.flink.ml.optimization.LearningRateMethod
> import org.apache.flink.ml.regression.MultipleLinearRegression
> import org.apache.flink.ml.common.LabeledVector
> import org.apache.flink.ml.math.DenseVector
>
> val survival = env.readCsvFile[(String, String, String,
> String)]("file:///home/trevor/gits/datasets/haberman/haberman.data")
> val survivalLV = survival
>   .map{tuple =>
>     val list = tuple.productIterator.toList
>     val numList = list.map(_.asInstanceOf[String].toDouble)
>     LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
>   }
>
>
> val mlr_default = MultipleLinearRegression()
>   .setIterations(5)
>
>
> mlr_default.fit(survivalLV)
>
> val mse1 = mlr_default.squaredResidualSum(survivalLV).collect()
>
>
> val mlr_default = MultipleLinearRegression()
>   .setIterations(10)
>
> mlr_default.fit(survivalLV)
>
> val mse2 = mlr_default.squaredResidualSum(survivalLV).collect()
> println(mse1 , mse2 )
>
>
> Results in :
>
> (Buffer(4.047910100612734E28),Buffer(2.6223205846507677E52))
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>