Hi, I am trying to run LogisticRegressionWithSGD on an RDD of LabeledPoints loaded using loadLibSVMFile:

  val logistic: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "s3n://logistic-regression/epsilon_normalized")
  val model = LogisticRegressionWithSGD.train(logistic, 100)

It gives an input validation error after about 10 minutes:

  org.apache.spark.SparkException: Input validation failed.
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:162)
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:146)
    at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:157)
    at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:192)

From reading this bug report (https://issues.apache.org/jira/browse/SPARK-2575), since I am loading a LibSVM-format file, the dataset should contain only 0/1 labels, so I should not be hitting the issue described there. Is there something else I'm missing here?

Thanks!
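
P.S. For anyone trying to reproduce this, here is a minimal sketch of how the label values could be inspected and, if needed, remapped before training. It assumes `sc` is an existing SparkContext (as in spark-shell) and that the failure comes from labels outside {0, 1} (for example -1/+1), which I have not confirmed. `toBinaryLabel` is a hypothetical helper, not part of MLlib.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Hypothetical helper (not an MLlib function): positive labels become 1.0,
// everything else (e.g. -1.0 or 0.0) becomes 0.0.
def toBinaryLabel(label: Double): Double = if (label > 0) 1.0 else 0.0

// Assumes `sc` is an already-created SparkContext.
val logistic: RDD[LabeledPoint] =
  MLUtils.loadLibSVMFile(sc, "s3n://logistic-regression/epsilon_normalized")

// Print the distinct label values actually present in the data.
val labels: Array[Double] = logistic.map(_.label).distinct().collect()
println(labels.sorted.mkString(", "))

// Assumption: if the labels are not already 0/1, remap them before training.
val binary: RDD[LabeledPoint] =
  logistic.map(p => LabeledPoint(toBinaryLabel(p.label), p.features))

val model = LogisticRegressionWithSGD.train(binary, 100)
```

If the printed labels are already only 0.0 and 1.0, the remapping step changes nothing and the cause must be something else.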