Hi,

I am trying to run LogisticRegressionWithSGD on an RDD of LabeledPoints
loaded with loadLibSVMFile:

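import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

val logistic: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "s3n://logistic-regression/epsilon_normalized")

val model = LogisticRegressionWithSGD.train(logistic, 100) // 100 SGD iterations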

It gives an input validation error after about 10 minutes:

org.apache.spark.SparkException: Input validation failed.
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:162)
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:146)
    at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:157)
    at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:192)

From reading this bug report
(https://issues.apache.org/jira/browse/SPARK-2575), my understanding is
that, since I am loading a LibSVM-format file, the dataset should contain
only 0/1 labels, so I should not be hitting the issue described there. Is
there something else I'm missing here?
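One way to test the 0/1 assumption is to look at the label values before training. The object below is a hypothetical helper (not part of MLlib) that restates, as I understand it, the rule the binary-label validation checks: every label must be exactly 0.0 or 1.0. It also remaps -1 to 0.0, which assumes the file uses the -1/+1 convention common in LIBSVM binary datasets; whether epsilon_normalized actually does is not confirmed here.

```scala
// Hypothetical helper, not MLlib API: a local restatement of the binary-label rule.
object LabelCheck {
  // True if the label would pass a 0/1 binary-label check.
  def isBinaryLabel(label: Double): Boolean = label == 0.0 || label == 1.0

  // Distinct labels that would fail the check.
  def invalidLabels(labels: Seq[Double]): Seq[Double] =
    labels.filterNot(isBinaryLabel).distinct

  // Assumes a -1/+1 labelled file: maps -1.0 to 0.0 and leaves other labels untouched.
  def toZeroOne(label: Double): Double = if (label == -1.0) 0.0 else label
}
```

In the Spark shell, `logistic.map(_.label).distinct().collect()` lists the labels that were actually loaded, and `logistic.map(lp => LabeledPoint(LabelCheck.toZeroOne(lp.label), lp.features))` would give a remapped RDD to pass to `train`.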

Thanks!
