[ https://issues.apache.org/jira/browse/SPARK-16857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Claussen closed SPARK-16857.
---------------------------------
Usage error.
> CrossValidator and KMeans throws IllegalArgumentException
> ---------------------------------------------------------
>
> Key: SPARK-16857
> URL: https://issues.apache.org/jira/browse/SPARK-16857
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 1.6.1
> Environment: spark-jobserver docker image. Spark 1.6.1 on ubuntu,
> Hadoop 2.4
> Reporter: Ryan Claussen
>
> I am attempting to use CrossValidator to train a KMeans model. When I attempt
> to fit the data, Spark throws the IllegalArgumentException below, because the
> KMeans algorithm outputs an Integer into the prediction column instead of a
> Double. Before I go too far: is using CrossValidator with KMeans
> supported?
> Here's the exception:
> {quote}
> java.lang.IllegalArgumentException: requirement failed: Column prediction must be of type DoubleType but was actually IntegerType.
> at scala.Predef$.require(Predef.scala:233)
> at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
> at org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator.evaluate(MulticlassClassificationEvaluator.scala:74)
> at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:109)
> at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:99)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:99)
> at com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.generateKMeans(SparkModelJob.scala:202)
> at com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.runJob(SparkModelJob.scala:62)
> at com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.runJob(SparkModelJob.scala:39)
> at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> Here is the code I'm using to set up my cross validator. As the stack trace
> above indicates, it fails at the fit step:
> {quote}
> ...
> val mpc = new KMeans().setK(2).setFeaturesCol("indexedFeatures")
> val labelConverter = new IndexToString().setInputCol("prediction").setOutputCol("predictedLabel").setLabels(labelIndexer.labels)
> val pipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, mpc, labelConverter))
> val evaluator = new MulticlassClassificationEvaluator().setLabelCol("approvedIndex").setPredictionCol("prediction")
> val paramGrid = new ParamGridBuilder().addGrid(mpc.maxIter, Array(100, 200, 500)).build()
> val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid).setNumFolds(3)
> val cvModel = cv.fit(trainingData)
> {quote}
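
For readers following the trace, here is a minimal standalone sketch of the kind of check that fails. It assumes only what the exception message and stack trace show: the evaluator requires the prediction column to be exactly DoubleType, while the KMeans stage emits integer cluster ids. The DataType, DoubleType, IntegerType and checkColumnType definitions are local stand-ins written for this illustration, not Spark's own classes, and SchemaCheckSketch is a hypothetical name.

```scala
// Standalone model of the failing requirement; no Spark dependency.
// All types and names below are illustrative stand-ins, not Spark classes.
sealed trait DataType
case object DoubleType extends DataType
case object IntegerType extends DataType

object SchemaCheckSketch {
  // Requires an exact type match for the named column, in the style of
  // the message seen in the stack trace.
  def checkColumnType(schema: Map[String, DataType],
                      colName: String,
                      expected: DataType): Unit = {
    val actual = schema(colName)
    require(actual == expected,
      s"Column $colName must be of type $expected but was actually $actual.")
  }

  def main(args: Array[String]): Unit = {
    // A clustering stage that writes integer cluster ids to "prediction".
    val clusteringOutput = Map[String, DataType]("prediction" -> IntegerType)

    // An evaluator that insists on DoubleType rejects that schema.
    val result = scala.util.Try(
      checkColumnType(clusteringOutput, "prediction", DoubleType))

    // Print the failure message, if any, instead of letting it propagate.
    result.failed.foreach(e => println(e.getMessage))
  }
}
```

The sketch only models the type mismatch. It does not suggest that casting the column would make the pipeline meaningful: the issue was closed as a usage error, and a multiclass classification evaluator compares predictions against labels, which cluster ids from KMeans are not.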