Hi,
I've implemented a class MyClass in MLlib that performs some operation on
LabeledPoint. MyClass extends Serializable, so I can map this operation over
the data of an RDD[LabeledPoint], e.g. data.map(lp => MyClass.operate(lp)).
I write an instance of this class to a file with
ObjectOutputStream.writeObject, then stop and restart Spark and load the
instance back with ObjectInputStream.readObject.asInstanceOf[MyClass]. When I
try to map the same operation of the restored instance over an RDD, Spark
throws a not-serializable exception (a minimal sketch of the round-trip
follows the stack trace below):
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1453)
at org.apache.spark.rdd.RDD.map(RDD.scala:273)
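For reference, here is a minimal sketch of what I'm doing. The class body,
the file path, and the sample data are placeholders standing in for my actual
code, and I run the two halves in separate Spark sessions:

import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Placeholder for my actual class: it is Serializable and performs
// some operation on a LabeledPoint.
class MyClass extends Serializable {
  def operate(lp: LabeledPoint): LabeledPoint =
    LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map(_ * 2.0)))
}

val sc = new SparkContext(new SparkConf().setAppName("repro").setMaster("local[2]"))
val data = sc.parallelize(Seq(LabeledPoint(1.0, Vectors.dense(1.0, 2.0))))

// Session 1: mapping works, then the instance is written to a file.
val instance = new MyClass
data.map(lp => instance.operate(lp)).count()  // works fine

val out = new ObjectOutputStream(new FileOutputStream("/tmp/myclass.bin"))
out.writeObject(instance)
out.close()

// Session 2 (after stopping and restarting Spark): read the instance back.
val in = new ObjectInputStream(new FileInputStream("/tmp/myclass.bin"))
val restored = in.readObject().asInstanceOf[MyClass]
in.close()

// This map is where I get
// org.apache.spark.SparkException: Task not serializable.
data.map(lp => restored.operate(lp)).count()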
Could you suggest why this exception is thrown, given that MyClass is
Serializable by definition?
Best regards,
Alexander