Hi, what are the requirements for objects stored in RDDs?
I'm still struggling with an exception I've already posted about several times. My questions are:

1) What interfaces, if any, are objects stored in RDDs expected to implement?
2) Are collections (whether Scala, Java, or otherwise) handled differently than other objects?

The bug I'm hitting occurs when I use my Clojure DSL (which wraps the Java API) with Clojure collections, specifically clojure.lang.PersistentVector, in my RDDs. Here is the exception message:

org.apache.spark.SparkException: Job aborted: Exception while deserializing and fetching task: com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final scala.collection.convert.Wrappers field scala.collection.convert.Wrappers$SeqWrapper.$outer to clojure.lang.PersistentVector

This same application works fine in local mode and in tests, but it fails when run under Mesos. That would seem to point to something around how RDDs are partitioned into tasks, but I'm not sure. I don't know much Scala, but according to Google, SeqWrapper is part of the implicit JavaConversions functionality of Scala collections. Under what circumstances would Spark try to wrap my RDD objects in Scala collections?

Finally, I'd like to point out that this is not a serialization issue with my Clojure collection objects: I have registered Kryo serializers for them and have verified that they serialize and deserialize correctly in Spark. One last note: the failure occurs after all the tasks for a reduce stage have finished and the results are being returned to the driver.

TIA
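P.S. In case it helps, here is roughly the shape of the registration I'm using, sketched in plain Java against the Spark and Kryo APIs. The class name and the serializer body here are illustrative, not my actual code (which does the equivalent from Clojure):

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.Serializer;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import org.apache.spark.serializer.KryoRegistrator;

    // Enabled via the usual properties, i.e. spark.serializer set to
    // org.apache.spark.serializer.KryoSerializer and spark.kryo.registrator
    // pointing at this class.
    public class ClojureRegistrator implements KryoRegistrator {
        @Override
        public void registerClasses(Kryo kryo) {
            kryo.register(clojure.lang.PersistentVector.class,
                          new Serializer<clojure.lang.PersistentVector>() {
                @Override
                public void write(Kryo k, Output out, clojure.lang.PersistentVector v) {
                    // Write the element count, then each element with its class tag.
                    out.writeInt(v.count());
                    for (Object item : v) {
                        k.writeClassAndObject(out, item);
                    }
                }

                @Override
                public clojure.lang.PersistentVector read(Kryo k, Input in,
                        Class<clojure.lang.PersistentVector> type) {
                    // Read the elements back and rebuild the persistent vector.
                    int n = in.readInt();
                    java.util.List<Object> items = new java.util.ArrayList<Object>(n);
                    for (int i = 0; i < n; i++) {
                        items.add(k.readClassAndObject(in));
                    }
                    return clojure.lang.PersistentVector.create(items);
                }
            });
        }
    }

A round trip through this serializer (write a vector to an Output, read it back from an Input) produces an equal PersistentVector, which is how I verified the serialization itself is fine.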