Hi,

What are the requirements for objects that are stored in RDDs?

I'm still struggling with an exception I've already posted about several
times. My questions are:

1) What interfaces are objects stored in RDDs expected to implement, if any?
2) Are collections (be they Scala, Java or otherwise) handled differently
than other objects?

The bug I'm hitting occurs when I use my Clojure DSL (which wraps the
Java API) with Clojure collections, specifically
clojure.lang.PersistentVectors, in my RDDs. Here is the exception message:

org.apache.spark.SparkException: Job aborted: Exception while deserializing
and fetching task: com.esotericsoftware.kryo.KryoException:
java.lang.IllegalArgumentException: Can not set final
scala.collection.convert.Wrappers field
scala.collection.convert.Wrappers$SeqWrapper.$outer to
clojure.lang.PersistentVector

Now, this same application works fine in local mode and in tests, but it
fails when run under Mesos. That seems to point to something related to how
RDDs are partitioned into tasks, but I'm not sure.

I don't know much Scala, but according to Google, SeqWrapper is part of the
implicit JavaConversions functionality of Scala collections. Under what
circumstances would Spark try to wrap my RDD objects in Scala
collections?

I'd also like to point out that this is not a serialization issue with
my Clojure collection objects. I have registered serializers for them and
have verified that they serialize and deserialize perfectly well in Spark.

One last note: this failure occurs after all the tasks for a reduce stage
have finished and their results have been returned to the driver.

TIA
