An RDD can hold objects of any type.  If you think of it as a distributed
Collection, you won't be far off.

As for serialization, the contents of an RDD must be serializable.  There
are two serialization libraries you can use with Spark: standard Java
serialization or Kryo serialization.  See
https://spark.apache.org/docs/latest/tuning.html#data-serialization for
more details.
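
Switching to Kryo is just a matter of setting spark.serializer on your
SparkConf.  Roughly like this (untested sketch against the Java API; the
app name and master below are placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class KryoConfSketch {
        public static void main(String[] args) {
            // Replace the default Java serializer with Kryo for task and shuffle data.
            SparkConf conf = new SparkConf()
                .setAppName("kryo-conf-sketch")   // placeholder app name
                .setMaster("local[2]")            // placeholder master, just for trying it out
                .set("spark.serializer",
                     "org.apache.spark.serializer.KryoSerializer");
            JavaSparkContext sc = new JavaSparkContext(conf);
            sc.stop();
        }
    }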

If you are using Java serialization, then just implementing the Serializable
interface will work.  If you're using Kryo, then you'll also want to register
your classes (and any custom serializers they need) through a KryoRegistrator
and point spark.kryo.registrator at it.
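
For example, something like this (untested sketch; Point and PointRegistrator
are just placeholders for your own types):

    import java.io.Serializable;

    import com.esotericsoftware.kryo.Kryo;
    import org.apache.spark.serializer.KryoRegistrator;

    // Java serialization: implementing Serializable is all that's needed.
    public class Point implements Serializable {
        public final double x;
        public final double y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Kryo: register your classes (plus any custom serializers) in a registrator.
    class PointRegistrator implements KryoRegistrator {
        @Override
        public void registerClasses(Kryo kryo) {
            kryo.register(Point.class);
            // kryo.register(SomeOtherClass.class, new SomeOtherClassSerializer());
        }
    }

Then tell Spark about it with
conf.set("spark.kryo.registrator", "PointRegistrator").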

The fact that it works fine in local mode and tests but fails on Mesos makes
me think there's an issue with the Mesos cluster deployment.  First, does it
work properly in standalone mode?  Second, how are you getting the Clojure
libraries onto the Mesos executors?  Are they included in your executor URI
bundle, or are you otherwise passing a parameter that points to the Clojure
jars?
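
If it helps, here's a rough, untested sketch of one way to get everything
onto the executors, assuming you bundle the Clojure runtime and your DSL into
an application uber-jar (the URIs and paths below are just placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MesosDeploySketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                .setAppName("clojure-dsl-on-mesos")
                .setMaster("mesos://mesos-master:5050")                    // placeholder master URL
                // Mesos executors fetch the Spark distribution from this URI.
                .set("spark.executor.uri", "hdfs:///spark/spark-dist.tgz") // placeholder path
                // Ship the application uber-jar (Clojure runtime + DSL + app code)
                // to every executor.
                .setJars(new String[] { "/path/to/app-uberjar.jar" });     // placeholder path
            JavaSparkContext sc = new JavaSparkContext(conf);
            sc.stop();
        }
    }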

Cheers,
Andrew


On Thu, May 8, 2014 at 9:55 AM, Soren Macbeth <so...@yieldbot.com> wrote:

> Hi,
>
> What are the requirements of objects that are stored in RDDs?
>
> I'm still struggling with an exception I've already posted about several
> times. My questions are:
>
> 1) What interfaces are objects stored in RDDs expected to implement, if
> any?
> 2) Are collections (be they scala, java or otherwise) handled differently
> than other objects?
>
> The bug I'm hitting is when I try to use my Clojure DSL (which wraps the
> Java API) with Clojure collections, specifically
> clojure.lang.PersistentVectors, in my RDDs. Here is the exception message:
>
> org.apache.spark.SparkException: Job aborted: Exception while deserializing
> and fetching task: com.esotericsoftware.kryo.KryoException:
> java.lang.IllegalArgumentException: Can not set final
> scala.collection.convert.Wrappers field
> scala.collection.convert.Wrappers$SeqWrapper.$outer to
> clojure.lang.PersistentVector
>
> Now, this same application works fine in local mode and tests, but it fails
> when run under Mesos. That would seem to me to point to something around
> RDD partitioning for tasks, but I'm not sure.
>
> I don't know much Scala, but according to Google, SeqWrapper is part of the
> implicit JavaConversions functionality of Scala collections. Under what
> circumstances would Spark be trying to wrap my RDD objects in Scala
> collections?
>
> Finally - I'd like to point out that this is not a serialization issue with
> my Clojure collection objects. I have registered serializers for them and
> have verified that they serialize and deserialize perfectly well in Spark.
>
> One last note: this failure occurs after all the tasks for a reduce stage
> have finished and the results are returned to the driver.
>
> TIA
>
