Hi Tobias,

I was curious about this issue and tried to run your example on my local
Mesos. I was able to reproduce your issue using your current config:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
1.0:4 failed 4 times (most recent failure: Exception failure:
java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times
(most recent failure: Exception failure: java.lang.ClassNotFoundException:
spark.SparkExamplesMinimal$$anonfun$2)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Creating a simple jar from the job and providing it through the
configuration seems to solve it:

val conf = new SparkConf()
      .setMaster("mesos://<my_ip>:5050/")
      .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
      .setAppName("SparkExamplesMinimal")

Resulting in:
14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at
SparkExamplesMinimal.scala:50) finished in 1.120 s
14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at
SparkExamplesMinimal.scala:50, took 1.177091435 s
count: 1000000
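
For reference, a complete minimal driver along those lines could look as
follows. This is only a sketch of the setup that worked for me, not a
tested drop-in: the master URL, the jar path (whatever "sbt package"
produces for your project), and the package/object names are placeholders.

package spark

import org.apache.spark.{SparkConf, SparkContext}

object SparkExamplesMinimal {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("mesos://<my_ip>:5050/")
      // ship the application jar so executors can load the anonymous
      // function classes generated for flatMap et al.
      .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
      .setAppName("SparkExamplesMinimal")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000000)
    println("count: " + rdd.count())

    // this is the step that failed with ClassNotFoundException before setJars
    val flatMapped = rdd.flatMap(i => Seq(i))
    println("count: " + flatMapped.count())

    sc.stop()
  }
}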

Why closure serialization does not work on Mesos without explicitly shipping
the application jar is beyond my current knowledge.
It would be great to hear from the experts (cross-posting to dev for that).

-kr, Gerard.

On Wed, May 21, 2014 at 11:51 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Hi,
>
> I have set up a cluster with Mesos (backed by Zookeeper) with three
> master and three slave instances. I set up Spark (git HEAD) for use
> with Mesos according to this manual:
> http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html
>
> Using the spark-shell, I can connect to this cluster and do simple RDD
> operations, but the same code in a Scala class and executed via sbt
> run-main works only partially. (That is, count() works, count() after
> flatMap() does not.)
>
> Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91
> The file SparkExamplesScript.scala, when pasted into spark-shell,
> outputs the correct count() for the parallelized list comprehension,
> as well as for the flatMapped RDD.
>
> The file SparkExamplesMinimal.scala contains exactly the same code,
> and also the MASTER configuration and the Spark Executor are the same.
> However, while the count() for the parallelized list is displayed
> correctly, I receive the following error when asking for the count()
> of the flatMapped RDD:
>
> -----------------
>
> 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1
> (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which
> has no missing parents
> 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing
> tasks from Stage 1 (FlatMappedRDD[1] at flatMap at
> SparkExamplesMinimal.scala:34)
> 14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set
> 1.0 with 8 tasks
> 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0
> as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1
> (PROCESS_LOCAL)
> 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0
> as 1779147 bytes in 37 ms
> 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
> 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException
> java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> at
> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
> at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
> at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> -----------------
>
> Can anyone explain to me where this comes from or how I might further
> track the problem down?
>
> Thanks,
> Tobias
>
