We are running Spark and Spark Streaming on Mesos (with multiple masters for
HA).
At launch, our Spark jobs successfully look up the current Mesos master from
ZooKeeper and spawn tasks.

However, when the Mesos master changes while a Spark job is executing, the
Spark driver appears to keep talking to the old Mesos master, and therefore
fails to launch any new tasks.
We run long-running Spark Streaming jobs, so we have temporarily switched to
coarse-grained mode as a workaround, but that prevents us from running in
fine-grained mode, which we would prefer for some jobs.

Looking at the code for MesosSchedulerBackend, it has empty implementations
of the reregistered and disconnected methods, which I believe are the
callbacks that would be invoked when the master changes:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L202

http://mesos.apache.org/documentation/latest/app-framework-development-guide/
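
To make the question concrete, here is a minimal sketch of the kind of
bookkeeping I would expect those two callbacks to do. It is not Spark's code
and does not implement the real org.apache.mesos.Scheduler interface:
MasterTracker and the MasterInfo case class are hypothetical stand-ins, and
only the method names (registered, reregistered, disconnected) are borrowed
from the Mesos scheduler callbacks.

```scala
// Hypothetical stand-in for the master description Mesos passes to the callbacks.
final case class MasterInfo(hostname: String, port: Int)

// Hypothetical helper that remembers which master the framework is attached to.
// Method names mirror the Mesos scheduler callbacks; the real callbacks also
// receive a SchedulerDriver argument, which is omitted in this sketch.
final class MasterTracker {
  private var current: Option[MasterInfo] = None
  private var connected: Boolean = false

  // First successful registration with a master.
  def registered(master: MasterInfo): Unit = synchronized {
    current = Some(master)
    connected = true
  }

  // Connection to the master was lost: stop launching until we are re-attached.
  def disconnected(): Unit = synchronized {
    connected = false
  }

  // A (possibly newly elected) master accepted the framework again.
  def reregistered(master: MasterInfo): Unit = synchronized {
    current = Some(master)
    connected = true
  }

  def canLaunchTasks: Boolean = synchronized { connected }

  def currentMaster: Option[MasterInfo] = synchronized { current }
}
```

With something like this, the backend could check canLaunchTasks before
launching tasks and pick up the new master after a failover, rather than
silently ignoring the change.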

Are there any plans to implement master reregistration in the Spark
framework, or does anyone have suggested workarounds that let long-running
jobs cope with the Mesos master changing? (Or is there something we are
doing wrong?)

Thanks


