We are running Spark and Spark Streaming on Mesos (with multiple masters for HA). At launch, our Spark jobs successfully look up the current Mesos master from ZooKeeper and spawn tasks.
However, when the Mesos master changes while a Spark job is executing, the Spark driver appears to keep interacting with the old Mesos master and therefore fails to launch any new tasks. Since we run long-running Spark Streaming jobs, we have temporarily switched to coarse-grained mode as a workaround, but this prevents us from using fine-grained mode, which we would prefer for some jobs.

Looking at the code for MesosSchedulerBackend, it has empty implementations of the reregistered (and disconnected) methods, which I believe would be called when the master changes:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L202
http://mesos.apache.org/documentation/latest/app-framework-development-guide/

Are there any plans to implement master reregistration in the Spark framework, or does anyone have suggested workarounds for long-running jobs to handle the Mesos master changing? (Or is there something we are doing wrong?)

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Framework-handling-of-Mesos-master-change-tp21107.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
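To illustrate what I mean about the empty callbacks, here is a minimal sketch of how a failover-aware scheduler might track the current master. Note these types (MasterInfo, SchedulerCallbacks, FailoverAwareScheduler) are simplified stand-ins I made up for this example, not the real org.apache.mesos API or Spark's actual implementation:

```scala
// Hypothetical stand-in for the master info Mesos passes to scheduler callbacks.
case class MasterInfo(hostname: String, port: Int)

// Simplified stand-in for the Mesos Scheduler callback interface.
trait SchedulerCallbacks {
  def registered(master: MasterInfo): Unit   // first registration with a master
  def reregistered(master: MasterInfo): Unit // called after a master failover
  def disconnected(): Unit                   // called when the master connection drops
}

// Sketch of a scheduler that, unlike the current empty overrides,
// records the newly elected master so later task launches target it.
class FailoverAwareScheduler extends SchedulerCallbacks {
  @volatile var currentMaster: Option[MasterInfo] = None
  @volatile var connected: Boolean = false

  override def registered(master: MasterInfo): Unit = {
    currentMaster = Some(master)
    connected = true
  }

  override def reregistered(master: MasterInfo): Unit = {
    // Point subsequent scheduling at the new leader instead of the old one.
    currentMaster = Some(master)
    connected = true
  }

  override def disconnected(): Unit = {
    connected = false
  }
}
```

In this sketch, after ZooKeeper elects a new leader, reregistered updates currentMaster rather than leaving the driver pointed at the old master, which is the behavior we would hope for from the real framework.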