Hi,

We are experiencing scheduling errors when a Mesos slave fails.
This seems to be an open bug; more information can be found here:

https://issues.apache.org/jira/browse/SPARK-3289

According to this link
<https://mail-archives.apache.org/mod_mbox/mesos-user/201310.mbox/%3ccaakwvaxprrnrcdlazcybnmk1_9elyheodaf8urf8ssrlbac...@mail.gmail.com%3E>
from the mail archive, it seems that Spark doesn't reschedule LOST tasks on
active executors, but keeps trying to reschedule them on the failed host.

We would like to dynamically resize our Mesos cluster (adding or removing
machines - using an AWS autoscaling group), but this bug kills our running
applications if a Mesos slave running a Spark executor is shut down.

Is there any known workaround?

Thank you


