Hi,

We are experiencing scheduling errors when a Mesos slave fails. This appears to be an open bug; more information can be found here:
https://issues.apache.org/jira/browse/SPARK-3289

According to this thread <https://mail-archives.apache.org/mod_mbox/mesos-user/201310.mbox/%3ccaakwvaxprrnrcdlazcybnmk1_9elyheodaf8urf8ssrlbac...@mail.gmail.com%3E> from the mail archive, it seems that Spark doesn't reschedule LOST tasks on active executors, but keeps trying to reschedule them on the failed host.

We would like to dynamically resize our Mesos cluster (adding or removing machines using an AWS autoscaling group), but this bug kills our running applications if a Mesos slave running a Spark executor is shut down.

Is there any known workaround?

Thank you
