Imran Rashid created SPARK-16554: ------------------------------------ Summary: Spark should kill executors when they are blacklisted Key: SPARK-16554 URL: https://issues.apache.org/jira/browse/SPARK-16554 Project: Spark Issue Type: New Feature Components: Scheduler Reporter: Imran Rashid Assignee: Imran Rashid
SPARK-8425 will allow blacklisting faulty executors and nodes. However, these blacklisted executors will continue to run. This is bad for a few reasons: (1) Even if there is faulty-hardware, if the cluster is under-utilized spark may be able to request another executor on a different node. (2) If there is a faulty-disk (the most common case of faulty-hardware), the cluster manager may be able to allocate another executor on the same node, if it can exclude the bad disk. (Yarn will do this with its disk-health checker.) With dynamic allocation, this is less critical, as a blacklisted executor will stop running new tasks and eventually get reclaimed. Users may not *always* want to kill bad executors, so this must be configurable to some extent. At a minimum, it should be possible to enable / disable it; perhaps the executor should be killed after it has been blacklisted a configurable {{N}} times. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org