[ https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374778#comment-16374778 ]
Stavros Kontopoulos edited comment on SPARK-23485 at 2/23/18 6:36 PM:
----------------------------------------------------------------------

How about locality preferences plus a hardware problem, like a disk failure? I see code in the Spark Kubernetes scheduler related to locality (not sure if it is complete). Will that problem be detected, and will the Kubernetes scheduler consider the node problematic? If so, then I guess there is no need for blacklisting in such scenarios. If, however, this cannot be detected and a task keeps failing on a node it has a locality preference for, what will happen? Kubernetes should not just re-run things elsewhere because there was a failure; the reason for the failure matters. Is it an application failure or something lower level?

was (Author: skonto):
How about locality preferences plus a hardware problem, like a disk failure? I see code in the Spark Kubernetes scheduler related to locality (not sure if it is complete). Will that problem be detected, and will the Kubernetes scheduler consider the node problematic? If so, then I guess there is no need for blacklisting.

> Kubernetes should support node blacklist
> ----------------------------------------
>
>              Key: SPARK-23485
>              URL: https://issues.apache.org/jira/browse/SPARK-23485
>          Project: Spark
>       Issue Type: New Feature
>       Components: Kubernetes, Scheduler
> Affects Versions: 2.3.0
>         Reporter: Imran Rashid
>         Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not
> use for running tasks (e.g., because of bad hardware). When running on YARN,
> this blacklist is used to avoid ever allocating resources on blacklisted
> nodes:
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the Kubernetes code, so apologies if this
> is incorrect -- but I didn't see any references to
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}}, so it
> seems this is missing. I thought of this while looking at SPARK-19755, a
> similar issue on Mesos.
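
For anyone skimming this later, here is a rough, self-contained sketch (plain Scala, no Spark or Kubernetes client dependency) of the kind of translation the missing hook could perform: take the hosts reported by {{scheduler.nodeBlacklist()}} and express them as a node anti-affinity requirement on executor pods. The object, case class, and helper below are illustrative placeholders, not existing Spark or fabric8 APIs; only the {{kubernetes.io/hostname}} label and the {{NotIn}} operator are standard Kubernetes concepts.

{code:scala}
// Illustrative sketch only: stand-in types, not the real Kubernetes pod-spec model.
object BlacklistAffinitySketch extends App {

  // Stand-in for a v1 NodeSelectorRequirement (key / operator / values).
  case class NodeSelectorRequirement(key: String, operator: String, values: Seq[String])

  // Turn the scheduler's current blacklist into a "keep executors off these hosts"
  // requirement. An empty blacklist means no constraint at all.
  def blacklistToAntiAffinity(nodeBlacklist: Set[String]): Option[NodeSelectorRequirement] = {
    if (nodeBlacklist.isEmpty) {
      None
    } else {
      Some(NodeSelectorRequirement(
        key = "kubernetes.io/hostname",   // standard node label carrying the node's host name
        operator = "NotIn",               // standard node-affinity operator
        values = nodeBlacklist.toSeq.sorted))
    }
  }

  // Example: the driver would recompute this each time it requests executors,
  // analogous to how the YARN backend forwards scheduler.nodeBlacklist().
  val badNodes = Set("node-3.example.com", "node-7.example.com")
  println(blacklistToAntiAffinity(badNodes))  // prints the requirement covering both blacklisted hosts
}
{code}

In a real implementation this requirement would presumably be attached to the executor pod spec by {{KubernetesClusterSchedulerBackend}} and re-applied whenever executors are (re)requested, so that replacement executors do not land back on the blacklisted nodes.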