[ https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374599#comment-16374599 ]
Imran Rashid commented on SPARK-23485:
--------------------------------------

Yeah, I don't think it's safe to assume that it's Kubernetes' responsibility to entirely figure out the equivalent of a Spark application's internal blacklist. You can't guarantee that it will detect hardware issues, and the problem might also be specific to the Spark application (e.g. a missing jar). YARN has some basic detection of bad nodes as well, but without Spark's blacklisting we observed production cases where one bad disk would effectively take out an entire application on a large cluster, because task failures can pile up very quickly.

That said, the existing blacklist implementation in Spark already handles that case, even without the extra handling I'm proposing here: the Spark app would still have its own node blacklist and would avoid scheduling tasks on that node. However, this is suboptimal because Spark isn't really getting as many resources as it should. E.g., it would request 10 executors and Kubernetes would hand it 10, but Spark can only use 8 of them because 2 live on a blacklisted node.

I don't think this can be handled directly with taints, if I understand correctly. I assume applying a taint is an admin-level operation? That would mean a Spark app couldn't dynamically apply a taint when it discovers a problem on a node (and really, it probably shouldn't be able to, since the cluster shouldn't trust an arbitrary user). Furthermore, taints don't allow the blacklist to be application-specific -- blacklisting is really just a heuristic, and you probably do not want it applied across applications. It's not clear what you'd do with multiple apps each maintaining its own blacklist, since nodes move into and out of each app's blacklist at different times.

> Kubernetes should support node blacklist
> ----------------------------------------
>
>                 Key: SPARK-23485
>                 URL: https://issues.apache.org/jira/browse/SPARK-23485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not
> use for running tasks (eg., because of bad hardware). When running in yarn,
> this blacklist is used to avoid ever allocating resources on blacklisted
> nodes:
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this
> is incorrect -- but I didn't see any references to
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}}, so it
> seems this is missing. Thought of this while looking at SPARK-19755, a
> similar issue on mesos.
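To make the proposal concrete, here is a rough sketch (not actual Spark code) of how the driver-side blacklist from {{scheduler.nodeBlacklist()}} could be turned into a required node anti-affinity rule on each executor pod the Kubernetes backend creates. The helper object, its method name, and the integration point are hypothetical; the sketch assumes the fabric8 Kubernetes client model used by the backend and the standard {{kubernetes.io/hostname}} node label.

{code:scala}
import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model._

// Hypothetical helper: exclude blacklisted nodes from an executor pod via node anti-affinity.
object BlacklistAffinity {

  /**
   * Adds a required node-affinity rule so this pod can never be scheduled onto
   * any node in `blacklistedNodes` (e.g. the set from scheduler.nodeBlacklist()).
   * Assumes `pod.getSpec` has already been populated when the pod was built.
   */
  def applyNodeBlacklist(pod: Pod, blacklistedNodes: Set[String]): Pod = {
    if (blacklistedNodes.isEmpty) {
      return pod
    }

    // "NotIn" on the standard hostname label rules out every blacklisted node.
    val requirement = new NodeSelectorRequirement()
    requirement.setKey("kubernetes.io/hostname")
    requirement.setOperator("NotIn")
    requirement.setValues(blacklistedNodes.toList.sorted.asJava)

    val term = new NodeSelectorTerm()
    term.setMatchExpressions(List(requirement).asJava)

    val selector = new NodeSelector()
    selector.setNodeSelectorTerms(List(term).asJava)

    val nodeAffinity = new NodeAffinity()
    nodeAffinity.setRequiredDuringSchedulingIgnoredDuringExecution(selector)

    val affinity = new Affinity()
    affinity.setNodeAffinity(nodeAffinity)

    // Per-pod (and therefore per-application) state, unlike a cluster-wide taint.
    pod.getSpec.setAffinity(affinity)
    pod
  }
}
{code}

Because the exclusion travels with each application's own pods, two apps with different blacklists don't interfere with each other -- which is exactly the property a shared, admin-applied taint can't give you.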