Hey all, thanks in advance. I ran into a situation where the Spark driver reduced the total executor count for my job even with dynamic allocation disabled, causing the job to hang forever.
Setup: Spark 1.3.1 on a Hadoop YARN 2.4.0 cluster. All servers in the cluster run Linux kernel 2.6.32. The job runs in yarn-client mode.

Scenario:
1. The application is running with the required number of executors.
2. One of the DNs (DataNodes) loses connectivity and is timed out.
3. Spark issues a killExecutor for the executor on the timed-out DN.
4. Even with dynamic allocation off, the Spark driver reduces "targetNumExecutors".

On analysing the code (Spark 1.3.1), this is what happens when my DN becomes unreachable:

Spark core's HeartbeatReceiver invokes expireDeadHosts(), which checks whether dynamic allocation is supported and then invokes sc.killExecutor():

    if (sc.supportDynamicAllocation) {
      sc.killExecutor(executorId)
    }

Surprisingly, supportDynamicAllocation in SparkContext.scala is defined to be true if the dynamicAllocationTesting flag is enabled or Spark is running on YARN:

    private[spark] def supportDynamicAllocation =
      master.contains("yarn") || dynamicAllocationTesting

sc.killExecutor() dispatches to the configured schedulerBackend (CoarseGrainedSchedulerBackend in this case) and invokes killExecutors(executorIds).

CoarseGrainedSchedulerBackend calculates a "newTotal" for the total number of executors required and sends an update to the application master by invoking doRequestTotalExecutors(newTotal). It then invokes doKillExecutors(filteredExecutorIds) for the lost executors. The net effect is that the total number of executors is permanently reduced whenever a host is intermittently unreachable (see the sketch at the end of this mail).

I noticed that this change to CoarseGrainedSchedulerBackend was introduced while fixing https://issues.apache.org/jira/browse/SPARK-6325.

I am new to this code, so if any of you could comment on why we need doRequestTotalExecutors inside killExecutors, that would be a great help. Also, why is supportDynamicAllocation true even when I have not enabled dynamic allocation?

Regards,
Prakhar.
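P.S. To make the sequence concrete, here is a minimal, self-contained Scala model of the kill path as I read it. The method and class names mirror the real ones in Spark 1.3.1 (HeartbeatReceiver.expireDeadHosts, CoarseGrainedSchedulerBackend.killExecutors, doRequestTotalExecutors, doKillExecutors), but everything else (the object name, the state, the println messages) is made up for illustration, so treat it as a sketch of my reading, not the actual implementation:

    // Toy model of the Spark 1.3.1 kill path; NOT the real Spark code.
    object KillPathModel {

      // Driver-side view of the live executors (simplified).
      var executorIds = Set("exec-1", "exec-2", "exec-3")

      // Stand-in for the request sent to the YARN application master.
      def doRequestTotalExecutors(n: Int): Unit =
        println(s"AM asked to maintain $n executors")

      def doKillExecutors(ids: Seq[String]): Unit =
        println(s"killing: ${ids.mkString(", ")}")

      // Simplified CoarseGrainedSchedulerBackend.killExecutors.
      def killExecutors(ids: Seq[String]): Unit = {
        val filtered = ids.filter(executorIds.contains)
        // The step in question: the requested total is lowered to
        // (current - killed) unconditionally, i.e. even when
        // spark.dynamicAllocation.enabled is false, so the AM is
        // never asked for a replacement container.
        val newTotal = executorIds.size - filtered.size
        doRequestTotalExecutors(newTotal)
        doKillExecutors(filtered)
        executorIds --= filtered
      }

      // Simplified HeartbeatReceiver.expireDeadHosts: on YARN,
      // supportDynamicAllocation is true regardless of the config,
      // so the kill is issued for the timed-out executor.
      def expireDeadHosts(deadExecutor: String): Unit =
        killExecutors(Seq(deadExecutor))

      def main(args: Array[String]): Unit = {
        expireDeadHosts("exec-2")
        // Prints "AM asked to maintain 2 executors": the target has
        // dropped from 3 to 2 and never recovers on its own, which
        // matches the hang I am seeing.
      }
    }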