This is a deliberate kill request from the heartbeat mechanism; it has nothing to do with dynamic allocation. Because you're running in yarn mode, "supportDynamicAllocation" will be true, but that flag is actually unrelated to whether dynamic allocation is enabled.
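For illustration, here is a minimal, self-contained sketch of why the heartbeat kill path fires on YARN even with dynamic allocation disabled. This is not the actual Spark source; the check mirrors the 1.3.1 definition quoted further down, but the surrounding object and main method are mine. The point is that the check only inspects the master URL, never the dynamic-allocation config:

```scala
// Hypothetical sketch, not Spark source: models the 1.3.1
// SparkContext.supportDynamicAllocation check quoted below.
object HeartbeatSketch {
  // Only looks at the master URL or a testing flag; the
  // spark.dynamicAllocation.enabled setting is never consulted.
  def supportDynamicAllocation(master: String,
                               dynamicAllocationTesting: Boolean): Boolean =
    master.contains("yarn") || dynamicAllocationTesting

  def main(args: Array[String]): Unit = {
    // yarn-client mode with dynamic allocation NOT enabled
    println(supportDynamicAllocation("yarn-client",
                                     dynamicAllocationTesting = false)) // prints: true
  }
}
```

So on any YARN master the heartbeat expiry path takes the `sc.killExecutor(executorId)` branch regardless of the user's dynamic-allocation setting.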
From my understanding, "doRequestTotalExecutors" syncs the current expected
total executor number with the AM; the AM will try to cancel some pending
container requests when the expected executor number drops. The actual
container kill command is issued by "doKillExecutors", not by
"doRequestTotalExecutors". What is the actual problem you're seeing? Is the
behavior unexpected?

Thanks
Saisai

On Mon, Oct 19, 2015 at 3:51 PM, prakhar jauhari <prak...@gmail.com> wrote:
> Hey all,
>
> Thanks in advance. I ran into a situation where the Spark driver reduced
> the total executor count for my job even with dynamic allocation disabled,
> which caused the job to hang forever.
>
> Setup:
> Spark 1.3.1 on a hadoop-yarn-2.4.0 cluster.
> All servers in the cluster run Linux kernel 2.6.32.
> Job in yarn-client mode.
>
> Scenario:
> 1. Application running with the required number of executors.
> 2. One of the DNs loses connectivity and is timed out.
> 3. Spark issues a killExecutor for the executor on the timed-out DN.
> 4. Even with dynamic allocation off, the Spark driver reduces
> "targetNumExecutors".
>
> On analysing the code (Spark 1.3.1):
>
> When my DN becomes unreachable, Spark core's HeartbeatReceiver invokes
> expireDeadHosts(), which checks whether dynamic allocation is supported
> and then invokes sc.killExecutor():
>
> if (sc.supportDynamicAllocation) {
>   sc.killExecutor(executorId)
> }
>
> Surprisingly, supportDynamicAllocation in SparkContext.scala is defined
> to be true if the dynamicAllocationTesting flag is enabled or Spark is
> running on YARN:
>
> private[spark] def supportDynamicAllocation =
>   master.contains("yarn") || dynamicAllocationTesting
>
> sc.killExecutor() matches it to the configured schedulerBackend
> (CoarseGrainedSchedulerBackend in this case) and invokes
> killExecutors(executorIds).
>
> CoarseGrainedSchedulerBackend calculates a "newTotal" for the total
> number of executors required, and sends an update to the application
> master by invoking doRequestTotalExecutors(newTotal).
>
> CoarseGrainedSchedulerBackend then invokes
> doKillExecutors(filteredExecutorIds) for the lost executors.
>
> Thus the total number of executors is reduced whenever a host becomes
> intermittently unreachable.
>
> I noticed that this change to CoarseGrainedSchedulerBackend was
> introduced while fixing:
> https://issues.apache.org/jira/browse/SPARK-6325
>
> I am new to this code; if any of you could comment on why we need
> "doRequestTotalExecutors" in "killExecutors", that would be a great help.
> Also, why is "supportDynamicAllocation" true even though I have not
> enabled dynamic allocation?
>
> Regards,
> Prakhar.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-driver-reducing-total-executors-count-even-when-Dynamic-Allocation-is-disabled-tp14679.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
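To make the kill path in the quoted analysis concrete, here is a toy model of the sequence described above. This is my own simplification, not Spark code: the object, field names, and the executor-id strings are hypothetical stand-ins, and doKillExecutors is reduced to removing entries from a set. What it shows is the core of the reported problem: the expected total is lowered before the kill, and with dynamic allocation off nothing ever raises it back.

```scala
import scala.collection.mutable

// Toy model (not Spark source) of the 1.3.1 kill path described above.
object KillPathSketch {
  var targetNumExecutors = 5
  val liveExecutors = mutable.Set("exec-1", "exec-2", "exec-3", "exec-4", "exec-5")

  // models CoarseGrainedSchedulerBackend.killExecutors
  def killExecutors(ids: Seq[String]): Unit = {
    val filtered = ids.filter(liveExecutors.contains)
    // 1. sync a *lower* expected total with the AM ...
    doRequestTotalExecutors(targetNumExecutors - filtered.size)
    // 2. ... then issue the actual kill (stands in for doKillExecutors)
    filtered.foreach(liveExecutors.remove)
  }

  // models the AM sync: it only records the new total; with dynamic
  // allocation off, no component ever requests a larger total again
  def doRequestTotalExecutors(newTotal: Int): Unit = {
    targetNumExecutors = newTotal
  }

  def main(args: Array[String]): Unit = {
    killExecutors(Seq("exec-3"))
    println(targetNumExecutors) // prints: 4 (the target shrank permanently)
  }
}
```

Under this reading, a lost executor permanently shrinks the driver's target, which would explain the hang Prakhar observed when the job still needed the original executor count.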