If I enable dynamicAllocation and then use spark-shell or pyspark, things start out working as expected: running simple commands causes new executors to start and complete tasks. If the shell is left idle for a while, executors start getting killed off:

15/04/23 10:52:43 INFO cluster.YarnClientSchedulerBackend: Requesting to kill 
executor(s) 368
15/04/23 10:52:43 INFO spark.ExecutorAllocationManager: Removing executor 368 
because it has been idle for 600 seconds (new desired total will be 665)

That makes sense. But the removal also results in ERROR-level messages:

15/04/23 10:52:47 ERROR cluster.YarnScheduler: Lost executor 368 on hostname: 
remote Akka client disassociated
15/04/23 10:52:47 INFO scheduler.DAGScheduler: Executor lost: 368 (epoch 0)
15/04/23 10:52:47 INFO spark.ExecutorAllocationManager: Existing executor 368 
has been removed (new total is 665)
15/04/23 10:52:47 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 368 from BlockManagerMaster.
15/04/23 10:52:47 INFO storage.BlockManagerMasterActor: Removing block manager 
BlockManagerId(368, hostname, 35877)
15/04/23 10:52:47 INFO storage.BlockManagerMaster: Removed 368 successfully in 
removeExecutor

After that, trying to run a simple command results in:

15/04/23 10:13:30 ERROR util.Utils: Uncaught exception in thread 
spark-dynamic-executor-allocation-0
java.lang.IllegalArgumentException: Attempted to request a negative number of 
executor(s) -663 from the cluster manager. Please specify a positive number!
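
By "simple command" I mean nothing more involved than, for example:

sc.parallelize(1 to 1000).count()  // any trivial job that schedules tasks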

And then only the single remaining executor attempts to complete the new tasks. Am I missing some simple configuration setting, is this a bug that other people are seeing as well, or is this actually the expected behavior?
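
For reference, I'm launching the shell with settings along these lines (the values below are representative rather than copied from my exact command):

# dynamic allocation on YARN; the external shuffle service has to be enabled
spark-shell --master yarn-client \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=0 \
  --conf spark.dynamicAllocation.maxExecutors=700 \
  --conf spark.dynamicAllocation.executorIdleTimeout=600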

Mike Stone
