If I enable dynamicAllocation and then use spark-shell or pyspark,
things start out working as expected: running simple commands causes new
executors to start and complete tasks. If the shell is left idle for a
while, executors start getting killed off:
15/04/23 10:52:43 INFO cluster.YarnClientSchedulerBackend: Requesting to kill
executor(s) 368
15/04/23 10:52:43 INFO spark.ExecutorAllocationManager: Removing executor 368
because it has been idle for 600 seconds (new desired total will be 665)
That makes sense. But the removal also results in error messages:
15/04/23 10:52:47 ERROR cluster.YarnScheduler: Lost executor 368 on hostname:
remote Akka client disassociated
15/04/23 10:52:47 INFO scheduler.DAGScheduler: Executor lost: 368 (epoch 0)
15/04/23 10:52:47 INFO spark.ExecutorAllocationManager: Existing executor 368
has been removed (new total is 665)
15/04/23 10:52:47 INFO storage.BlockManagerMasterActor: Trying to remove
executor 368 from BlockManagerMaster.
15/04/23 10:52:47 INFO storage.BlockManagerMasterActor: Removing block manager
BlockManagerId(368, hostname, 35877)
15/04/23 10:52:47 INFO storage.BlockManagerMaster: Removed 368 successfully in
removeExecutor
After that, trying to run a simple command results in:
15/04/23 10:13:30 ERROR util.Utils: Uncaught exception in thread
spark-dynamic-executor-allocation-0
java.lang.IllegalArgumentException: Attempted to request a negative number of
executor(s) -663 from the cluster manager. Please specify a positive number!
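The "simple command" here is nothing exotic; in spark-shell it is something
as small as the following (just an illustrative job; anything that schedules
tasks seems to trigger it):

  // trivial job that should fan out across the executors
  sc.parallelize(1 to 1000, 10).count()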
The job still runs, but only the single remaining executor attempts to
complete the new tasks. Am I missing some simple configuration item? Are
other people seeing the same behavior (i.e., is this a bug), or is it
actually expected?
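For reference, dynamic allocation is enabled roughly like this in my
spark-defaults.conf (paraphrased from memory, so treat the exact numbers as
illustrative; the property names are the standard dynamic-allocation and
external-shuffle-service ones):

  # dynamic allocation on YARN also needs the spark_shuffle aux service
  # configured on the NodeManagers
  spark.dynamicAllocation.enabled              true
  spark.shuffle.service.enabled                true
  spark.dynamicAllocation.minExecutors         1
  spark.dynamicAllocation.maxExecutors         700
  # idle timeout in seconds; matches the "idle for 600 seconds" in the log
  spark.dynamicAllocation.executorIdleTimeout  600

The min/max values above are placeholders, but the max is in the hundreds,
which matches the desired totals around 665 in the log.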
Mike Stone