Hi Experts:
This looks like a bug in ExecutorAllocationManager: the numberMaxNeededExecutors
value reported through the JMX exporter is negative, which should never happen.
Spark version: 3.1.1
The debug log below shows numRunningOrPendingTasks (printed as "numpending")
going negative:
22/01/13 09:32:21 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -136, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -171, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -188, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -196, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -209, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -221, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -229, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -234, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -238, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -240, tasksperexecutor: 4
22/01/13 09:32:22 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -246, tasksperexecutor: 4
22/01/13 09:32:23 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -250, tasksperexecutor: 4
22/01/13 09:32:23 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -250, tasksperexecutor: 4
22/01/13 09:32:23 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -250, tasksperexecutor: 4
22/01/13 09:32:23 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -251, tasksperexecutor: 4
22/01/13 09:32:23 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: -211, tasksperexecutor: 4
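Assuming the max-needed value is a ceiling division of the running-or-pending task count by tasks per executor, scaled by spark.dynamicAllocation.executorAllocationRatio (default 1.0), a negative task count is by itself enough to make numberMaxNeededExecutors negative. A quick Python check using "numpending" values from the log (the function name max_needed is hypothetical, not Spark's code):

```python
import math


def max_needed(num_running_or_pending: int, tasks_per_executor: int,
               allocation_ratio: float = 1.0) -> int:
    # Hypothetical re-statement of the calculation: ceiling division of the
    # task count by the number of task slots per executor, scaled by an
    # allocation ratio (assumed to be the default 1.0 here).
    return math.ceil(num_running_or_pending * allocation_ratio / tasks_per_executor)


# "numpending" values copied from the debug log, with tasksperexecutor = 4.
for numpending in (-136, -171, -251, -211):
    print(numpending, max_needed(numpending, 4))
```

With these inputs the results are -34, -42, -62 and -52, so nothing in the calculation itself clamps a negative count back to zero.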
The return value of the maxNumExecutorsNeededPerResourceProfile method becomes
incorrect after the Thrift server has been running for a few hours.
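One hypothesis worth checking (not confirmed by the log): if the running-task count is kept by incrementing on task-start events and decrementing on task-end events, any imbalance between the two, for example a start event that is never counted, pushes the total below zero, and the error accumulates over the lifetime of a long-running Thrift server. The class below is a hypothetical illustration of that failure mode, not Spark's implementation:

```python
from collections import defaultdict


class RunningTaskCounter:
    """Hypothetical per-stage running-task bookkeeping (not Spark's code)."""

    def __init__(self) -> None:
        self.running = defaultdict(int)

    def on_task_start(self, stage_id: int) -> None:
        self.running[stage_id] += 1

    def on_task_end(self, stage_id: int) -> None:
        # No guard: an end event without a matching start still decrements.
        self.running[stage_id] -= 1

    def total(self) -> int:
        return sum(self.running.values())


counter = RunningTaskCounter()
counter.on_task_start(0)
counter.on_task_end(0)
# Suppose the start events of two more tasks were never counted,
# but their end events still arrive.
counter.on_task_end(0)
counter.on_task_end(0)
print(counter.total())
```

The final total is -2; without a reset or a floor at zero, each unmatched end event leaves the counter permanently lower.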
Best
xu