xintongsong commented on a change in pull request #11615: [FLINK-16605] Add max limitation to the total number of slots URL: https://github.com/apache/flink/pull/11615#discussion_r406571178
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManagerImpl.java ########## @@ -375,6 +375,12 @@ public void registerTaskManager(final TaskExecutorConnection taskExecutorConnect if (taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) { reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport); } else { + if (getNumberRegisteredSlots() + Math.max(getNumberPendingTaskManagerSlots(), numSlotsPerWorker) > maxSlotNum) { + LOG.warn("The total number of slots exceeds the max limitation, release the excess resource."); + resourceActions.releaseResource(taskExecutorConnection.getInstanceID(), new FlinkException("The total number of slots exceeds the max limitation.")); + return; + } Review comment: It seems to me that this approach implicitly assumes that `releaseResource` returns false means running on a standalone cluster, which is not always true. Without this assumption, continuing registering the slot when the release action failed seems quite against intuition. Instead of let the slot manager behaves differently according to the release action result, I would suggest to simply pass in infinite large values for the max limits when creating the slot manager. In this way, slot manager should not need to behave differently on standalone and active resource managers. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services