RocMarshal commented on code in PR #25647: URL: https://github.com/apache/flink/pull/25647#discussion_r1852202449
########## flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SlotSharingExecutionSlotAllocatorFactory.java: ########## @@ -40,13 +41,16 @@ public SlotSharingExecutionSlotAllocatorFactory( PhysicalSlotProvider slotProvider, boolean slotWillBeOccupiedIndefinitely, PhysicalSlotRequestBulkChecker bulkChecker, - Duration allocationTimeout) { + Duration allocationTimeout, + TaskManagerOptions.TaskManagerLoadBalanceMode taskManagerLoadBalanceMode) { this( slotProvider, slotWillBeOccupiedIndefinitely, bulkChecker, allocationTimeout, - new LocalInputPreferredSlotSharingStrategy.Factory()); + taskManagerLoadBalanceMode == TaskManagerOptions.TaskManagerLoadBalanceMode.TASKS + ? new TaskBalancedPreferredSlotSharingStrategy.Factory() Review Comment: In fact, the phenomenon of uneven task scheduling results is mainly caused by two stages: 1. tasks->slot 2. slots->taskmanager Because prior to the initial feature freeze, we anticipated that the entire FLIP would be at risk. Our original intention in initiating this PR was to release this change as early as possible to address the imbalance issue in the task ->slot dimension. But now there is no time risk with this feature, so we plan to release both dimensions of functionality together. BTW, the optimizations and changes mentioned above are highly likely not to be fixed in the current PR as it may be abandoned. But we will place the corresponding optimization in the appropriate PR position in the future. If you are willing to help with the review, I would greatly appreciate it! :) ########## docs/layouts/shortcodes/generated/all_taskmanager_section.html: ########## @@ -90,7 +90,7 @@ <td><h5>taskmanager.load-balance.mode</h5></td> <td style="word-wrap: break-word;">NONE</td> <td><p>Enum</p></td> - <td>Mode for the load-balance allocation strategy across all available <code class="highlighter-rouge">TaskManagers</code>.<ul><li>The <code class="highlighter-rouge">SLOTS</code> mode tries to spread out the slots evenly across all available <code class="highlighter-rouge">TaskManagers</code>.</li><li>The <code class="highlighter-rouge">NONE</code> mode is the default mode without any specified strategy.</li></ul><br /><br />Possible values:<ul><li>"NONE"</li><li>"SLOTS"</li></ul></td> + <td>Mode for the load-balance allocation strategy across all available <code class="highlighter-rouge">TaskManagers</code>.<ul><li>The <code class="highlighter-rouge">SLOTS</code> mode tries to spread out the slots evenly across all available <code class="highlighter-rouge">TaskManagers</code>.</li><li>The <code class="highlighter-rouge">TASKS</code> mode tries to schedule evenly all tasks based on its' number across all available <code class="highlighter-rouge">TaskManagers</code>. Note: Currently, enabling this parameter can only achieve the balancing effect of the slot level dimension of <code class="highlighter-rouge">DefaultScheduler</code>.</li><li>The <code class="highlighter-rouge">NONE</code> mode is the default mode without any specified strategy.</li></ul><br /><br />Possible values:<ul><li>"NONE"</li><li>"SLOTS"</li><li>"TASKS"</li></ul></td> Review Comment: Good idea~, we do need a more detailed and precise description, as well as traceable design documents. How about making the following changes based on your suggestion? 1. Optimize the configuration description section in the code, including adding FLIP addresses 2. The explanation of how to use and the differences between it and other configurations, as well as the creation of charts, can be placed in https://issues.apache.org/jira/browse/FLINK-33392 Completed in the middle ########## flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java: ########## @@ -707,6 +707,12 @@ public class TaskManagerOptions { "The %s mode tries to spread out the slots evenly across all available %s.", code(TaskManagerLoadBalanceMode.SLOTS.name()), code("TaskManagers")), + text( + "The %s mode tries to schedule evenly all tasks based on its' number across all available %s. " Review Comment: The meaning expressed here is that the scheduler will try its best to ensure that the scheduling results are concentrated and the number of tasks on each Taskmanager is similar. I will add a more detailed description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org