tillrohrmann opened a new pull request #9928: [FLINK-12122] Add support for spreading slots out across all TaskExecutors

URL: https://github.com/apache/flink/pull/9928

## What is the purpose of the change

This PR adds support for spreading slots out across all currently registered `TaskExecutors`. The feature can be enabled by setting `cluster.evenly-spread-out-slots: true` in Flink's configuration file.

If the feature is enabled, Flink's `ResourceManager` tries to fulfill a slot request with the slot whose `TaskExecutor` has the lowest utilization whenever multiple slots match the resource requirements. A `TaskExecutor`'s utilization is calculated as `usedSlots / totalSlots`.

On the `JobMaster` side, the `SlotPool` does the same with respect to the slots allocated to the job: if multiple slots fulfill the resource requirements, it always picks the slot with the lowest utilization. Here a `TaskExecutor`'s utilization is calculated as `usedSlotsByJob / totalSlotsOfferedToJob`.

## Brief change log

* 5f74a8f: Introduce SlotMatchingStrategy for SlotManager

  The SlotMatchingStrategy encapsulates how the SlotManager finds a matching slot for a slot request. At the moment, the only implementation, AnyMatchingSlotMatchingStrategy, picks any matching slot.

* 752462a: Add LeastUtilizationSlotMatchingStrategy for spreading slot allocations out

  The LeastUtilizationSlotMatchingStrategy picks the matching slot which belongs to the TaskExecutor with the lowest utilization. That way the SlotManager spreads slot allocations out across all available/registered TaskExecutors.

* 5511ada: [FLINK-12122] Introduce ClusterOptions#EVENLY_SPREAD_OUT_SLOTS_STRATEGY

  Add a config option to evenly spread out slots across all available TaskExecutors.
* bbb0723: Calculate TaskExecutorUtilization when listing available slots

  When listing the available slots stored in the SlotPool and the SlotSharingManager, the system now also calculates the utilization of the owning TaskExecutor with respect to the job.

* 268e688: Add EvenlySpreadOutLocationPreferenceSlotSelectionStrategy

  The EvenlySpreadOutLocationPreferenceSlotSelectionStrategy is a special implementation of the LocationPreferenceSlotSelectionStrategy which tries to evenly spread out the workload across all TaskExecutors by choosing the slot with the lowest utilization if there is a tie with respect to locality.

* 86b81f9: Choose SlotSelectionStrategy based on ClusterOptions#EVENLY_SPREAD_OUT_SLOTS_STRATEGY

  If ClusterOptions#EVENLY_SPREAD_OUT_SLOTS_STRATEGY is enabled, Flink uses the evenly spread out location preference strategy to spread out the workload as much as possible.

## Verifying this change

- Added tests: `SlotPoolSlotSpreadOutTest`, `AnyMatchingSlotMatchingStrategyTest`, `LeastUtilizationSlotMatchingStrategyTest`, `SlotPoolImplTest#testCalculationOfTaskExecutorUtilization`, `SlotSharingManagerTest#testTaskExecutorUtilizationCalculation`, `SlotManagerImplTest#testSpreadOutSlotAllocationStrategy`
- Tried out manually

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (yes)
- If yes, how is the feature documented? (docs)
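
The least-utilization matching described in the purpose of this change can be illustrated with a minimal, self-contained Java sketch. This is not Flink's implementation: the `FreeSlot` class, its `cpuCores` field (a simplified stand-in for a resource profile), and both helper methods are hypothetical. The sketch contrasts taking the first matching slot with picking the matching slot whose owning TaskExecutor has the lowest `usedSlots / totalSlots` ratio. The `SlotPool` side would apply the same comparison using per-job counts (`usedSlotsByJob / totalSlotsOfferedToJob`).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LeastUtilizationSlotMatchingSketch {

    /** Hypothetical view of a free slot; not Flink's actual slot class. */
    static final class FreeSlot {
        final String taskExecutorId;
        final int cpuCores;   // simplified stand-in for a resource profile
        final int usedSlots;  // slots already allocated on the owning TaskExecutor
        final int totalSlots; // slots registered by the owning TaskExecutor

        FreeSlot(String taskExecutorId, int cpuCores, int usedSlots, int totalSlots) {
            this.taskExecutorId = taskExecutorId;
            this.cpuCores = cpuCores;
            this.usedSlots = usedSlots;
            this.totalSlots = totalSlots;
        }

        /** Utilization of the owning TaskExecutor: usedSlots / totalSlots. */
        double utilization() {
            return (double) usedSlots / totalSlots;
        }

        /** Simplified resource check: the slot fits if it offers enough cores. */
        boolean matches(int requiredCpuCores) {
            return cpuCores >= requiredCpuCores;
        }
    }

    /** "Any matching" behaviour: return the first slot that fits. */
    static Optional<FreeSlot> findAnyMatchingSlot(List<FreeSlot> freeSlots, int requiredCpuCores) {
        return freeSlots.stream()
                .filter(slot -> slot.matches(requiredCpuCores))
                .findFirst();
    }

    /** "Least utilization" behaviour: among fitting slots, prefer the least loaded TaskExecutor. */
    static Optional<FreeSlot> findLeastUtilizedMatchingSlot(List<FreeSlot> freeSlots, int requiredCpuCores) {
        return freeSlots.stream()
                .filter(slot -> slot.matches(requiredCpuCores))
                .min(Comparator.comparingDouble(FreeSlot::utilization));
    }

    public static void main(String[] args) {
        List<FreeSlot> freeSlots = Arrays.asList(
                new FreeSlot("tm-1", 1, 3, 4),  // utilization 0.75
                new FreeSlot("tm-2", 1, 1, 4),  // utilization 0.25
                new FreeSlot("tm-3", 1, 2, 4)); // utilization 0.50

        System.out.println("any match:         " + findAnyMatchingSlot(freeSlots, 1).get().taskExecutorId);
        System.out.println("least utilization: " + findLeastUtilizedMatchingSlot(freeSlots, 1).get().taskExecutorId);
    }
}
```

Running `main` prints `tm-1` for the any-match variant, because it is simply the first fitting slot, and `tm-2` for the least-utilization variant, because only 1 of its 4 slots is in use. Repeating the least-utilization choice for every request is what spreads allocations out across TaskExecutors.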