StefanRRichter opened a new pull request #6898: [FLINK-10431] Extraction of scheduling-related code from SlotPool into preliminary Scheduler
URL: https://github.com/apache/flink/pull/6898

## What is the purpose of the change

This PR extracts the scheduling-related code (e.g. slot sharing logic) from the slot pool into a preliminary version of a future scheduler component. Our primary goal is fixing the scheduling logic for local recovery. Changes in this PR open up potential for more code cleanups (e.g. removing all scheduling concerns from the slot pool, removing `ProviderAndOwner`, moving away from some `CompletableFuture` return types, etc.). This cleanup and some test rewrites will happen in a follow-up PR.

## Brief change log

- SlotPool is no longer an `RpcEndpoint`, so we now need to take care that all state modification happens in the component's main thread.
- Introduced `SlotInfo` and moved the slot sharing code into a scheduler component. Slot pool code can now deal with single slot requests. The pattern of interaction is more explicit, with 3 main new methods: `getAvailableSlotsInformation` to list available slots, `allocateAvailableSlot` to allocate a listed / available slot, and `requestNewAllocatedSlot` to request a new slot from the resource manager. The old code paths currently still co-exist in the slot pool and will be removed in follow-up work.
- Introduced collecting all previous allocations through `ExecutionGraph::computeAllPriorAllocationIds`. This serves as the basis for computing a "blacklist" of allocation ids that we use to fix the scheduling of local recovery.
- Provided an improved version of the scheduling for local recovery that uses a blacklist.

## Verifying this change

This change is already covered by existing tests, but we still need to rewrite the tests for the slot pool and add more tests in follow-up work.
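To illustrate the interaction pattern from the change log, here is a minimal, self-contained sketch of how a scheduler might combine the three slot pool methods with a blacklist of prior allocation ids. It is not the code in this PR: `SketchSlotPool`, `SketchScheduler`, and `allocateForLocalRecovery` are hypothetical names, allocation ids are plain strings, the method signatures are assumptions, and the real methods may well be asynchronous. The blacklist rule shown (skip slots that another task held before) is one plausible reading of the description.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified stand-in for the slot pool; all signatures are assumptions. */
final class SketchSlotPool {
    // Allocation ids of the slots that are currently free.
    private final List<String> available = new ArrayList<>();
    private int requestedFromResourceManager = 0;

    SketchSlotPool(Collection<String> initiallyAvailable) {
        available.addAll(initiallyAvailable);
    }

    /** Lists the available slots without allocating any of them. */
    List<String> getAvailableSlotsInformation() {
        return new ArrayList<>(available);
    }

    /** Allocates a listed slot; empty if the slot is not (or no longer) available. */
    Optional<String> allocateAvailableSlot(String allocationId) {
        return available.remove(allocationId) ? Optional.of(allocationId) : Optional.empty();
    }

    /** Stands in for requesting a fresh slot from the resource manager. */
    String requestNewAllocatedSlot() {
        requestedFromResourceManager++;
        return "new-" + requestedFromResourceManager;
    }
}

/** Hypothetical scheduler that keeps the scheduling decision outside the slot pool. */
final class SketchScheduler {
    private final SketchSlotPool pool;

    SketchScheduler(SketchSlotPool pool) {
        this.pool = pool;
    }

    String allocateForLocalRecovery(String ownPriorAllocation, Set<String> allPriorAllocations) {
        // 1. Prefer the slot this task held before, since its local state lives there.
        if (ownPriorAllocation != null) {
            Optional<String> previous = pool.allocateAvailableSlot(ownPriorAllocation);
            if (previous.isPresent()) {
                return previous.get();
            }
        }
        // 2. Otherwise take any free slot that no other task held before (the blacklist).
        Set<String> blacklist = new HashSet<>(allPriorAllocations);
        blacklist.remove(ownPriorAllocation);
        for (String candidate : pool.getAvailableSlotsInformation()) {
            if (!blacklist.contains(candidate)) {
                Optional<String> slot = pool.allocateAvailableSlot(candidate);
                if (slot.isPresent()) {
                    return slot.get();
                }
            }
        }
        // 3. Nothing suitable is free: ask for a new slot.
        return pool.requestNewAllocatedSlot();
    }
}
```

In this sketch the pool only answers single-slot questions, while the decision about which slot to take lives in the scheduler, which mirrors the separation of concerns the PR aims for.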
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)