StefanRRichter opened a new pull request #6898: [FLINK-10431] Extraction of 
scheduling-related code from SlotPool into preliminary Scheduler
URL: https://github.com/apache/flink/pull/6898
 
 
   ## What is the purpose of the change
   
   This PR extracts the scheduling-related code (e.g. the slot sharing logic) from 
the slot pool into a preliminary version of a future scheduler component. Our 
primary goal is to fix the scheduling logic for local recovery. The changes in this 
PR also open up opportunities for further code cleanup (e.g. removing all scheduling 
concerns from the slot pool, removing `ProviderAndOwner`, moving away from some 
`CompletableFuture` return types). This cleanup and some test rewrites 
will happen in a follow-up PR.
   
   ## Brief change log
   
   - SlotPool is no longer an `RpcEndpoint`, so we now need to ensure that all state 
modifications happen in the owning component's main thread.
   - Introduced `SlotInfo` and moved the slot sharing code into a scheduler 
component. The slot pool code can now deal with single slot requests. The 
interaction pattern is more explicit, with three main new methods: 
`getAvailableSlotsInformation` to list the available slots, `allocateAvailableSlot` 
to allocate a listed/available slot, and `requestNewAllocatedSlot` to request a 
new slot from the resource manager. The old code paths still coexist in the 
slot pool for now and will be removed in follow-up work.
   - Introduced the computation of a collection of all prior allocation ids via 
`ExecutionGraph::computeAllPriorAllocationIds`. This serves as the basis for computing 
a "blacklist" of allocation ids that we use to fix scheduling for local 
recovery.
   - Provided an improved version of the scheduling for local recovery that 
uses this blacklist.
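Because the slot pool no longer has an RPC main thread of its own, its state has to be confined to the main thread of whichever component hosts it. The following is a minimal sketch of that confinement, assuming a single-threaded `ExecutorService` stands in for the component's main thread; `MainThreadConfinedPool`, `offerSlot`, and `offerSlotAsync` are hypothetical names for illustration, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

// Hypothetical slot pool that is not an RPC endpoint: it owns no thread of its own,
// so all state changes must run on the hosting component's main thread.
final class MainThreadConfinedPool {
    private final ExecutorService mainThreadExecutor; // assumed single-threaded stand-in for the main thread
    private final Thread mainThread;
    private final List<String> availableSlots = new ArrayList<>(); // only touched from mainThread

    MainThreadConfinedPool(ExecutorService mainThreadExecutor)
            throws InterruptedException, ExecutionException {
        this.mainThreadExecutor = mainThreadExecutor;
        // Record the executor's thread once so later calls can be checked against it.
        Callable<Thread> whoAmI = Thread::currentThread;
        this.mainThread = mainThreadExecutor.submit(whoAmI).get();
    }

    private void assertRunningInMainThread() {
        if (Thread.currentThread() != mainThread) {
            throw new IllegalStateException("slot pool state modified outside the main thread");
        }
    }

    // Direct mutation: only legal when the caller already runs in the main thread.
    void offerSlot(String slotId) {
        assertRunningInMainThread();
        availableSlots.add(slotId);
    }

    int availableSlotCount() {
        assertRunningInMainThread();
        return availableSlots.size();
    }

    // Entry point for other threads: hop onto the main thread before touching state.
    CompletableFuture<Void> offerSlotAsync(String slotId) {
        return CompletableFuture.runAsync(() -> offerSlot(slotId), mainThreadExecutor);
    }
}
```

Callers on other threads have to hop onto the main thread explicitly, while direct off-thread calls fail fast instead of silently racing with the component's own state changes.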
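The explicit three-step interaction between a scheduler and the slot pool could look roughly like the sketch below. It is simplified under several assumptions: `SlotInfo`, `SimpleSlotPool`, and `PreliminaryScheduler` here are hypothetical stand-ins that use plain string ids (the real classes carry more, such as resource profiles), the resource manager is simulated by an immediately completed future, and slot sharing itself is omitted. Only the split of responsibilities is shown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Hypothetical, reduced view of a free slot.
final class SlotInfo {
    final String allocationId;
    final String taskManagerHost;

    SlotInfo(String allocationId, String taskManagerHost) {
        this.allocationId = allocationId;
        this.taskManagerHost = taskManagerHost;
    }
}

// Hypothetical pool that only deals with single slot requests.
final class SimpleSlotPool {
    private final Map<String, SlotInfo> available = new LinkedHashMap<>(); // by allocation id
    private final Map<String, SlotInfo> allocated = new LinkedHashMap<>(); // by request id
    private int newSlotCounter = 0;

    void addAvailableSlot(SlotInfo slot) {
        available.put(slot.allocationId, slot);
    }

    // Step 1: list the currently available slots.
    Collection<SlotInfo> getAvailableSlotsInformation() {
        return new ArrayList<>(available.values());
    }

    // Step 2: claim one listed slot; empty if it is no longer available.
    Optional<SlotInfo> allocateAvailableSlot(String requestId, String allocationId) {
        SlotInfo slot = available.remove(allocationId);
        if (slot != null) {
            allocated.put(requestId, slot);
        }
        return Optional.ofNullable(slot);
    }

    // Step 3: request a fresh slot (simulated: completes immediately).
    CompletableFuture<SlotInfo> requestNewAllocatedSlot(String requestId) {
        SlotInfo fresh = new SlotInfo("new-" + (newSlotCounter++), "unknown-host");
        allocated.put(requestId, fresh);
        return CompletableFuture.completedFuture(fresh);
    }
}

// Hypothetical scheduler: the selection policy lives here, not in the pool.
final class PreliminaryScheduler {
    private final SimpleSlotPool pool;

    PreliminaryScheduler(SimpleSlotPool pool) {
        this.pool = pool;
    }

    CompletableFuture<SlotInfo> allocateSlot(String requestId, String preferredHost) {
        Collection<SlotInfo> candidates = pool.getAvailableSlotsInformation();
        // Prefer a slot on the requested host, otherwise take any available slot.
        Optional<SlotInfo> choice = candidates.stream()
                .filter(s -> s.taskManagerHost.equals(preferredHost))
                .findFirst();
        if (!choice.isPresent()) {
            choice = candidates.stream().findFirst();
        }
        if (choice.isPresent()) {
            Optional<SlotInfo> claimed = pool.allocateAvailableSlot(requestId, choice.get().allocationId);
            if (claimed.isPresent()) {
                return CompletableFuture.completedFuture(claimed.get());
            }
        }
        // Nothing suitable is available: ask for a new slot.
        return pool.requestNewAllocatedSlot(requestId);
    }
}
```

In this split the pool never decides which slot a request should get; it only lists, hands out, or requests single slots, which is what makes it possible to swap the selection policy inside the scheduler.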
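One plausible reading of the blacklist-based scheduling for local recovery, sketched under assumptions: a task first tries to reuse its own prior allocation; failing that, it avoids slots whose allocation id appears in the set of all prior allocation ids (such slots may hold another task's local state), and only then falls back to any free slot. `PriorAllocationSelection` and its methods are hypothetical; the map-based `computeAllPriorAllocationIds` here merely mimics what `ExecutionGraph::computeAllPriorAllocationIds` is described as producing.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class PriorAllocationSelection {

    // Hypothetical stand-in for ExecutionGraph::computeAllPriorAllocationIds:
    // the union of the last allocation id of every task that has one.
    static Set<String> computeAllPriorAllocationIds(Map<String, String> priorAllocationByTask) {
        return new HashSet<>(priorAllocationByTask.values());
    }

    static Optional<String> selectSlot(
            String priorAllocation, // null if the task has no prior allocation
            Collection<String> availableAllocationIds,
            Set<String> blacklist) {
        // 1) Local recovery: reuse exactly the slot this task ran in before.
        if (priorAllocation != null && availableAllocationIds.contains(priorAllocation)) {
            return Optional.of(priorAllocation);
        }
        // 2) Otherwise prefer slots that are nobody's prior allocation.
        for (String id : availableAllocationIds) {
            if (!blacklist.contains(id)) {
                return Optional.of(id);
            }
        }
        // 3) Fallback (assumed policy): take any available slot, if there is one.
        return availableAllocationIds.stream().findFirst();
    }
}
```

Keeping blacklisted slots free where possible leaves them available to the tasks whose local state they hold, instead of letting an unrelated task grab them first.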
   
   
   ## Verifying this change
   
   
   This change is already covered by existing tests, but we still need to 
rewrite the slot pool tests and add further tests in follow-up work.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   
