zentol edited a comment on pull request #13464: URL: https://github.com/apache/flink/pull/13464#issuecomment-698254008
> Currently, we are assigning a slot to an arbitrary matching requirement, which is non optimal. Imagine there are 2 requirements (100mb, 200mb), and 2 resources (100mb, 200mb). If the 200mb resource is assigned to the 100mb requirement, the remaining 100mb resource cannot fulfill the remaining 200mb requirement, and we will need to allocate new resources.

This is true; more optimal matching will be implemented later.

> Correct me if I'm wrong on this. With the declarative resource management, resource/slot manager should no longer decide how to map individual slot requests to slots, which is now the responsibility of JobMaster. I think the resource/slot manager should properly decide whether the job needs new resources. With the current implementation, resource manager may keep allocating new resources while the job is running perfectly.

The slot manager does not map requests to slots, but it does map requirement _profiles_ to slot _profiles_. This simplifies things quite a bit: if a slot of profile A is lost, we know immediately that requirements X, Y, Z could be affected. Similarly, if requirement X goes down, we know _immediately_ that slots A, B, C are no longer needed.

If we did not do this, we would have to go through a full slot<->requirement re-matching process whenever slots are removed or requirements are reduced. Doing this on every event is too expensive, not only computationally but also because the resulting mapping can be completely different from the previous one (by virtue of being _optimal_). It should only be done when some condition is triggered, for example: X time has passed, N slots were added/removed, a new job was submitted, or X% better utilization could be achieved. This rebalancing will likely be the responsibility of another component that takes the acquired/required resources for all jobs and then re-assigns resources to jobs via `notify(Acquired/Lost)Resource`.
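To make the 100mb/200mb scenario concrete, here is a minimal, hypothetical sketch (plain Java, not Flink's actual `SlotManager` code — the class and method names are invented for illustration) showing how assigning slots to arbitrary matching requirements can leave a requirement unfulfilled that a size-aware ordering would have covered:

```java
import java.util.*;

// Hypothetical sketch: why arbitrary (first-fit) slot-to-requirement
// matching can be non-optimal. Not actual Flink code.
public class GreedyMatchingSketch {

    // Assign each slot to the first requirement it can fulfill;
    // return the number of requirements left unfulfilled.
    static int firstFitUnfulfilled(List<Integer> slotsMb, List<Integer> requirementsMb) {
        List<Integer> remaining = new ArrayList<>(requirementsMb);
        for (int slot : slotsMb) {
            for (Iterator<Integer> it = remaining.iterator(); it.hasNext(); ) {
                if (slot >= it.next()) {
                    it.remove(); // slot is large enough: requirement fulfilled
                    break;
                }
            }
        }
        return remaining.size();
    }

    public static void main(String[] args) {
        List<Integer> slots = Arrays.asList(200, 100); // available resources, in mb

        // Arbitrary order: the 200mb slot lands on the 100mb requirement,
        // so the remaining 100mb slot cannot cover the 200mb requirement.
        System.out.println(firstFitUnfulfilled(slots, Arrays.asList(100, 200))); // 1

        // Matching the largest requirement first avoids the waste.
        System.out.println(firstFitUnfulfilled(slots, Arrays.asList(200, 100))); // 0
    }
}
```

In the first case a new resource would have to be allocated even though the total capacity was sufficient, which is exactly the non-optimality the quoted comment describes.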
As for the slot manager continuing to allocate resources: the slot manager merely tries to fulfill the declared requirements. If the job has received enough resources to run and the scheduler has no intention of upscaling the job, then it should reduce its requirements. If the scheduler wants to retain the option of scaling up later on, it can keep the requirements as-is; the slot manager will continue providing slots, and the scheduler may scale up at some point.

Do note that the exact specification for how Schedulers declare the range of requirements is still TBD; I can easily imagine that a simple min-optimal-max approach will always result in slots being wasted.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org