zentol edited a comment on pull request #13464: URL: https://github.com/apache/flink/pull/13464#issuecomment-698254008
> Currently, we are assigning a slot to an arbitrary matching requirement, which is non optimal. Imagine there are 2 requirements (100mb, 200mb), and 2 resources (100mb, 200mb). If the 200mb resource is assigned to the 100mb requirement, the remaining 100mb resource cannot fulfill the remaining 200mb requirement, and we will need to allocate new resources.

This is true; more optimal matching will be implemented later.

> Correct me if I'm wrong on this. With the declarative resource management, resource/slot manager should no longer decide how to map individual slot requests to slots, which is now the responsibility of JobMaster. I think the resource/slot manager should properly decide whether the job needs new resources. With the current implementation, resource manager may keep allocating new resources while the job is running perfectly.

The slot manager does not map requests to slots, but it does map requirement _profiles_ to slot _profiles_. This simplifies things quite a bit: if a slot of profile A is lost, we know immediately that requirements X, Y, Z could be affected. Similarly, if requirement X goes down, we know _immediately_ that slots A, B, C are no longer needed.

If we did not do this, we would have to go through a full slot<->requirement re-matching process whenever slots are removed or requirements are reduced. Doing this on every event is too expensive, not only computationally but also because the resulting mapping can be completely different from the previous one (by virtue of being _optimal_). It should only be done when some condition is triggered, for example: X time has passed, N slots were added/removed, a new job was submitted, or X% better utilization could be achieved. This rebalancing will likely be the responsibility of another component that takes the acquired/required resources for all jobs and then re-assigns resources to jobs via `notify(Acquired/Lost)Resource`.
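To make the 100mb/200mb scenario concrete, here is a minimal, hypothetical sketch (plain Java, not Flink's actual `SlotManager` code — the class and method names are invented for illustration) showing how assigning slots to arbitrary matching requirements can leave a requirement unfulfilled that a size-aware ordering would have covered:

```java
import java.util.*;

// Hypothetical sketch: why arbitrary (first-fit) slot-to-requirement
// matching can be non-optimal. Not actual Flink code.
public class GreedyMatchingSketch {

    // Assign each slot to the first requirement it can fulfill;
    // return the number of requirements left unfulfilled.
    static int firstFitUnfulfilled(List<Integer> slotsMb, List<Integer> requirementsMb) {
        List<Integer> remaining = new ArrayList<>(requirementsMb);
        for (int slot : slotsMb) {
            for (Iterator<Integer> it = remaining.iterator(); it.hasNext(); ) {
                if (slot >= it.next()) {
                    it.remove(); // slot is large enough: requirement fulfilled
                    break;
                }
            }
        }
        return remaining.size();
    }

    public static void main(String[] args) {
        List<Integer> slots = Arrays.asList(200, 100); // available resources, in mb

        // Arbitrary order: the 200mb slot lands on the 100mb requirement,
        // so the remaining 100mb slot cannot cover the 200mb requirement.
        System.out.println(firstFitUnfulfilled(slots, Arrays.asList(100, 200))); // 1

        // Matching the largest requirement first avoids the waste.
        System.out.println(firstFitUnfulfilled(slots, Arrays.asList(200, 100))); // 0
    }
}
```

In the first case a new resource would have to be allocated even though the total capacity was sufficient, which is exactly the non-optimality the quoted comment describes.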
As for the slot manager continuing to allocate resources: the slot manager merely tries to fulfill the declared requirements. If the job has received enough resources to run and the scheduler has no intention of upscaling the job, then it should reduce its requirements. If the scheduler wants to retain the option of scaling up later on, it can keep the requirements as-is; the slot manager will continue providing slots, and the scheduler may scale up at some point.

Do note that the exact specification for how Schedulers declare the range of requirements is still TBD; I can easily imagine that a simple min-optimal-max approach will always result in slots being wasted.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org