Thanks for your feedback.

@Till

> the only option for a scheduler which does not support slot sharing groups is
> to say that every operator in this slot sharing group needs a slot with the
> same resources as the whole group.

At the moment, all implementations of the scheduler respect the slot sharing group. Regarding your example, the user can simply split the two operators into two slot sharing groups that require 100 MB each.
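
To make the split concrete, here is a minimal sketch of the example in Python. It is a toy model, not Flink's scheduler or any Flink API: it assumes parallelism 1, memory as the only resource, and one slot request per slot sharing group. The helper names `group_requirement` and `can_schedule` are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def group_requirement(operator_mb: Dict[str, int], group: List[str]) -> int:
    """Memory (MB) one slot needs to host every operator in `group`: a plain sum."""
    return sum(operator_mb[op] for op in group)


def can_schedule(slot_requests_mb: List[int], free_slots_mb: List[int]) -> bool:
    """True if every slot request fits into its own distinct free slot.

    With a single resource dimension it is enough to compare the sorted
    requests against the sorted slots, largest first.
    """
    if len(slot_requests_mb) > len(free_slots_mb):
        return False
    requests = sorted(slot_requests_mb, reverse=True)
    slots = sorted(free_slots_mb, reverse=True)
    return all(req <= slot for req, slot in zip(requests, slots))


# Till's example: two operators of 100 MB each, two TMs with one 100 MB slot each.
operator_mb = {"op_1": 100, "op_2": 100}
free_slots_mb = [100, 100]

# One slot sharing group holding both operators: a single 200 MB request.
one_group = [group_requirement(operator_mb, ["op_1", "op_2"])]

# Two slot sharing groups, one operator each: two 100 MB requests.
two_groups = [
    group_requirement(operator_mb, ["op_1"]),
    group_requirement(operator_mb, ["op_2"]),
]

print(one_group, can_schedule(one_group, free_slots_mb))    # [200] False
print(two_groups, can_schedule(two_groups, free_slots_mb))  # [100, 100] True
```

Under these assumptions, the single 200 MB group cannot be placed, while the two 100 MB groups fit the cluster from the example.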

> If all operators have their resources properly specified, then slot sharing
> is no longer needed.

I agree with this as well. However, specifying resource requirements for every operator is impractical for complex jobs that contain tens or even hundreds of operators. It is also hard to pick a default value for operator resource requirements. The SSG-based approach makes the user's configuration more flexible. In many cases, users only know or care about the resource requirements of some subgraphs. Forcing them to provide more information harms usability. An expert user who knows more fine-grained resource requirements can still achieve operator-granularity requirements by arranging the slot sharing groups accordingly.

@Chesnay

> Will declaring them on slot sharing groups not also waste resources if the
> parallelism of operators within that group are different?

Yes, we list it as one of the cons of the SSG-based approach. In that case, the user needs to separate operators with different parallelisms into different SSGs. However, compared to the benefits we list, we tend to treat it as a trade-off between usability and resource utilization that is left to the user. All in all, fine-grained resource management is meant for expert users who want to further optimize resource utilization, so this extra effort might be worth it.

> It also seems like quite a hassle for users having to recalculate the
> resource requirements if they change the slot sharing.

If an expert user knows the exact resource requirements of each operator, they can place each operator in its own slot sharing group. If they want some of them placed in the same slot, they only need to sum up the resource requirements of those operators. There is no need to maintain the resource requirement of a set of re-usable operators.

> My main worry is that if we wire the runtime to work on SSGs it's gonna be
> difficult to implement more fine-grained approaches.

One of the important reasons we chose the SSG-based approach is that the slot is the basic unit of resource management in Flink's runtime.
- Runtime interfaces should only require the minimum set of information needed. Operator-level resource requirements will be converted to slot-level ones.
- So far, the end-user interfaces for specifying resource requirements are still under discussion. The runtime interfaces should only require the minimum set of information needed for resource management.

Best,
Yangze Guo

On Thu, Jan 7, 2021 at 10:00 PM Chesnay Schepler <ches...@apache.org> wrote:
>
> Will declaring them on slot sharing groups not also waste resources if
> the parallelism of operators within that group are different?
>
> It also seems like quite a hassle for users having to recalculate the
> resource requirements if they change the slot sharing.
> I'd think that it's not really workable for users that create a set of
> re-usable operators which are mixed and matched in their applications;
> managing the resources requirements in such a setting would be a
> nightmare, and in the end would require operator-level requirements any way.
> In that sense, I'm not even sure whether it really increases usability.
>
> My main worry is that if we wire the runtime to work on SSGs it's
> gonna be difficult to implement more fine-grained approaches, which
> would not be the case if, for the runtime, they are always defined on an
> operator-level.
>
> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > Thanks for drafting this FLIP and starting this discussion Yangze.
> >
> > I like that defining resource requirements on a slot sharing group makes
> > the overall setup easier and improves usability of resource requirements.
> >
> > What I do not like about it is that it changes slot sharing groups from
> > being a scheduling hint to something which needs to be supported in order
> > to support fine grained resource requirements.
> > So far, the idea of slot
> > sharing groups was that it tells the system that a set of operators can be
> > deployed in the same slot. But the system still had the freedom to say that
> > it would rather place these tasks in different slots if it wanted. If we
> > now specify resource requirements on a per slot sharing group, then the
> > only option for a scheduler which does not support slot sharing groups is
> > to say that every operator in this slot sharing group needs a slot with the
> > same resources as the whole group.
> >
> > So for example, if we have a job consisting of two operators op_1 and op_2
> > where each op needs 100 MB of memory, we would then say that the slot
> > sharing group needs 200 MB of memory to run. If we have a cluster with 2
> > TMs with one slot of 100 MB each, then the system cannot run this job. If
> > the resources were specified on an operator level, then the system could
> > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> >
> > Originally, one of the primary goals of slot sharing groups was to make it
> > easier for the user to reason about how many slots a job needs independent
> > of the actual number of operators in the job. Interestingly, if all
> > operators have their resources properly specified, then slot sharing is no
> > longer needed because Flink could slice off the appropriately sized slots
> > for every Task individually. What matters is whether the whole cluster has
> > enough resources to run all tasks or not.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <karma...@gmail.com> wrote:
> >
> >> Hi, there,
> >>
> >> We would like to start a discussion thread on "FLIP-156: Runtime
> >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> >> propose Slot Sharing Group (SSG) based runtime interfaces for
> >> specifying fine-grained resource requirements.
> >>
> >> In this FLIP:
> >> - Expound the user story of fine-grained resource management.
> >> - Propose runtime interfaces for specifying SSG-based resource
> >> requirements.
> >> - Discuss the pros and cons of the three potential granularities for
> >> specifying the resource requirements (op, task and slot sharing group)
> >> and explain why we choose the slot sharing group.
> >>
> >> Please find more details in the FLIP wiki document [1]. Looking
> >> forward to your feedback.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >>
> >> Best,
> >> Yangze Guo
> >>
>