Thanks for drafting this FLIP and starting this discussion, Yangze. I like that defining resource requirements on a slot sharing group makes the overall setup easier and improves the usability of resource requirements.

What I do not like about it is that it changes slot sharing groups from being a scheduling hint to something which needs to be supported in order to support fine-grained resource requirements. So far, the idea of slot sharing groups was that it tells the system that a set of operators can be deployed in the same slot. But the system still had the freedom to say that it would rather place these tasks in different slots if it wanted. If we now specify resource requirements per slot sharing group, then the only option for a scheduler which does not support slot sharing groups is to say that every operator in this slot sharing group needs a slot with the same resources as the whole group.

So for example, if we have a job consisting of two operators op_1 and op_2 where each op needs 100 MB of memory, we would then say that the slot sharing group needs 200 MB of memory to run. If we have a cluster with 2 TMs with one slot of 100 MB each, then the system cannot run this job, because no single slot offers 200 MB. If the resources were specified on an operator level, then the system could still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.

Originally, one of the primary goals of slot sharing groups was to make it easier for the user to reason about how many slots a job needs independent of the actual number of operators in the job. Interestingly, if all operators have their resources properly specified, then slot sharing is no longer needed because Flink could slice off the appropriately sized slots for every task individually. What matters is whether the whole cluster has enough resources to run all tasks or not.

Cheers,
Till

On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <karma...@gmail.com> wrote:

> Hi, there,
>
> We would like to start a discussion thread on "FLIP-156: Runtime
> Interfaces for Fine-Grained Resource Requirements"[1], where we
> propose Slot Sharing Group (SSG) based runtime interfaces for
> specifying fine-grained resource requirements.
>
> In this FLIP:
> - Expound the user story of fine-grained resource management.
> - Propose runtime interfaces for specifying SSG-based resource
> requirements.
> - Discuss the pros and cons of the three potential granularities for
> specifying the resource requirements (op, task and slot sharing group)
> and explain why we choose the slot sharing group.
>
> Please find more details in the FLIP wiki document [1]. Looking
> forward to your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>
> Best,
> Yangze Guo
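
As a footnote to the op_1/op_2 example in the reply above: the difference between the two granularities can be checked with a small placement sketch. This is only an illustration under simplifying assumptions, not Flink's scheduler or any Flink API. The function name `can_place` and the variable names are hypothetical, memory is the only resource considered, and every request is assumed to need a slot of its own.

```python
def can_place(requests_mb, slots_mb):
    """Return True if every request can get its own slot that is large enough.

    Hypothetical helper: requests are served smallest first, and each one
    takes the smallest free slot that still fits it.
    """
    free = sorted(slots_mb)
    for need in sorted(requests_mb):
        for i, size in enumerate(free):
            if size >= need:
                free.pop(i)  # this slot is now occupied
                break
        else:
            return False  # no remaining slot is large enough
    return True


# Cluster from the example: 2 TMs, one 100 MB slot each.
cluster_slots_mb = [100, 100]

# Requirements given per operator: op_1 and op_2 need 100 MB each.
per_operator = can_place([100, 100], cluster_slots_mb)

# Requirement given per slot sharing group: one request of 200 MB.
per_group = can_place([200], cluster_slots_mb)

print("per operator:", per_operator)
print("per slot sharing group:", per_group)
```

With operator-level requirements both operators find a slot, one on each TM. With the group-level requirement no slot is large enough, even though the cluster as a whole has the same 200 MB.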