Thanks for sharing your thoughts, Kezhu. I like your ideas on how per-operator and SSG requirements can be combined. I've also thought about defining a default resource profile for all tasks that have no resources configured. That way, all operators would have resources assigned if the user chooses to use this feature.
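To make the combination concrete, one could model it roughly as follows. This is only a sketch in Python, not Flink API: `Resources`, `slot_requirement`, and the choice to apply the default profile once per unspecified operator are assumptions made for illustration, and the SSG amount is treated as an "add" on top, as in Kezhu's proposal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resources:
    """Hypothetical stand-in for a resource profile; not Flink's ResourceSpec."""
    cpu_cores: float = 0.0
    memory_mb: int = 0

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores + other.cpu_cores,
                         self.memory_mb + other.memory_mb)


ZERO = Resources()


def slot_requirement(operators: Sequence[Optional[Resources]],
                     default_profile: Resources,
                     ssg_extra: Resources = ZERO) -> Resources:
    """Sum the operator requirements, substituting the default profile for
    operators without one (None), then add the SSG-level amount on top."""
    total = ZERO
    for op in operators:
        total = total + (op if op is not None else default_profile)
    return total + ssg_extra


if __name__ == "__main__":
    default = Resources(cpu_cores=0.5, memory_mb=256)
    ops = [Resources(1.0, 1024), None, None]  # one specified, two unspecified
    print(slot_requirement(ops, default, ssg_extra=Resources(0.0, 512)))
```

With a default profile in place, no operator ends up with an implicit zero requirement, which is the surprise discussed further down in the thread.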
As Yangze and Xintong have said, we have decided to first support specifying resources only for SSGs, as this seems more user friendly. Based on the feedback for this feature, one potential development direction might be to allow resource specification on a per-operator basis. Here we could pick up your ideas.

Cheers,
Till

On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks for your feedback, Kezhu.
>
> > I think Flink *runtime* already has an ideal granularity for resource
> > management 'task'. If there is a slot shared by multiple tasks, that
> > slot's resource requirement is simple sum of all its logical slots. So
> > basically, this is no resource requirement for SlotSharingGroup in
> > runtime until now, right ?
>
> That is a half-baked implementation, coming from the previous attempts
> (years ago) to deliver the fine-grained resource management feature, and
> it was never really put into use.
>
> > From the FLIP and discussion, I assume that SSG resource specifying will
> > override operator level resource specifying if both are specified ?
>
> Actually, I think we should use the finer-grained resources (i.e. operator
> level) if both are specified. And more importantly, that is based on the
> assumption that we do need two different levels of interfaces.
>
> > So, I wonder whether we could interpret SSG resource specifying as an
> > "add" but not a "set" on resource requirement ?
>
> IIUC, this is the core idea behind your proposal. I think it provides an
> interesting idea of how we combine operator level and SSG level resources,
> *if we allow configuring resources at both levels*. However, I'm not sure
> whether configuring resources on the operator level is indeed needed.
> Therefore, as a first step, this FLIP proposes to only introduce the
> SSG-level interfaces.
> As listed in the future plan, we would consider allowing operator level
> resource configuration later if we do see a need for it. At that time, we
> definitely should discuss what to do if resources are configured at both
> levels.
>
> > * Could SSG express negative resource requirement ?
>
> No.
>
> > Is there concrete bar for partial resource configured not function ? I
> > saw it will fail job submission in Dispatcher.submitJob.
>
> With the SSG-based approach, this should no longer be needed. The
> constraint was introduced because we can neither properly define what is
> the resource of a task chained from an operator with specified resource and
> another with unspecified resource, nor for a slot shared by a task with
> specified resource and another with unspecified resource. With the
> SSG-based approach, we no longer have those problems.
>
> > An option(cluster/job level) to force slot sharing in scheduler ? This
> > could be useful in case of migration from FLIP-156 to future approach.
>
> I think this is exactly what we are trying to avoid: requiring the
> scheduler to enforce slot sharing.
>
> > An option(cluster) to ignore resource specifying(allow resource
> > specified job to run on open box environment) for no production usage ?
>
> That's possible. Actually, we are planning to introduce an option for
> activating the fine-grained resource management, for development purposes.
> We might consider keeping that option after the feature is completed, to
> allow disabling the feature without having to touch the job code.
>
> Thank you~
>
> Xintong Song
>
> On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <kez...@gmail.com> wrote:
>
> > Hi all, sorry for joining the discussion even after voting started.
> >
> > I want to share my thoughts on this after reading above discussions.
> >
> > I think Flink *runtime* already has an ideal granularity for resource
> > management 'task'.
If there is > > a slot shared by multiple tasks, that slot's resource requirement is > simple > > sum of all its logical > > slots. So basically, this is no resource requirement for SlotSharingGroup > > in runtime until now, > > right ? > > > > As in discussion, we already agree upon that: "If all operators have > their > > resources properly > > specified, then slot sharing is no longer needed. " > > > > So seems to me, naturally in mind path, what we would discuss is that: > how > > to bridge impractical > > operator level resource specifying to runtime task level resource > > requirement ? This is actually a > > pure api thing as Chesnay has pointed out. > > > > But FLIP-156 brings another direction on table: how about using SSG for > > both api and runtime > > resource specifying ? > > > > From the FLIP and dicusssion, I assume that SSG resource specifying will > > override operator level > > resource specifying if both are specified ? > > > > So, I wonder whether we could interpret SSG resource specifying as an > "add" > > but not an "set" on > > resource requirement ? > > > > The semantics is that SSG resource specifying adds additional resource to > > shared slot to express > > concerns on possible high thoughput and resource requirement for tasks in > > one physical slot. > > > > The result is that if scheduler indeed respect slot sharing, allocated > slot > > will gain extra resource > > specified for that SSG. > > > > I think one of coding barrier from "add" approach is ResourceSpec.UNKNOWN > > which didn't support > > 'merge' operation. I tend to use ResourceSpec.ZERO as default, task > > executor should be aware of > > this. > > > > @Chesnay > > > My main worry is that it if we wire the runtime to work on SSGs it's > > > gonna be difficult to implement more fine-grained approaches, which > > > would not be the case if, for the runtime, they are always defined on > an > > > operator-level. 
> > > > An "add" operation should be less invasive and enforce low barrier for > > future find-grained > > approaches. > > > > @Stephan > > > - Users can define different slot sharing groups for operators like > > they > > > do now, with the exception that you cannot mix operators that have a > > > resource profile and operators that have no resource profile. > > > > @Till > > > This effectively means that all unspecified operators > > > will implicitly have a zero resource requirement. > > > I am wondering whether this wouldn't lead to a surprising behaviour for > > the > > > user. If the user specifies the resource requirements for a single > > > operator, then he probably will assume that the other operators will > get > > > the default share of resources and not nothing. > > > > I think it is inherent due to fact that we could not defining > > ResourceSpec.ONE, eg. resource > > requirement for exact one default slot, with concrete numbers ? I tend to > > squash out unspecified one > > if there are operators in chaining with explicit resource specifying. > > Otherwise, the protocol tends > > to verbose as say "give me this much resource and a default". I think if > we > > have explict resource > > specifying for partial operators, it is just saying "I don't care other > > operators that much, just > > get them places to run". It is most likely be cases there are stateless > > fliter/map or other less > > resource consuming operators. If there is indeed a problem, I think > clients > > can specify a global > > default(or other level default in future). In job graph generating phase, > > we could take that default > > into account for unspecified operators. > > > > @FLIP-156 > > > Expose operator chaining. (Cons fo task level resource specifying) > > > > Is it inherent for all group level resource specifying ? They will either > > break chaining or obey it, > > or event could not work with. 
> >
> > To sum up above, my suggestions are:
> >
> > In api side:
> > * StreamExecutionEnvironment: A global default(ResourceSpec.ZERO if
> >   unspecified).
> > * Operator: ResourceSpec.ZERO(unspecified) as default.
> > * Task: sum of requirements from specified operators + global default(if
> >   there are any unspecified operators)
> > * SSG: additional resource to physical slot.
> >
> > In runtime side:
> > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> >
> > Physical slot gets sum up resources from logical slots and SSG, if it
> > gets ResourceSpec.ZERO, it is just a default sized slot.
> >
> > In short, turn SSG resource specifying as "add" and drop
> > ResourceSpec.UNKNOWN.
> >
> > Questions/Issues:
> > * Could SSG express negative resource requirement ?
> > * Is there concrete bar for partial resource configured not function ? I
> >   saw it will fail job submission in Dispatcher.submitJob.
> > * An option(cluster/job level) to force slot sharing in scheduler ? This
> >   could be useful in case of migration from FLIP-156 to future approach.
> > * An option(cluster) to ignore resource specifying(allow resource
> >   specified job to run on open box environment) for no production usage ?
> >
> > On February 1, 2021 at 11:54:10, Yangze Guo (karma...@gmail.com) wrote:
> >
> > Thanks for reply, Till and Xintong!
> >
> > I update the FLIP, including:
> > - Edit the JavaDoc of the proposed
> >   StreamGraphGenerator#setSlotSharingGroupResource.
> > - Add "Future Plan" section, which contains the potential follow-up
> >   issues and the limitations to be documented when fine-grained resource
> >   management is exposed to users.
> >
> > I'll start a vote in another thread.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > >
> > > Thanks for summarizing the discussion, Yangze.
I agree that setting > > > resource requirements per operator is not very user friendly. > Moreover, I > > > couldn't come up with a different proposal which would be as easy to > use > > > and wouldn't expose internal scheduling details. In fact, following > this > > > argument then we shouldn't have exposed the slot sharing groups in the > > > first place. > > > > > > What is important for the user is that we properly document the > > limitations > > > and constraints the fine grained resource specification has. For > example, > > > we should explain how optimizations like chaining are affected by it > and > > > how different execution modes (batch vs. streaming) affect the > execution > > of > > > operators which have specified resources. These things shouldn't become > > > part of the contract of this feature and are more caused by internal > > > implementation details but it will be important to understand these > > things > > > properly in order to use this feature effectively. > > > > > > Hence, +1 for starting the vote for this FLIP. > > > > > > Cheers, > > > Till > > > > > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <tonysong...@gmail.com> > > wrote: > > > > > > > Thanks for the summary, Yangze. > > > > > > > > The changes and follow-up issues LGTM. Let's wait for responses from > > the > > > > others before starting a vote. > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <karma...@gmail.com> > > wrote: > > > > > > > > > Thanks everyone for the lively discussion. I'd like to try to > > > > > summarize the current convergence in the discussion. Please let me > > > > > know if I got things wrong or missed something crucial here. > > > > > > > > > > Change of this FLIP: > > > > > - Treat the SSG resource requirements as a hint instead of a > > > > > restriction for the runtime. That's should be explicitly explained > in > > > > > the JavaDocs. 
> > > > > > > > > > Potential follow-up issues if needed: > > > > > - Provide operator-level resource configuration interface. > > > > > - Provide multiple options for deciding resources for SSGs whose > > > > > requirement is not specified: > > > > > ** Default slot resource. > > > > > ** Default operator resource times number of operators. > > > > > > > > > > If there are no other issues, I'll update the FLIP accordingly and > > > > > start a vote thread. Thanks all for the valuable feedback again. > > > > > > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > > > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song < > tonysong...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > FGRuntimeInterface.png > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song < > > tonysong...@gmail.com> > > > > > > > wrote: > > > > > >> > > > > > >> I think Chesnay's proposal could actually work. IIUC, the > keypoint > > is > > > > > to derive operator requirements from SSG requirements on the API > > side, so > > > > > that the runtime only deals with operator requirements. It's > > debatable > > > > how > > > > > the deriving should be done though. E.g., an alternative could be > to > > > > evenly > > > > > divide the SSG requirement into requirements of operators in the > > group. > > > > > >> > > > > > >> > > > > > >> However, I'm not entirely sure which option is more desired. > > > > > Illustrating my understanding in the following figure, in which on > > the > > > > top > > > > > is Chesnay's proposal and on the bottom is the SSG-based proposal > in > > this > > > > > FLIP. > > > > > >> > > > > > >> > > > > > >> > > > > > >> I think the major difference between the two approaches is where > > > > > deriving operator requirements from SSG requirements happens. 
> > > > > >>
> > > > > >> - Chesnay's proposal simplifies the runtime logic and the
> > > > > >>   interface to expose, at the price of moving more complexity
> > > > > >>   (i.e. the deriving) to the API side. The question is, where do
> > > > > >>   we prefer to keep the complexity? I'm slightly leaning towards
> > > > > >>   having a thin API and keeping the complexity in the runtime if
> > > > > >>   possible.
> > > > > >>
> > > > > >> - Notice that the dashed-line arrows represent optional steps
> > > > > >>   that are needed only for schedulers that do not respect SSGs,
> > > > > >>   which we don't have at the moment. If we only look at the
> > > > > >>   solid-line arrows, then the SSG-based approach is much simpler,
> > > > > >>   without needing to derive and aggregate the requirements back
> > > > > >>   and forth. I'm not sure about complicating the current design
> > > > > >>   only for potential future needs.
> > > > > >>
> > > > > >> Thank you~
> > > > > >>
> > > > > >> Xintong Song
> > > > > >>
> > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > >>>
> > > > > >>> You're raising a good point, but I think I can rectify that
> > > > > >>> with a minor adjustment.
> > > > > >>>
> > > > > >>> Default requirements are whatever the default requirements are;
> > > > > >>> setting the requirements for one operator has no effect on
> > > > > >>> other operators.
> > > > > >>>
> > > > > >>> With these rules, and some API enhancements, the following
> > > > > >>> mockup would replicate the SSG-based behavior:
> > > > > >>>
> > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > >>> }
> > > > > >>>
> > > > > >>> We could even allow setting requirements on slotsharing-groups
> > > > > >>> colocation-groups and internally translate them accordingly.
> > > > > >>> I can't help but feel this is a plain API issue.
> > > > > >>>
> > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > >>> > If I understand you correctly Chesnay, then you want to
> > > > > >>> > decouple the resource requirement specification from the slot
> > > > > >>> > sharing group assignment. Hence, per default all operators
> > > > > >>> > would be in the same slot sharing group. If there is no
> > > > > >>> > operator with a resource specification, then the system would
> > > > > >>> > allocate a default slot for it. If there is at least one
> > > > > >>> > operator, then the system would sum up all the specified
> > > > > >>> > resources and allocate a slot of this size. This effectively
> > > > > >>> > means that all unspecified operators will implicitly have a
> > > > > >>> > zero resource requirement. Did I understand your idea
> > > > > >>> > correctly?
> > > > > >>> >
> > > > > >>> > I am wondering whether this wouldn't lead to a surprising
> > > > > >>> > behaviour for the user. If the user specifies the resource
> > > > > >>> > requirements for a single operator, then he probably will
> > > > > >>> > assume that the other operators will get the default share of
> > > > > >>> > resources and not nothing.
> > > > > >>> > > > > > > >>> > Cheers, > > > > > >>> > Till > > > > > >>> > > > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler < > > > > ches...@apache.org > > > > > >>> > <mailto:ches...@apache.org>> wrote: > > > > > >>> > > > > > > >>> > Is there even a functional difference between specifying the > > > > > >>> > requirements for an SSG vs specifying the same requirements > on > > > > a > > > > > >>> > single > > > > > >>> > operator within that group (ideally a colocation group to > avoid > > > > > this > > > > > >>> > whole hint business)? > > > > > >>> > > > > > > >>> > Wouldn't we get the best of both worlds in the latter case? > > > > > >>> > > > > > > >>> > Users can take shortcuts to define shared requirements, > > > > > >>> > but refine them further as needed on a per-operator basis, > > > > > >>> > without changing semantics of slotsharing groups > > > > > >>> > nor the runtime being locked into SSG-based requirements. > > > > > >>> > > > > > > >>> > (And before anyone argues what happens if slotsharing groups > > > > > >>> > change or > > > > > >>> > whatnot, that's a plain API issue that we could surely solve. > > > > (A > > > > > >>> > plain > > > > > >>> > iteration over slotsharing groups and therein contained > > > > operators > > > > > >>> > would > > > > > >>> > suffice)). > > > > > >>> > > > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote: > > > > > >>> > > Maybe a different minor idea: Would it be possible to treat > > > > > the SSG > > > > > >>> > > resource requirements as a hint for the runtime similar to > > > > how > > > > > >>> > slot sharing > > > > > >>> > > groups are designed at the moment? Meaning that we don't > give > > > > > >>> > the guarantee > > > > > >>> > > that Flink will always deploy this set of tasks together no > > > > > >>> > matter what > > > > > >>> > > comes. 
If, for example, the runtime can derive by some > means > > > > > the > > > > > >>> > resource > > > > > >>> > > requirements for each task based on the requirements for > the > > > > > >>> > SSG, this > > > > > >>> > > could be possible. One easy strategy would be to give every > > > > > task > > > > > >>> > the same > > > > > >>> > > resources as the whole slot sharing group. Another one > could > > > > be > > > > > >>> > > distributing the resources equally among the tasks. This > does > > > > > >>> > not even have > > > > > >>> > > to be implemented but we would give ourselves the freedom > to > > > > > change > > > > > >>> > > scheduling if need should arise. > > > > > >>> > > > > > > > >>> > > Cheers, > > > > > >>> > > Till > > > > > >>> > > > > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo < > > > > karma...@gmail.com > > > > > >>> > <mailto:karma...@gmail.com>> wrote: > > > > > >>> > > > > > > > >>> > >> Thanks for the responses, Till and Xintong. > > > > > >>> > >> > > > > > >>> > >> I second Xintong's comment that SSG-based runtime > interface > > > > > >>> > will give > > > > > >>> > >> us the flexibility to achieve op/task-based approach. > That's > > > > > one of > > > > > >>> > >> the most important reasons for our design choice. > > > > > >>> > >> > > > > > >>> > >> Some cents regarding the default operator resource: > > > > > >>> > >> - It might be good for the scenario of DataStream jobs. > > > > > >>> > >> ** For light-weight operators, the accumulative > > > > > >>> > configuration error > > > > > >>> > >> will not be significant. Then, the resource of a task used > > > > is > > > > > >>> > >> proportional to the number of operators it contains. > > > > > >>> > >> ** For heavy operators like join and window or operators > > > > > >>> > using the > > > > > >>> > >> external resources, user will turn to the fine-grained > > > > > resource > > > > > >>> > >> configuration. 
> > > > > >>> > >> - It can increase the stability for the standalone cluster > > > > > >>> > where task > > > > > >>> > >> executors registered are heterogeneous(with different > > > > default > > > > > slot > > > > > >>> > >> resources). > > > > > >>> > >> - It might not be good for SQL users. The operators that > SQL > > > > > >>> > will be > > > > > >>> > >> transferred to is a black box to the user. We also do not > > > > > guarantee > > > > > >>> > >> the cross-version of consistency of the transformation so > > > > far. > > > > > >>> > >> > > > > > >>> > >> I think it can be treated as a follow-up work when the > > > > > fine-grained > > > > > >>> > >> resource management is end-to-end ready. > > > > > >>> > >> > > > > > >>> > >> Best, > > > > > >>> > >> Yangze Guo > > > > > >>> > >> > > > > > >>> > >> > > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song > > > > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>> > > > > > >>> > >> wrote: > > > > > >>> > >>> Thanks for the feedback, Till. > > > > > >>> > >>> > > > > > >>> > >>> ## I feel that what you proposed (operator-based + > default > > > > > >>> > value) might > > > > > >>> > >> be > > > > > >>> > >>> subsumed by the SSG-based approach. > > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 > cases, > > > > > >>> > categorized by > > > > > >>> > >>> whether the resource requirements are known to the users. > > > > > >>> > >>> > > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no > > > > > >>> > reason to put > > > > > >>> > >>> multiple operators whose individual resource > > > > requirements > > > > > >>> > are already > > > > > >>> > >> known > > > > > >>> > >>> into the same group in fine-grained resource > > > > management. 
> > > > > >>> > And if op_1 > > > > > >>> > >> and > > > > > >>> > >>> op_2 are in different groups, there should be no > > > > problem > > > > > >>> > switching > > > > > >>> > >> data > > > > > >>> > >>> exchange mode from pipelined to blocking. This is > > > > > >>> > equivalent to > > > > > >>> > >> specifying > > > > > >>> > >>> operator resource requirements in your proposal. > > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except > > > > that > > > > > >>> > op_2 is in a > > > > > >>> > >>> SSG whose resource is not specified thus would have the > > > > > >>> > default slot > > > > > >>> > >>> resource. This is equivalent to having default operator > > > > > >>> > resources in > > > > > >>> > >> your > > > > > >>> > >>> proposal. > > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and > > > > op_2 > > > > > >>> > to the same > > > > > >>> > >> SSG > > > > > >>> > >>> or separate SSGs. > > > > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be > > > > > >>> > equivalent to > > > > > >>> > >> the > > > > > >>> > >>> coarse-grained resource management, where op_1 and > > > > > op_2 > > > > > >>> > share a > > > > > >>> > >> default > > > > > >>> > >>> size slot no matter which data exchange mode is > > > > used. > > > > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each > > > > of > > > > > >>> > them will > > > > > >>> > >> use > > > > > >>> > >>> a default size slot. This is equivalent to setting > > > > > them > > > > > >>> > with > > > > > >>> > >> default > > > > > >>> > >>> operator resources in your proposal. > > > > > >>> > >>> 4. 
*Total (pipeline) or max (blocking) of op_1 and op_2 > > > > > is > > > > > >>> > known.* > > > > > >>> > >>> - It is possible that the user learns the total / > > > > max > > > > > >>> > resource > > > > > >>> > >>> requirement from executing and monitoring the job, > > > > > >>> > while not > > > > > >>> > >>> being aware of > > > > > >>> > >>> individual operator requirements. > > > > > >>> > >>> - I believe this is the case your proposal does not > > > > > >>> > cover. And TBH, > > > > > >>> > >>> this is probably how most users learn the resource > > > > > >>> > requirements, > > > > > >>> > >>> according > > > > > >>> > >>> to my experiences. > > > > > >>> > >>> - In this case, the user might need to specify > > > > > >>> > different resources > > > > > >>> > >> if > > > > > >>> > >>> he wants to switch the execution mode, which should > > > > > not > > > > > >>> > be worse > > > > > >>> > >> than not > > > > > >>> > >>> being able to use fine-grained resource management. > > > > > >>> > >>> > > > > > >>> > >>> > > > > > >>> > >>> ## An additional idea inspired by your proposal. > > > > > >>> > >>> We may provide multiple options for deciding resources > for > > > > > >>> > SSGs whose > > > > > >>> > >>> requirement is not specified, if needed. > > > > > >>> > >>> > > > > > >>> > >>> - Default slot resource (current design) > > > > > >>> > >>> - Default operator resource times number of operators > > > > > >>> > (equivalent to > > > > > >>> > >>> your proposal) > > > > > >>> > >>> > > > > > >>> > >>> > > > > > >>> > >>> ## Exposing internal runtime strategies > > > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource > > > > > >>> > requirements might be > > > > > >>> > >>> affected if how SSGs are internally handled changes in > > > > > future. 
> > > > > >>> > >> Practically, > > > > > >>> > >>> I do not concretely see at the moment what kind of > changes > > > > we > > > > > >>> > may want in > > > > > >>> > >>> future that might conflict with this FLIP proposal, as > the > > > > > >>> > question of > > > > > >>> > >>> switching data exchange mode answered above. I'd suggest > to > > > > > >>> > not give up > > > > > >>> > >> the > > > > > >>> > >>> user friendliness we may gain now for the future problems > > > > > that > > > > > >>> > may or may > > > > > >>> > >>> not exist. > > > > > >>> > >>> > > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to > > > > > >>> > achieve the > > > > > >>> > >>> equivalent behavior as the operator-based approach, if we > > > > > set each > > > > > >>> > >> operator > > > > > >>> > >>> (or task) to a separate SSG. We can even provide a > shortcut > > > > > >>> > option to > > > > > >>> > >>> automatically do that for users, if needed. > > > > > >>> > >>> > > > > > >>> > >>> > > > > > >>> > >>> Thank you~ > > > > > >>> > >>> > > > > > >>> > >>> Xintong Song > > > > > >>> > >>> > > > > > >>> > >>> > > > > > >>> > >>> > > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann > > > > > >>> > <trohrm...@apache.org <mailto:trohrm...@apache.org>> > > > > > >>> > >> wrote: > > > > > >>> > >>>> Thanks for the responses Xintong and Stephan, > > > > > >>> > >>>> > > > > > >>> > >>>> I agree that being able to define the resource > > > > requirements > > > > > for a > > > > > >>> > >> group of > > > > > >>> > >>>> operators is more user friendly. However, my concern is > > > > that > > > > > >>> > we are > > > > > >>> > >>>> exposing thereby internal runtime strategies which might > > > > > >>> > limit our > > > > > >>> > >>>> flexibility to execute a given job. 
Moreover, the > > > > semantics > > > > > of > > > > > >>> > >> configuring > > > > > >>> > >>>> resource requirements for SSGs could break if switching > > > > from > > > > > >>> > streaming > > > > > >>> > >> to > > > > > >>> > >>>> batch execution. If one defines the resource > requirements > > > > > for > > > > > >>> > op_1 -> > > > > > >>> > >> op_2 > > > > > >>> > >>>> which run in pipelined mode when using the streaming > > > > > >>> > execution, then > > > > > >>> > >> how do > > > > > >>> > >>>> we interpret these requirements when op_1 -> op_2 are > > > > > >>> > executed with a > > > > > >>> > >>>> blocking data exchange in batch execution mode? > > > > > Consequently, > > > > > >>> > I am > > > > > >>> > >> still > > > > > >>> > >>>> leaning towards Stephan's proposal to set the resource > > > > > >>> > requirements per > > > > > >>> > >>>> operator. > > > > > >>> > >>>> > > > > > >>> > >>>> Maybe the following proposal makes the configuration > > > > easier: > > > > > >>> > If the > > > > > >>> > >> user > > > > > >>> > >>>> wants to use fine-grained resource requirements, then > she > > > > > >>> > needs to > > > > > >>> > >> specify > > > > > >>> > >>>> the default size which is used for operators which have > no > > > > > >>> > explicit > > > > > >>> > >>>> resource annotation. If this holds true, then every > > > > operator > > > > > >>> > would > > > > > >>> > >> have a > > > > > >>> > >>>> resource requirement and the system can try to execute > the > > > > > >>> > operators > > > > > >>> > >> in the > > > > > >>> > >>>> best possible manner w/o being constrained by how the > user > > > > > >>> > set the SSG > > > > > >>> > >>>> requirements. 
> > > > > >>> > >>>> > > > > > >>> > >>>> Cheers, > > > > > >>> > >>>> Till > > > > > >>> > >>>> > > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song > > > > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>> > > > > > >>> > >>>> wrote: > > > > > >>> > >>>> > > > > > >>> > >>>>> Thanks for the feedback, Stephan. > > > > > >>> > >>>>> > > > > > >>> > >>>>> Actually, your proposal has also come to my mind at > some > > > > > >>> > point. And I > > > > > >>> > >>>> have > > > > > >>> > >>>>> some concerns about it. > > > > > >>> > >>>>> > > > > > >>> > >>>>> > > > > > >>> > >>>>> 1. It does not give users the same control as the > > > > SSG-based > > > > > >>> > approach. > > > > > >>> > >>>>> > > > > > >>> > >>>>> > > > > > >>> > >>>>> While both approaches do not require specifying for > each > > > > > >>> > operator, > > > > > >>> > >>>>> SSG-based approach supports the semantic that "some > > > > > operators > > > > > >>> > >> together > > > > > >>> > >>>> use > > > > > >>> > >>>>> this much resource" while the operator-based approach > > > > > doesn't. > > > > > >>> > >>>>> > > > > > >>> > >>>>> > > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, > ..., > > > > > >>> > o_m), and > > > > > >>> > >> at > > > > > >>> > >>>> some > > > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which > significantly > > > > > >>> > reduces the > > > > > >>> > >> data > > > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups > SSG_1 > > > > > >>> > (o_1, ..., > > > > > >>> > >> o_n) > > > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much > > > > higher > > > > > >>> > >> parallelisms > > > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2 > won't > > > > > >>> > lead to too > > > > > >>> > >> much > > > > > >>> > >>>>> wasting of resources. 
If the two SSGs end up needing > > > > > different > > > > > >>> > >> resources, > > > > > >>> > >>>>> with the SSG-based approach one can directly specify > > > > > >>> > resources for > > > > > >>> > >> the > > > > > >>> > >>>> two > > > > > >>> > >>>>> groups. However, with the operator-based approach, the > > > > > user will > > > > > >>> > >> have to > > > > > >>> > >>>>> specify resources for each operator in one of the two > > > > > >>> > groups, and > > > > > >>> > >> tune > > > > > >>> > >>>> the > > > > > >>> > >>>>> default slot resource via configurations to fit the > other > > > > > group. > > > > > >>> > >>>>> > > > > > >>> > >>>>> > > > > > >>> > >>>>> 2. It increases the chance of breaking operator chains. > > > > > >>> > >>>>> > > > > > >>> > >>>>> > > > > > >>> > >>>>> Setting chainnable operators into different slot > sharing > > > > > >>> > groups will > > > > > >>> > >>>>> prevent them from being chained. In the current > > > > > implementation, > > > > > >>> > >>>> downstream > > > > > >>> > >>>>> operators, if SSG not explicitly specified, will be set > > > > to > > > > > >>> > the same > > > > > >>> > >> group > > > > > >>> > >>>>> as the chainable upstream operators (unless multiple > > > > > upstream > > > > > >>> > >> operators > > > > > >>> > >>>> in > > > > > >>> > >>>>> different groups), to reduce the chance of breaking > > > > chains. > > > > > >>> > >>>>> > > > > > >>> > >>>>> > > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> > o_3, > > > > > >>> > deciding > > > > > >>> > >> SSGs > > > > > >>> > >>>>> based on whether resource is specified we will easily > get > > > > > >>> > groups like > > > > > >>> > >>>> (o_1, > > > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be > > > > > >>> > chained. 
> > > > > >>> > >>>>> This is also possible for the SSG-based approach, but I believe the chance is much smaller because there's no strong reason for users to specify the groups with alternate operators like that. We are more likely to get groups like (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 3. It complicates the system by having two different mechanisms for sharing managed memory in a slot.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing mechanism, where managed memory is first distributed according to the consumer type, then further distributed across operators of that consumer type.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - With the operator-based approach, managed memory size specified for an operator should account for all the consumer types of that operator. That means the managed memory is first distributed across operators, then distributed to different consumer types of each operator.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Unfortunately, the different order of the two calculation steps can lead to different results.
> > > > > >>> > >>>>> To be specific, the semantics of the configuration option `consumer-weights` changed (within a slot vs. within an operator).
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> To sum up things:
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> While (3) might be a bit more implementation related, I think (1) and (2) somehow suggest that the price for the proposed approach to avoid specifying resource for every operator is that it's not as independent from operator chaining and slot sharing as the operator-based approach discussed in the FLIP.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Thank you~
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Xintong Song
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:
> > > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I want to say, first of all, that this is super well written. And the points that the FLIP makes about how to expose the configuration to users are exactly the right thing to figure out first.
> > > > > >>> > >>>>>> So good job here!
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> About how to let users specify the resource profiles.
> > > > > >>> > >>>>>> If I can sum the FLIP and previous discussion up in my own words, the problem is the following:
> > > > > >>> > >>>>>>> Operator-level specification is the simplest and cleanest approach, because it avoids mixing operator configuration (resource) and scheduling. No matter what other parameters change (chaining, slot sharing, switching pipelined and blocking shuffles), the resource profiles stay the same.
> > > > > >>> > >>>>>>> But it would require that a user specifies resources on all operators, which makes it hard to use. That's why the FLIP suggests going with specifying resources on a Sharing-Group.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I think both thoughts are important, so can we find a solution where the Resource Profiles are specified on an Operator, but we still avoid that we need to specify a resource profile on every operator?
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> What do you think about something like the following:
> > > > > >>> > >>>>>>   - Resource Profiles are specified on an operator level.
> > > > > >>> > >>>>>>   - Not all operators need profiles
> > > > > >>> > >>>>>>   - All Operators without a Resource Profile end up in the default slot sharing group with a default profile (will get a default slot).
> > > > > >>> > >>>>>>   - All Operators with a Resource Profile will go into another slot sharing group (the resource-specified-group).
> > > > > >>> > >>>>>>   - Users can define different slot sharing groups for operators like they do now, with the exception that you cannot mix operators that have a resource profile and operators that have no resource profile.
> > > > > >>> > >>>>>>   - The default case where no operator has a resource profile is just a special case of this model
> > > > > >>> > >>>>>>   - The chaining logic sums up the profiles per operator, like it does now, and the scheduler sums up the profiles of the tasks that it schedules together.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> There is another question about reactive scaling raised in the FLIP. I need to think a bit about that. That is indeed a bit more tricky once we have slots of different sizes.
> > > > > >>> > >>>>>> It is not clear then which of the different slot requests the ResourceManager should fulfill when new resources (TMs) show up, or how the JobManager redistributes the slot resources when resources (TMs) disappear.
> > > > > >>> > >>>>>> This question is pretty orthogonal, though, to the "how to specify the resources".
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> Best,
> > > > > >>> > >>>>>> Stephan
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > > > >>> > >>>>>>> And thanks for the feedback, Till and Chesnay.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> @Till,
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means that SSGs need to be supported in fine-grained resource management, otherwise each operator might use as many resources as the whole group. However, I cannot think of a strong reason for not supporting SSGs in fine-grained resource management.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Interestingly, if all operators have their resources properly specified, then slot sharing is no longer needed because Flink could slice off the appropriately sized slots for every Task individually.
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2 where each op needs 100 MB of memory, we would then say that the slot sharing group needs 200 MB of memory to run. If we have a cluster with 2 TMs with one slot of 100 MB each, then the system cannot run this job. If the resources were specified on an operator level, then the system could still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Couldn't agree more that if all operators' requirements are properly specified, slot sharing should be no longer needed. I think this exactly disproves the example. If we already know op_1 and op_2 each need 100 MB of memory, why would we put them in the same group?
> > > > > >>> > >>>>>>> If they are in separate groups, with the proposed approach the system can freely deploy them to either a 200 MB TM or two 100 MB TMs.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Moreover, the precondition for not needing slot sharing is having resource requirements properly specified for all operators. This is not always possible, and usually requires tremendous effort. One of the benefits of SSG-based requirements is that it allows the user to freely decide the granularity, and thus the effort they want to invest. I would consider SSG in fine-grained resource management as a group of operators that the user would like to specify the total resource for. There can be only one group in the job, 2~3 groups dividing the job into a few major parts, or as many groups as the number of tasks/operators, depending on how fine-grained the user is able to specify the resources.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Having to support SSGs might be a constraint.
> > > > > >>> > >>>>>>> But given that all the current scheduler implementations already support SSGs, I tend to think of that as an acceptable price for the above discussed usability and flexibility.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> @Chesnay
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also waste resources if the parallelism of operators within that group are different?
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource utilization. To avoid such wasting, the user can define more groups, so that each group contains fewer operators and the chance of having operators with different parallelism will be reduced. The price is to have more resource requirements to specify.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> It also seems like quite a hassle for users having to recalculate the resource requirements if they change the slot sharing.
> > > > > >>> > >>>>>>>> I'd think that it's not really workable for users that create a set of re-usable operators which are mixed and matched in their applications; managing the resource requirements in such a setting would be a nightmare, and in the end would require operator-level requirements anyway.
> > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really increases usability.
> > > > > >>> > >>>>>>>    - As mentioned in my reply to Till's comment, there's no reason to put multiple operators whose individual resource requirements are already known into the same group in fine-grained resource management.
> > > > > >>> > >>>>>>>    - Even if an operator implementation is reused for multiple applications, it does not guarantee the same resource requirements. During our years of practice at Alibaba, with per-operator requirements specified for Blink's fine-grained resource management, very few users (including our specialists who are dedicated to supporting Blink users) are experienced enough to accurately predict/estimate the operator resource requirements.
> > > > > >>> > >>>>>>>      Most people rely on the execution-time metrics (throughput, delay, CPU load, memory usage, GC pressure, etc.) to improve the specification.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> To sum up:
> > > > > >>> > >>>>>>> If the user is capable of providing proper resource requirements for every operator, that's definitely a good thing and we would not need to rely on the SSGs. However, that shouldn't be a *must* for the fine-grained resource management to work. For those users who are capable and do not like having to set each operator to a separate SSG, I would be ok to have both SSG-based and operator-based runtime interfaces and to only fall back to the SSG requirements when the operator requirements are not specified. However, as the first step, I think we should prioritise the use cases where users are not that experienced.
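The fallback rule described in the summary above could be sketched roughly as follows. This is only an illustration, not Flink's API: the function name `resolve_group_memory_mb` and the plain-dict inputs are hypothetical, and it assumes that operator-level values are used only when every operator in the group has one (the thread does not settle the partially specified case).

```python
from typing import Dict, List, Optional


def resolve_group_memory_mb(
    operators: List[str],
    operator_mem_mb: Dict[str, int],
    ssg_mem_mb: Optional[int],
) -> Optional[int]:
    """Return the memory requirement (MB) for one slot sharing group.

    Hypothetical rule: prefer operator-level requirements (summed up)
    when all operators in the group specify one; otherwise fall back to
    the SSG-level requirement. None means "use the default slot".
    """
    if operators and all(op in operator_mem_mb for op in operators):
        return sum(operator_mem_mb[op] for op in operators)
    return ssg_mem_mb


# Both operators are specified: the operator-level values are summed.
fully_specified = resolve_group_memory_mb(["a", "b"], {"a": 100, "b": 50}, 400)
# Only one operator is specified: fall back to the SSG-level value.
partially_specified = resolve_group_memory_mb(["a", "b"], {"a": 100}, 400)
# Nothing is specified at either level: the default slot applies.
unspecified = resolve_group_memory_mb(["a"], {}, None)
```

Treating a partially specified group as "not specified" keeps the sketch simple; a real design would have to decide that case explicitly.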
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Thank you~
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Xintong Song
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also waste resources if the parallelism of operators within that group are different?
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> It also seems like quite a hassle for users having to recalculate the resource requirements if they change the slot sharing.
> > > > > >>> > >>>>>>>> I'd think that it's not really workable for users that create a set of re-usable operators which are mixed and matched in their applications; managing the resource requirements in such a setting would be a nightmare, and in the end would require operator-level requirements anyway.
> > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really increases usability.
> > > > > >>> > >>>>>>>> My main worry is that if we wire the runtime to work on SSGs it's gonna be difficult to implement more fine-grained approaches, which would not be the case if, for the runtime, they are always defined on an operator-level.
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this discussion Yangze.
> > > > > >>> > >>>>>>>>> I like that defining resource requirements on a slot sharing group makes the overall setup easier and improves usability of resource requirements.
> > > > > >>> > >>>>>>>>> What I do not like about it is that it changes slot sharing groups from being a scheduling hint to something which needs to be supported in order to support fine-grained resource requirements. So far, the idea of slot sharing groups was that it tells the system that a set of operators can be deployed in the same slot. But the system still had the freedom to say that it would rather place these tasks in different slots if it wanted.
> > > > > >>> > >>>>>>>>> If we now specify resource requirements on a per-slot-sharing-group basis, then the only option for a scheduler which does not support slot sharing groups is to say that every operator in this slot sharing group needs a slot with the same resources as the whole group.
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2 where each op needs 100 MB of memory, we would then say that the slot sharing group needs 200 MB of memory to run. If we have a cluster with 2 TMs with one slot of 100 MB each, then the system cannot run this job. If the resources were specified on an operator level, then the system could still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
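The 100 MB example above can be checked with a small feasibility sketch. It is a toy model rather than Flink's scheduler: the helper names `fits_as_group` and `fits_per_operator` are made up for illustration, memory is the only resource considered, and the per-operator check assumes each operator takes a slot of its own.

```python
from itertools import permutations
from typing import Dict, List


def fits_as_group(op_mem_mb: Dict[str, int], slot_mem_mb: List[int]) -> bool:
    """SSG-level requirement: the summed group must fit into a single slot."""
    needed = sum(op_mem_mb.values())
    return any(slot >= needed for slot in slot_mem_mb)


def fits_per_operator(op_mem_mb: Dict[str, int], slot_mem_mb: List[int]) -> bool:
    """Operator-level requirements: try every one-operator-per-slot assignment."""
    needs = list(op_mem_mb.values())
    if len(needs) > len(slot_mem_mb):
        return False
    return any(
        all(need <= slot for need, slot in zip(needs, chosen))
        for chosen in permutations(slot_mem_mb, len(needs))
    )


ops = {"op_1": 100, "op_2": 100}   # each operator needs 100 MB
slots = [100, 100]                 # two TMs with one 100 MB slot each

group_ok = fits_as_group(ops, slots)        # the group needs 200 MB in one slot
per_op_ok = fits_per_operator(ops, slots)   # op_1 -> TM_1, op_2 -> TM_2
```

Under these assumptions the group-level requirement cannot be satisfied while the per-operator one can, which is the asymmetry the example points at. The brute-force search is only meant for tiny inputs.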
> > > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot sharing groups was to make it easier for the user to reason about how many slots a job needs independent of the actual number of operators in the job. Interestingly, if all operators have their resources properly specified, then slot sharing is no longer needed because Flink could slice off the appropriately sized slots for every Task individually. What matters is whether the whole cluster has enough resources to run all tasks or not.
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> Cheers,
> > > > > >>> > >>>>>>>>> Till
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > >>> > >>>>>>>>>> Hi, there,
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> We would like to start a discussion thread on "FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements"[1], where we propose Slot Sharing Group (SSG) based runtime interfaces for specifying fine-grained resource requirements.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> In this FLIP:
> > > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource management.
> > > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based resource requirements.
> > > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential granularities for specifying the resource requirements (op, task and slot sharing group) and explain why we choose the slot sharing group.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document [1]. Looking forward to your feedback.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> Best,
> > > > > >>> > >>>>>>>>>> Yangze Guo