Hi Xintong,

Thanks for the background!
I understand the impracticality of operator-level specifications and the
value of group-level specifications. I'm just not that confident about
"Coupling between operator chaining / slot sharing"; it seems to me that it
requires more knowledge than "Expose operator chaining".

Best,
Kezhu Wang

On Thu, Feb 4, 2021 at 13:22 Xintong Song <tonysong...@gmail.com> wrote:

> Hi Kezhu,
>
> Maybe let me share some background first.
>
> - We at Alibaba have been using fine-grained resource management for
>   many years, with Blink (an internal version of Flink).
> - We have been trying to contribute this feature to Apache Flink for
>   many years. However, we haven't succeeded, for various reasons.
>    - Years ago, I believe there were not many users running Flink in
>      production at a very large scale, thus less demand for the
>      feature.
>    - The feature in Blink is quite specific to our internal use cases
>      and scenarios. We have not made it general enough to cover the
>      community's common use cases.
>    - Divergences between the Flink & Blink code bases.
> - Blink used operator-level resource interfaces. According to our years
>   of production experience, we believe that specifying operator-level
>   resources is neither necessary nor easy to use. This is why we propose
>   group-level interfaces.
>
> Back to your questions.
>
> > I saw the discussion to keep slot sharing as a hint, but in reality,
> > will SSG jobs be expected to fail or run slowly if the scheduler does
> > not respect it? A slot with 20GB memory is different from two 1GB
> > default-sized slots. So we actually depend on the scheduler
> > version/implementation/de facto behavior if we claim it is a hint.
>
> SSG-based resource requirements are considered hints because the SSG itself
> is a hint. There's no guarantee that operators of an SSG will always be
> scheduled together. I think you have a good point: if SSGs are not
> respected, the question is whether it is preferred to fail the job or to
> interpret the resource of an actual slot.
> It's possible that we provide a configuration option and leave that
> decision to the users. However, that is a design choice we need to make
> when there is indeed a need for not respecting the SSGs.
>
> > Do you mean code path or production environment? If it is code path,
> > could you please point out where the story breaks?
> >
> > From the discussion and history, could I consider FLIP-156 a
> > redirection rather than an inheritance/enhancement of the current
> > half-baked/ancient implementation?
>
> If you try to set the operator resources, you will find that it doesn't
> work at the moment. Several things are not ready.
>
> - Interfaces for setting operator resources were never really exposed to
>   users.
> - The resource manager never allocates slots with the requested
>   resources.
> - Managed memory size specified for operators will not be respected,
>   because managed memory is shared within a slot with a different
>   approach.
>
> While the first two points mostly reflect that the feature is not yet
> ready, the last point is closely tied to specifying operator-level
> resources.
>
> To sum up, we do not want to support specifying operator-level resources
> in the first step, for the following reasons.
>
> - It's not likely needed, due to poor usability compared to the
>   SSG-based approach.
> - It introduces the complexity of dealing with managed memory sharing.
> - It introduces the complexity of combining resource requirements from
>   two different levels.
>
> Thank you~
>
> Xintong Song
>
> On Wed, Feb 3, 2021 at 7:50 PM Kezhu Wang <kez...@gmail.com> wrote:
>
> > Hi Till,
> >
> > Based on what I understood, if I'm not wrong, the door is not closed
> > after SSG resource specifying. So I hope it could be useful for
> > potential future improvements.
> >
> > Best,
> > Kezhu Wang
> >
> > On February 3, 2021 at 18:07:21, Till Rohrmann (trohrm...@apache.org)
> > wrote:
> >
> > Thanks for sharing your thoughts Kezhu.
> > I like your ideas of how per-operator and SSG requirements can be
> > combined. I've also thought about defining a default resource profile
> > for all tasks which have no resources configured. That way all
> > operators would have resources assigned if the user chooses to use this
> > feature.
> >
> > As Yangze and Xintong have said, we have decided to first only support
> > specifying resources for SSGs as this seems more user friendly. Based on
> > the feedback for this feature, one potential development direction might
> > be to allow resource specification on a per-operator basis. Here we
> > could pick up your ideas.
> >
> > Cheers,
> > Till
> >
> > On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> > > Thanks for your feedback, Kezhu.
> > >
> > > > I think the Flink *runtime* already has an ideal granularity for
> > > > resource management: the 'task'. If a slot is shared by multiple
> > > > tasks, that slot's resource requirement is simply the sum of all
> > > > its logical slots. So basically, there is no resource requirement
> > > > for SlotSharingGroup in the runtime until now, right?
> > >
> > > That is a half-baked implementation, coming from previous attempts
> > > (years ago) to deliver the fine-grained resource management feature,
> > > and it was never really put into use.
> > >
> > > > From the FLIP and discussion, I assume that SSG resource
> > > > specifying will override operator-level resource specifying if
> > > > both are specified?
> > >
> > > Actually, I think we should use the finer-grained resources (i.e.
> > > operator level) if both are specified. And more importantly, that is
> > > based on the assumption that we do need two different levels of
> > > interfaces.
> > >
> > > > So, I wonder whether we could interpret SSG resource specifying as
> > > > an "add" rather than a "set" on the resource requirement?
> > >
> > > IIUC, this is the core idea behind your proposal. I think it provides
> > > an interesting idea of how we combine operator-level and SSG-level
> > > resources, *if we allow configuring resources at both levels*.
> > > However, I'm not sure whether configuring resources on the operator
> > > level is indeed needed. Therefore, as a first step, this FLIP proposes
> > > to only introduce the SSG-level interfaces. As listed in the future
> > > plan, we would consider allowing operator-level resource configuration
> > > later if we do see a need for it. At that time, we definitely should
> > > discuss what to do if resources are configured at both levels.
> > >
> > > > * Could an SSG express a negative resource requirement?
> > >
> > > No.
> > >
> > > > Is there a concrete barrier that prevents partially configured
> > > > resources from working? I saw that it fails job submission in
> > > > Dispatcher.submitJob.
> > >
> > > With the SSG-based approach, this should no longer be needed. The
> > > constraint was introduced because we can neither properly define the
> > > resource of a task chained from an operator with specified resources
> > > and another with unspecified resources, nor that of a slot shared by a
> > > task with specified resources and another with unspecified resources.
> > > With the SSG-based approach, we no longer have those problems.
> > >
> > > > An option (cluster/job level) to force slot sharing in the
> > > > scheduler? This could be useful in case of migration from FLIP-156
> > > > to a future approach.
> > >
> > > I think this is exactly what we are trying to avoid: requiring the
> > > scheduler to enforce slot sharing.
> > >
> > > > An option (cluster level) to ignore resource specifying (allowing a
> > > > resource-specified job to run in an out-of-the-box environment) for
> > > > non-production usage?
> > >
> > > That's possible. Actually, we are planning to introduce an option for
> > > activating the fine-grained resource management, for development
> > > purposes.
> > > We might consider keeping that option after the feature is completed,
> > > to allow disabling the feature without having to touch the job code.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <kez...@gmail.com> wrote:
> > >
> > > > Hi all, sorry for joining the discussion after voting has started.
> > > >
> > > > I want to share my thoughts on this after reading the discussion
> > > > above.
> > > >
> > > > I think the Flink *runtime* already has an ideal granularity for
> > > > resource management: the 'task'. If a slot is shared by multiple
> > > > tasks, that slot's resource requirement is simply the sum of all
> > > > its logical slots. So basically, there is no resource requirement
> > > > for SlotSharingGroup in the runtime until now, right?
> > > >
> > > > As discussed, we already agree that: "If all operators have their
> > > > resources properly specified, then slot sharing is no longer
> > > > needed."
> > > >
> > > > So it seems to me that the natural line of thought is to discuss
> > > > how to bridge the impractical operator-level resource specifying to
> > > > the runtime task-level resource requirement. This is actually a
> > > > pure API matter, as Chesnay has pointed out.
> > > >
> > > > But FLIP-156 brings another direction to the table: how about using
> > > > SSGs for both API and runtime resource specifying?
> > > >
> > > > From the FLIP and discussion, I assume that SSG resource
> > > > specifying will override operator-level resource specifying if
> > > > both are specified?
> > > >
> > > > So, I wonder whether we could interpret SSG resource specifying as
> > > > an "add" rather than a "set" on the resource requirement?
> > > >
> > > > The semantics are that SSG resource specifying adds additional
> > > > resources to the shared slot, to express concerns about possibly
> > > > high throughput and resource requirements for tasks in one physical
> > > > slot.
> > > >
> > > > The result is that if the scheduler indeed respects slot sharing,
> > > > the allocated slot will gain the extra resources specified for that
> > > > SSG.
> > > >
> > > > I think one coding barrier for the "add" approach is
> > > > ResourceSpec.UNKNOWN, which does not support the 'merge' operation.
> > > > I tend to use ResourceSpec.ZERO as the default; the task executor
> > > > should be aware of this.
> > > >
> > > > @Chesnay
> > > > > My main worry is that if we wire the runtime to work on SSGs it's
> > > > > gonna be difficult to implement more fine-grained approaches,
> > > > > which would not be the case if, for the runtime, they are always
> > > > > defined on an operator-level.
> > > >
> > > > An "add" operation should be less invasive and keep the barrier low
> > > > for future fine-grained approaches.
> > > >
> > > > @Stephan
> > > > > - Users can define different slot sharing groups for operators
> > > > > like they do now, with the exception that you cannot mix
> > > > > operators that have a resource profile and operators that have no
> > > > > resource profile.
> > > >
> > > > @Till
> > > > > This effectively means that all unspecified operators
> > > > > will implicitly have a zero resource requirement.
> > > > > I am wondering whether this wouldn't lead to a surprising
> > > > > behaviour for the user. If the user specifies the resource
> > > > > requirements for a single operator, then he probably will assume
> > > > > that the other operators will get the default share of resources
> > > > > and not nothing.
> > > >
> > > > I think it is inherent, due to the fact that we cannot define
> > > > ResourceSpec.ONE, e.g.
> > > > a resource requirement for exactly one default slot, with concrete
> > > > numbers. I tend to squash out the unspecified ones if there are
> > > > operators in the chain with explicit resource specifying. Otherwise,
> > > > the protocol tends to be verbose, as if to say "give me this much
> > > > resource and a default". I think that if we have explicit resource
> > > > specifying for only some operators, it is just saying "I don't care
> > > > about the other operators that much, just get them places to run".
> > > > Most likely these are stateless filter/map or other less
> > > > resource-consuming operators. If there is indeed a problem, I think
> > > > clients can specify a global default (or a default at another level
> > > > in the future). In the job graph generation phase, we could take
> > > > that default into account for unspecified operators.
> > > >
> > > > @FLIP-156
> > > > > Expose operator chaining. (Cons of task level resource specifying)
> > > >
> > > > Is it inherent for all group-level resource specifying? They will
> > > > either break chaining or obey it, or even cannot work with it.
> > > >
> > > > To sum up the above, my suggestions are:
> > > >
> > > > On the API side:
> > > > * StreamExecutionEnvironment: a global default (ResourceSpec.ZERO if
> > > >   unspecified).
> > > > * Operator: ResourceSpec.ZERO (unspecified) as default.
> > > > * Task: sum of requirements from specified operators + global
> > > >   default (if there are any unspecified operators).
> > > > * SSG: additional resource for the physical slot.
> > > >
> > > > On the runtime side:
> > > > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > > > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> > > >
> > > > The physical slot gets the sum of the resources from its logical
> > > > slots and the SSG; if it gets ResourceSpec.ZERO, it is just a
> > > > default-sized slot.
> > > >
> > > > In short, turn SSG resource specifying into an "add" and drop
> > > > ResourceSpec.UNKNOWN.
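
The "add" semantics summarized above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Flink's API: `Spec` is a hypothetical stand-in for `ResourceSpec`, an unspecified operator is modeled as `Spec.ZERO`, and the global default is added once per task when any operator is unspecified (one possible reading of the summary).

```java
import java.util.List;

/** Hypothetical stand-in for a resource spec; not Flink's ResourceSpec. */
final class Spec {
    static final Spec ZERO = new Spec(0.0, 0);

    final double cpuCores;
    final int memoryMb;

    Spec(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** Component-wise sum, the "merge" that an UNKNOWN value could not support. */
    Spec add(Spec other) {
        return new Spec(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }

    boolean isZero() {
        return cpuCores == 0.0 && memoryMb == 0;
    }
}

final class AddSemantics {
    /**
     * Task requirement: sum of the specified operators, plus the global
     * default once if at least one operator is unspecified (ZERO).
     */
    static Spec taskSpec(List<Spec> operatorSpecs, Spec globalDefault) {
        Spec total = Spec.ZERO;
        boolean anyUnspecified = false;
        for (Spec spec : operatorSpecs) {
            if (spec.isZero()) {
                anyUnspecified = true;
            } else {
                total = total.add(spec);
            }
        }
        return anyUnspecified ? total.add(globalDefault) : total;
    }

    /**
     * Physical slot requirement: sum of its tasks plus the SSG's extra
     * resource. A ZERO result means "just use a default-sized slot".
     */
    static Spec slotSpec(List<Spec> taskSpecs, Spec ssgExtra, Spec defaultSlot) {
        Spec total = ssgExtra;
        for (Spec spec : taskSpecs) {
            total = total.add(spec);
        }
        return total.isZero() ? defaultSlot : total;
    }
}
```

With these assumptions, a job that specifies nothing at any level ends up with default-sized slots, while an SSG-level value only ever increases what the tasks already request.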
> > > >
> > > > Questions/Issues:
> > > > * Could an SSG express a negative resource requirement?
> > > > * Is there a concrete barrier that prevents partially configured
> > > >   resources from working? I saw that it fails job submission in
> > > >   Dispatcher.submitJob.
> > > > * An option (cluster/job level) to force slot sharing in the
> > > >   scheduler? This could be useful in case of migration from FLIP-156
> > > >   to a future approach.
> > > > * An option (cluster level) to ignore resource specifying (allowing
> > > >   a resource-specified job to run in an out-of-the-box environment)
> > > >   for non-production usage?
> > > >
> > > > On February 1, 2021 at 11:54:10, Yangze Guo (karma...@gmail.com)
> > > > wrote:
> > > >
> > > > Thanks for the reply, Till and Xintong!
> > > >
> > > > I updated the FLIP, including:
> > > > - Edited the JavaDoc of the proposed
> > > >   StreamGraphGenerator#setSlotSharingGroupResource.
> > > > - Added a "Future Plan" section, which contains the potential
> > > >   follow-up issues and the limitations to be documented when
> > > >   fine-grained resource management is exposed to users.
> > > >
> > > > I'll start a vote in another thread.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <trohrm...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for summarizing the discussion, Yangze. I agree that
> > > > > setting resource requirements per operator is not very user
> > > > > friendly. Moreover, I couldn't come up with a different proposal
> > > > > which would be as easy to use and wouldn't expose internal
> > > > > scheduling details. In fact, following this argument, we
> > > > > shouldn't have exposed the slot sharing groups in the first place.
> > > > >
> > > > > What is important for the user is that we properly document the
> > > > > limitations and constraints the fine-grained resource
> > > > > specification has.
> > > > > For example, we should explain how optimizations like chaining are
> > > > > affected by it and how different execution modes (batch vs.
> > > > > streaming) affect the execution of operators which have specified
> > > > > resources. These things shouldn't become part of the contract of
> > > > > this feature and are caused more by internal implementation
> > > > > details, but it will be important to understand them properly in
> > > > > order to use this feature effectively.
> > > > >
> > > > > Hence, +1 for starting the vote for this FLIP.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the summary, Yangze.
> > > > > >
> > > > > > The changes and follow-up issues LGTM. Let's wait for responses
> > > > > > from the others before starting a vote.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <karma...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > > > > summarize the current convergence in the discussion. Please let
> > > > > > > me know if I got things wrong or missed something crucial here.
> > > > > > >
> > > > > > > Change of this FLIP:
> > > > > > > - Treat the SSG resource requirements as a hint instead of a
> > > > > > >   restriction for the runtime. That should be explicitly
> > > > > > >   explained in the JavaDocs.
> > > > > > >
> > > > > > > Potential follow-up issues if needed:
> > > > > > > - Provide operator-level resource configuration interface.
> > > > > > > - Provide multiple options for deciding resources for SSGs whose
> > > > > > >   requirement is not specified:
> > > > > > >   ** Default slot resource.
> > > > > > >   ** Default operator resource times number of operators.
> > > > > > >
> > > > > > > If there are no other issues, I'll update the FLIP accordingly
> > > > > > > and start a vote thread. Thanks all for the valuable feedback
> > > > > > > again.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song
> > > > > > > <tonysong...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > FGRuntimeInterface.png
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song
> > > > > > > > <tonysong...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> I think Chesnay's proposal could actually work. IIUC, the key
> > > > > > > >> point is to derive operator requirements from SSG requirements
> > > > > > > >> on the API side, so that the runtime only deals with operator
> > > > > > > >> requirements. It's debatable how the deriving should be done
> > > > > > > >> though. E.g., an alternative could be to evenly divide the SSG
> > > > > > > >> requirement into requirements of operators in the group.
> > > > > > > >>
> > > > > > > >> However, I'm not entirely sure which option is more desirable.
> > > > > > > >> I illustrate my understanding in the following figure, in which
> > > > > > > >> Chesnay's proposal is on the top and the SSG-based proposal in
> > > > > > > >> this FLIP is on the bottom.
> > > > > > > >>
> > > > > > > >> I think the major difference between the two approaches is
> > > > > > > >> where deriving operator requirements from SSG requirements
> > > > > > > >> happens.
> > > > > > > >>
> > > > > > > >> - Chesnay's proposal simplifies the runtime logic and the
> > > > > > > >>   interface to expose, at the price of moving more complexity
> > > > > > > >>   (i.e. the deriving) to the API side. The question is, where
> > > > > > > >>   do we prefer to keep the complexity? I'm slightly leaning
> > > > > > > >>   towards having a thin API and keeping the complexity in the
> > > > > > > >>   runtime if possible.
> > > > > > > >>
> > > > > > > >> - Notice that the dashed arrows represent optional steps that
> > > > > > > >>   are needed only for schedulers that do not respect SSGs,
> > > > > > > >>   which we don't have at the moment. If we only look at the
> > > > > > > >>   solid arrows, then the SSG-based approach is much simpler,
> > > > > > > >>   without needing to derive and aggregate the requirements back
> > > > > > > >>   and forth. I'm not sure about complicating the current design
> > > > > > > >>   only for potential future needs.
> > > > > > > >>
> > > > > > > >> Thank you~
> > > > > > > >>
> > > > > > > >> Xintong Song
> > > > > > > >>
> > > > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler
> > > > > > > >> <ches...@apache.org> wrote:
> > > > > > > >>>
> > > > > > > >>> You're raising a good point, but I think I can rectify that
> > > > > > > >>> with a minor adjustment.
> > > > > > > >>>
> > > > > > > >>> Default requirements are whatever the default requirements
> > > > > > > >>> are; setting the requirements for one operator has no effect
> > > > > > > >>> on other operators.
> > > > > > > >>>
> > > > > > > >>> With these rules, and some API enhancements, the following
> > > > > > > >>> mockup would replicate the SSG-based behavior:
> > > > > > > >>>
> > > > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > > > >>> }
> > > > > > > >>>
> > > > > > > >>> We could even allow setting requirements on slot-sharing
> > > > > > > >>> groups / colocation groups and internally translate them
> > > > > > > >>> accordingly.
> > > > > > > >>> I can't help but feel this is a plain API issue.
> > > > > > > >>>
> > > > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > > > >>> > If I understand you correctly Chesnay, then you want to
> > > > > > > >>> > decouple the resource requirement specification from the
> > > > > > > >>> > slot sharing group assignment. Hence, per default all
> > > > > > > >>> > operators would be in the same slot sharing group. If there
> > > > > > > >>> > is no operator with a resource specification, then the
> > > > > > > >>> > system would allocate a default slot for it. If there is at
> > > > > > > >>> > least one operator, then the system would sum up all the
> > > > > > > >>> > specified resources and allocate a slot of this size. This
> > > > > > > >>> > effectively means that all unspecified operators will
> > > > > > > >>> > implicitly have a zero resource requirement. Did I
> > > > > > > >>> > understand your idea correctly?
> > > > > > > >>> >
> > > > > > > >>> > I am wondering whether this wouldn't lead to a surprising
> > > > > > > >>> > behaviour for the user. If the user specifies the resource
> > > > > > > >>> > requirements for a single operator, then he probably will
> > > > > > > >>> > assume that the other operators will get the default share
> > > > > > > >>> > of resources and not nothing.
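
The mockup and the summing rule discussed above can be made concrete with a small, self-contained Java sketch. It assumes plain strings for group and vertex names and an integer number of megabytes as the only requirement; `derive` and `slotRequirement` are hypothetical helpers for illustration, not Flink APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SsgToOperatorRequirements {
    /** "No explicit requirement"; memory in MB stands in for a full profile. */
    static final int ZERO = 0;

    /**
     * API-side derivation from the mockup: the first vertex of each group
     * carries the whole group requirement, the remaining vertices get ZERO.
     */
    static Map<String, Integer> derive(
            Map<String, List<String>> verticesByGroup,
            Map<String, Integer> requirementByGroup) {
        Map<String, Integer> perVertex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : verticesByGroup.entrySet()) {
            int groupRequirement = requirementByGroup.getOrDefault(group.getKey(), ZERO);
            List<String> vertices = group.getValue();
            for (int i = 0; i < vertices.size(); i++) {
                perVertex.put(vertices.get(i), i == 0 ? groupRequirement : ZERO);
            }
        }
        return perVertex;
    }

    /** Runtime-side view: a slot needs the sum over the vertices it hosts. */
    static int slotRequirement(List<String> verticesInSlot, Map<String, Integer> perVertex) {
        int total = 0;
        for (String vertex : verticesInSlot) {
            total += perVertex.getOrDefault(vertex, ZERO);
        }
        return total;
    }
}
```

If the scheduler keeps a group's vertices in one slot, the sum equals the original group requirement; if it splits them, only the slot hosting the first vertex receives any explicit resources, which is the surprising behaviour raised in the reply.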
> > > > > > > >>> > > > > > > > > >>> > Cheers, > > > > > > > >>> > Till > > > > > > > >>> > > > > > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler < > > > > > > ches...@apache.org > > > > > > > >>> > <mailto:ches...@apache.org>> wrote: > > > > > > > >>> > > > > > > > > >>> > Is there even a functional difference between specifying > > the > > > > > > > >>> > requirements for an SSG vs specifying the same > requirements > > > on > > > > > > a > > > > > > > >>> > single > > > > > > > >>> > operator within that group (ideally a colocation group to > > > avoid > > > > > > > this > > > > > > > >>> > whole hint business)? > > > > > > > >>> > > > > > > > > >>> > Wouldn't we get the best of both worlds in the latter > case? > > > > > > > >>> > > > > > > > > >>> > Users can take shortcuts to define shared requirements, > > > > > > > >>> > but refine them further as needed on a per-operator > basis, > > > > > > > >>> > without changing semantics of slotsharing groups > > > > > > > >>> > nor the runtime being locked into SSG-based requirements. > > > > > > > >>> > > > > > > > > >>> > (And before anyone argues what happens if slotsharing > > groups > > > > > > > >>> > change or > > > > > > > >>> > whatnot, that's a plain API issue that we could surely > > solve. > > > > > > (A > > > > > > > >>> > plain > > > > > > > >>> > iteration over slotsharing groups and therein contained > > > > > > operators > > > > > > > >>> > would > > > > > > > >>> > suffice)). > > > > > > > >>> > > > > > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote: > > > > > > > >>> > > Maybe a different minor idea: Would it be possible to > > treat > > > > > > > the SSG > > > > > > > >>> > > resource requirements as a hint for the runtime similar > > to > > > > > > how > > > > > > > >>> > slot sharing > > > > > > > >>> > > groups are designed at the moment? 
Meaning that we > don't > > > give > > > > > > > >>> > the guarantee > > > > > > > >>> > > that Flink will always deploy this set of tasks > together > > no > > > > > > > >>> > matter what > > > > > > > >>> > > comes. If, for example, the runtime can derive by some > > > means > > > > > > > the > > > > > > > >>> > resource > > > > > > > >>> > > requirements for each task based on the requirements > for > > > the > > > > > > > >>> > SSG, this > > > > > > > >>> > > could be possible. One easy strategy would be to give > > every > > > > > > > task > > > > > > > >>> > the same > > > > > > > >>> > > resources as the whole slot sharing group. Another one > > > could > > > > > > be > > > > > > > >>> > > distributing the resources equally among the tasks. > This > > > does > > > > > > > >>> > not even have > > > > > > > >>> > > to be implemented but we would give ourselves the > freedom > > > to > > > > > > > change > > > > > > > >>> > > scheduling if need should arise. > > > > > > > >>> > > > > > > > > > >>> > > Cheers, > > > > > > > >>> > > Till > > > > > > > >>> > > > > > > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo < > > > > > > karma...@gmail.com > > > > > > > >>> > <mailto:karma...@gmail.com>> wrote: > > > > > > > >>> > > > > > > > > > >>> > >> Thanks for the responses, Till and Xintong. > > > > > > > >>> > >> > > > > > > > >>> > >> I second Xintong's comment that SSG-based runtime > > > interface > > > > > > > >>> > will give > > > > > > > >>> > >> us the flexibility to achieve op/task-based approach. > > > That's > > > > > > > one of > > > > > > > >>> > >> the most important reasons for our design choice. > > > > > > > >>> > >> > > > > > > > >>> > >> Some cents regarding the default operator resource: > > > > > > > >>> > >> - It might be good for the scenario of DataStream > jobs. > > > > > > > >>> > >> ** For light-weight operators, the accumulative > > > > > > > >>> > configuration error > > > > > > > >>> > >> will not be significant. 
Then, the resource of a task > > used > > > > > > is > > > > > > > >>> > >> proportional to the number of operators it contains. > > > > > > > >>> > >> ** For heavy operators like join and window or > operators > > > > > > > >>> > using the > > > > > > > >>> > >> external resources, user will turn to the fine-grained > > > > > > > resource > > > > > > > >>> > >> configuration. > > > > > > > >>> > >> - It can increase the stability for the standalone > > cluster > > > > > > > >>> > where task > > > > > > > >>> > >> executors registered are heterogeneous(with different > > > > > > default > > > > > > > slot > > > > > > > >>> > >> resources). > > > > > > > >>> > >> - It might not be good for SQL users. The operators > that > > > SQL > > > > > > > >>> > will be > > > > > > > >>> > >> transferred to is a black box to the user. We also do > > not > > > > > > > guarantee > > > > > > > >>> > >> the cross-version of consistency of the transformation > > so > > > > > > far. > > > > > > > >>> > >> > > > > > > > >>> > >> I think it can be treated as a follow-up work when the > > > > > > > fine-grained > > > > > > > >>> > >> resource management is end-to-end ready. > > > > > > > >>> > >> > > > > > > > >>> > >> Best, > > > > > > > >>> > >> Yangze Guo > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song > > > > > > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>> > > > > > > > >>> > >> wrote: > > > > > > > >>> > >>> Thanks for the feedback, Till. > > > > > > > >>> > >>> > > > > > > > >>> > >>> ## I feel that what you proposed (operator-based + > > > default > > > > > > > >>> > value) might > > > > > > > >>> > >> be > > > > > > > >>> > >>> subsumed by the SSG-based approach. > > > > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 > > > cases, > > > > > > > >>> > categorized by > > > > > > > >>> > >>> whether the resource requirements are known to the > > users. 
> > > > > > > >>> > >>> > > > > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no > > > > > > > >>> > reason to put > > > > > > > >>> > >>> multiple operators whose individual resource > > > > > > requirements > > > > > > > >>> > are already > > > > > > > >>> > >> known > > > > > > > >>> > >>> into the same group in fine-grained resource > > > > > > management. > > > > > > > >>> > And if op_1 > > > > > > > >>> > >> and > > > > > > > >>> > >>> op_2 are in different groups, there should be no > > > > > > problem > > > > > > > >>> > switching > > > > > > > >>> > >> data > > > > > > > >>> > >>> exchange mode from pipelined to blocking. This is > > > > > > > >>> > equivalent to > > > > > > > >>> > >> specifying > > > > > > > >>> > >>> operator resource requirements in your proposal. > > > > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except > > > > > > that > > > > > > > >>> > op_2 is in a > > > > > > > >>> > >>> SSG whose resource is not specified thus would have > the > > > > > > > >>> > default slot > > > > > > > >>> > >>> resource. This is equivalent to having default > operator > > > > > > > >>> > resources in > > > > > > > >>> > >> your > > > > > > > >>> > >>> proposal. > > > > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and > > > > > > op_2 > > > > > > > >>> > to the same > > > > > > > >>> > >> SSG > > > > > > > >>> > >>> or separate SSGs. > > > > > > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be > > > > > > > >>> > equivalent to > > > > > > > >>> > >> the > > > > > > > >>> > >>> coarse-grained resource management, where op_1 and > > > > > > > op_2 > > > > > > > >>> > share a > > > > > > > >>> > >> default > > > > > > > >>> > >>> size slot no matter which data exchange mode is > > > > > > used. > > > > > > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each > > > > > > of > > > > > > > >>> > them will > > > > > > > >>> > >> use > > > > > > > >>> > >>> a default size slot. 
This is equivalent to setting > > > > > > > them > > > > > > > >>> > with > > > > > > > >>> > >> default > > > > > > > >>> > >>> operator resources in your proposal. > > > > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and > op_2 > > > > > > > is > > > > > > > >>> > known.* > > > > > > > >>> > >>> - It is possible that the user learns the total / > > > > > > max > > > > > > > >>> > resource > > > > > > > >>> > >>> requirement from executing and monitoring the job, > > > > > > > >>> > while not > > > > > > > >>> > >>> being aware of > > > > > > > >>> > >>> individual operator requirements. > > > > > > > >>> > >>> - I believe this is the case your proposal does not > > > > > > > >>> > cover. And TBH, > > > > > > > >>> > >>> this is probably how most users learn the resource > > > > > > > >>> > requirements, > > > > > > > >>> > >>> according > > > > > > > >>> > >>> to my experiences. > > > > > > > >>> > >>> - In this case, the user might need to specify > > > > > > > >>> > different resources > > > > > > > >>> > >> if > > > > > > > >>> > >>> he wants to switch the execution mode, which should > > > > > > > not > > > > > > > >>> > be worse > > > > > > > >>> > >> than not > > > > > > > >>> > >>> being able to use fine-grained resource management. > > > > > > > >>> > >>> > > > > > > > >>> > >>> > > > > > > > >>> > >>> ## An additional idea inspired by your proposal. > > > > > > > >>> > >>> We may provide multiple options for deciding > resources > > > for > > > > > > > >>> > SSGs whose > > > > > > > >>> > >>> requirement is not specified, if needed. > > > > > > > >>> > >>> > > > > > > > >>> > >>> - Default slot resource (current design) > > > > > > > >>> > >>> - Default operator resource times number of operators > > > > > > > >>> > (equivalent to > > > > > > > >>> > >>> your proposal) > > > > > > > >>> > >>> > > > > > > > >>> > >>> > > > > > > > >>> > >>> ## Exposing internal runtime strategies > > > > > > > >>> > >>> Theoretically, yes. 
> Because they are tied to the SSGs, the resource requirements might be affected if the way SSGs are handled internally changes in the future. Practically, I do not concretely see at the moment what kind of changes we may want in the future that might conflict with this FLIP proposal, given that the question of switching data exchange mode is answered above. I'd suggest not giving up the user friendliness we may gain now for future problems that may or may not exist.
>
> Moreover, the SSG-based approach has the flexibility to achieve behavior equivalent to the operator-based approach, if we set each operator (or task) to a separate SSG. We can even provide a shortcut option to automatically do that for users, if needed.
>
> Thank you~
>
> Xintong Song
>
> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrm...@apache.org> wrote:
>> Thanks for the responses Xintong and Stephan,
>>
>> I agree that being able to define the resource requirements for a group of operators is more user friendly.
>> However, my concern is that we are exposing thereby internal runtime strategies which might limit our flexibility to execute a given job. Moreover, the semantics of configuring resource requirements for SSGs could break if switching from streaming to batch execution. If one defines the resource requirements for op_1 -> op_2 which run in pipelined mode when using the streaming execution, then how do we interpret these requirements when op_1 -> op_2 are executed with a blocking data exchange in batch execution mode? Consequently, I am still leaning towards Stephan's proposal to set the resource requirements per operator.
>>
>> Maybe the following proposal makes the configuration easier: If the user wants to use fine-grained resource requirements, then she needs to specify the default size which is used for operators which have no explicit resource annotation.
>> If this holds true, then every operator would have a resource requirement and the system can try to execute the operators in the best possible manner w/o being constrained by how the user set the SSG requirements.
>>
>> Cheers,
>> Till
>>
>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong...@gmail.com> wrote:
>>> Thanks for the feedback, Stephan.
>>>
>>> Actually, your proposal has also come to my mind at some point. And I have some concerns about it.
>>>
>>> 1. It does not give users the same control as the SSG-based approach.
>>>
>>> While neither approach requires specifying resources for each operator, the SSG-based approach supports the semantic that "some operators together use this much resource" while the operator-based approach doesn't.
>>>
>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at some point there's an agg o_n (1 < n < m) which significantly reduces the data amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ..., o_n) and SSG_2 (o_n+1, ..., o_m), so that configuring much higher parallelisms for operators in SSG_1 than for operators in SSG_2 won't waste too many resources. If the two SSGs end up needing different resources, with the SSG-based approach one can directly specify resources for the two groups. However, with the operator-based approach, the user will have to specify resources for each operator in one of the two groups, and tune the default slot resource via configurations to fit the other group.
>>>
>>> 2. It increases the chance of breaking operator chains.
>>>
>>> Setting chainable operators into different slot sharing groups will prevent them from being chained.
>>> In the current implementation, downstream operators, if their SSG is not explicitly specified, will be set to the same group as the chainable upstream operators (unless there are multiple upstream operators in different groups), to reduce the chance of breaking chains.
>>>
>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding SSGs based on whether resource is specified we will easily get groups like (o_1, o_3) & (o_2, o_4), where none of the operators can be chained. This is also possible for the SSG-based approach, but I believe the chance is much smaller because there's no strong reason for users to specify the groups with alternate operators like that. We are more likely to get groups like (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
>>>
>>> 3. It complicates the system by having two different mechanisms for sharing managed memory in a slot.
>>>
>>> - In FLIP-141, we introduced the intra-slot managed memory sharing mechanism, where managed memory is first distributed according to the consumer type, then further distributed across operators of that consumer type.
>>>
>>> - With the operator-based approach, managed memory size specified for an operator should account for all the consumer types of that operator. That means the managed memory is first distributed across operators, then distributed to different consumer types of each operator.
>>>
>>> Unfortunately, the different order of the two calculation steps can lead to different results. To be specific, the semantic of the configuration option `consumer-weights` changed (within a slot vs. within an operator).
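
The effect of the calculation order described above can be sketched with a small worked example. This is only an illustration under stated assumptions, not Flink's actual implementation: the weights (DATAPROC:70, PYTHON:30), the two operators, the 100 MB slot, the 50 MB per-operator budgets, and the even split across operators of the same consumer type are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Illustrative consumer weights (an assumption, not taken from the thread).
WEIGHTS = {"DATAPROC": 70, "PYTHON": 30}

# Hypothetical job: operator name -> managed memory consumer types it uses.
OPERATORS = {"op_a": {"DATAPROC"}, "op_b": {"DATAPROC", "PYTHON"}}


def split_by_weight(total_mb, consumer_types, weights):
    """Split total_mb across consumer types in proportion to their weights."""
    weight_sum = sum(weights[t] for t in consumer_types)
    return {t: total_mb * weights[t] / weight_sum for t in consumer_types}


def slot_first(slot_mb, operators, weights):
    """Slot-level order: split by consumer type first, then across the
    operators of each type (evenly, as a simplifying assumption)."""
    types_in_slot = set().union(*operators.values())
    per_type = split_by_weight(slot_mb, types_in_slot, weights)
    result = defaultdict(dict)
    for consumer_type, mb in per_type.items():
        users = [op for op, types in operators.items() if consumer_type in types]
        for op in users:
            result[op][consumer_type] = mb / len(users)
    return dict(result)


def operator_first(per_operator_mb, operators, weights):
    """Operator-level order: each operator has its own budget, which is then
    split across that operator's consumer types by weight."""
    return {
        op: split_by_weight(per_operator_mb[op], types, weights)
        for op, types in operators.items()
    }


by_slot = slot_first(100, OPERATORS, WEIGHTS)
by_operator = operator_first({"op_a": 50, "op_b": 50}, OPERATORS, WEIGHTS)
print(by_slot)
print(by_operator)
```

With the same 100 MB in total, op_b's PYTHON consumer receives 30 MB when the weights apply within the slot but 15 MB when they apply within the operator, which is the change in `consumer-weights` semantics the message refers to.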
>>>
>>> To sum up things:
>>>
>>> While (3) might be a bit more implementation related, I think (1) and (2) somehow suggest that the price the proposed approach pays for avoiding specifying resources for every operator is that it's not as independent from operator chaining and slot sharing as the operator-based approach discussed in the FLIP.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:
>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>>>>
>>>> I want to say, first of all, that this is super well written. And the points that the FLIP makes about how to expose the configuration to users are exactly the right thing to figure out first.
>>>> So good job here!
>>>>
>>>> About how to let users specify the resource profiles.
>>>> If I can sum the FLIP and previous discussion up in my own words, the problem is the following:
>>>>
>>>>> Operator-level specification is the simplest and cleanest approach, because it avoids mixing operator configuration (resource) and scheduling. No matter what other parameters change (chaining, slot sharing, switching pipelined and blocking shuffles), the resource profiles stay the same.
>>>>> But it would require that a user specifies resources on all operators, which makes it hard to use. That's why the FLIP suggests going with specifying resources on a Sharing-Group.
>>>>
>>>> I think both thoughts are important, so can we find a solution where the Resource Profiles are specified on an Operator, but we still avoid that we need to specify a resource profile on every operator?
>>>>
>>>> What do you think about something like the following:
>>>>  - Resource Profiles are specified on an operator level.
>>>>  - Not all operators need profiles
>>>>  - All Operators without a Resource Profile end up in the default slot sharing group with a default profile (will get a default slot).
>>>>  - All Operators with a Resource Profile will go into another slot sharing group (the resource-specified-group).
>>>>  - Users can define different slot sharing groups for operators like they do now, with the exception that you cannot mix operators that have a resource profile and operators that have no resource profile.
>>>>  - The default case where no operator has a resource profile is just a special case of this model
>>>>  - The chaining logic sums up the profiles per operator, like it does now, and the scheduler sums up the profiles of the tasks that it schedules together.
>>>>
>>>> There is another question about reactive scaling raised in the FLIP. I need to think a bit about that. That is indeed a bit more tricky once we have slots of different sizes.
>>>> It is not clear then which of the different slot requests the ResourceManager should fulfill when new resources (TMs) show up, or how the JobManager redistributes the slot resources when resources (TMs) disappear.
>>>> This question is pretty orthogonal, though, to the "how to specify the resources".
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong...@gmail.com> wrote:
>>>>> Thanks for drafting the FLIP and driving the discussion, Yangze.
>>>>> And Thanks for the feedback, Till and Chesnay.
>>>>>
>>>>> @Till,
>>>>>
>>>>> I agree that specifying requirements for SSGs means that SSGs need to be supported in fine-grained resource management, otherwise each operator might use as many resources as the whole group. However, I cannot think of a strong reason for not supporting SSGs in fine-grained resource management.
>>>>>
>>>>>> Interestingly, if all operators have their resources properly specified, then slot sharing is no longer needed because Flink could slice off the appropriately sized slots for every Task individually.
>>>>>>
>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2 where each op needs 100 MB of memory, we would then say that the slot sharing group needs 200 MB of memory to run. If we have a cluster with 2 TMs with one slot of 100 MB each, then the system cannot run this job. If the resources were specified on an operator level, then the system could still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
>>>>>
>>>>> Couldn't agree more that if all operators' requirements are properly specified, slot sharing should be no longer needed. I think this exactly disproves the example.
>>>>> If we already know op_1 and op_2 each need 100 MB of memory, why would we put them in the same group? If they are in separate groups, with the proposed approach the system can freely deploy them to either a 200 MB TM or two 100 MB TMs.
>>>>>
>>>>> Moreover, the precondition for not needing slot sharing is having resource requirements properly specified for all operators. This is not always possible, and usually requires tremendous effort. One of the benefits of SSG-based requirements is that they allow the user to freely decide the granularity, and thus the effort they want to invest. I would consider SSG in fine-grained resource management as a group of operators that the user would like to specify the total resource for.
>>>>> There can be only one group in the job, 2~3 groups dividing the job into a few major parts, or as many groups as the number of tasks/operators, depending on how fine-grained the user is able to specify the resources.
>>>>>
>>>>> Having to support SSGs might be a constraint. But given that all the current scheduler implementations already support SSGs, I tend to think of that as an acceptable price for the above discussed usability and flexibility.
>>>>>
>>>>> @Chesnay
>>>>>
>>>>>> Will declaring them on slot sharing groups not also waste resources if the parallelism of operators within that group are different?
>>>>>
>>>>> Yes. It's a trade-off between usability and resource utilization. To avoid such waste, the user can define more groups, so that each group contains fewer operators and the chance of having operators with different parallelism will be reduced.
>>>>> The price is to have more resource requirements to specify.
>>>>>
>>>>>> It also seems like quite a hassle for users having to recalculate the resource requirements if they change the slot sharing.
>>>>>> I'd think that it's not really workable for users that create a set of re-usable operators which are mixed and matched in their applications; managing the resources requirements in such a setting would be a nightmare, and in the end would require operator-level requirements any way.
>>>>>> In that sense, I'm not even sure whether it really increases usability.
>>>>>
>>>>> - As mentioned in my reply to Till's comment, there's no reason to put multiple operators whose individual resource requirements are already known into the same group in fine-grained resource management.
>>>>> - Even if an operator implementation is reused for multiple applications, it does not guarantee the same resource requirements.
>>>>>   During our years of practice at Alibaba, with per-operator requirements specified for Blink's fine-grained resource management, very few users (including our specialists who are dedicated to supporting Blink users) are experienced enough to accurately predict/estimate the operator resource requirements. Most people rely on the execution-time metrics (throughput, delay, cpu load, memory usage, GC pressure, etc.) to improve the specification.
>>>>>
>>>>> To sum up:
>>>>> If the user is capable of providing proper resource requirements for every operator, that's definitely a good thing and we would not need to rely on the SSGs. However, that shouldn't be a *must* for the fine-grained resource management to work.
>>>>> For those users who are capable and do not like having to set each operator to a separate SSG, I would be ok to have both SSG-based and operator-based runtime