Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Kezhu Wang Wed, 03 Feb 2021 22:22:37 -0800

Hi Xintong,

Thanks for the background!


I understand that operator-level specifications are impractical and that
group-level specifications are valuable. I am just not that confident about
“Coupling between operator chaining / slot sharing”: it seems to me that it
requires more knowledge than “Expose operator chaining”.

Best,
Kezhu Wang

On Thu, Feb 4, 2021 at 13:22 Xintong Song <tonysong...@gmail.com> wrote:

> Hi Kezhu,
>
> Let me share some background first.
>
>    - We at Alibaba have been using fine-grained resource management for
>    many years, with Blink (an internal version of Flink).
>    - We have been trying to contribute this feature to Apache Flink for
>    many years. However, we haven't succeeded, for various reasons.
>       - Years ago, I believe there were not many users running Flink in
>       production at a very large scale, so there was less demand for the
>       feature.
>       - The feature in Blink is quite specific to our internal use cases
>       and scenarios. We have not made it general enough to cover the
>       community's common use cases.
>       - Divergence between the Flink and Blink code bases.
>    - Blink used operator-level resource interfaces. Based on our years of
>    production experience, we believe that specifying operator-level
>    resources is neither necessary nor easy to use. This is why we propose
>    group-level interfaces.
>
> Back to your questions.
>
> > I saw the discussion about keeping slot sharing as a hint, but in reality,
> > will SSG jobs be expected to fail or run slowly if the scheduler does not
> > respect it? A slot with 20GB memory is different from two 1GB
> > default-sized slots. So we actually depend on the scheduler
> > version/implementation/de facto behavior if we claim it is a hint.
> >
>
> SSG-based resource requirements are considered hints because the SSG itself
> is a hint. There's no guarantee that operators of an SSG will always be
> scheduled together. I think you raise a good point: if SSGs are not
> respected, is it preferable to fail the job or to interpret the resource of
> an actual slot? It's possible that we provide a configuration option and
> leave that decision to the users. However, that is a design choice we need
> to make only when there's indeed a need for not respecting the SSGs.
>
> > Do you mean the code path or the production environment? If it is the code
> > path, could you please point out where the story breaks?
> >
> > From the discussion and history, could I consider FLIP-156 a redirection
> > rather than an inheritance/enhancement of the current
> > half-baked/ancient implementation?
> >
>
> If you try to set the operator resources, you would find that it won't work
> at the moment. There are several things not ready.
>
>    - Interfaces for setting operator resources are never really exposed to
>    users.
>    - The resource manager never allocates slots with the requested
>    resources.
>    - Managed memory size specified for operators will not be respected,
>    because managed memory is shared within a slot with a different
>    approach.
>
> While the first 2 points mostly reflect that the feature is not yet
> ready, the last point is closely tied to specifying operator-level
> resources.
>
> To sum up, we do not want to support specifying operator-level resources in
> the first step, for the following reasons.
>
>    - It's not likely needed, due to poor usability compared to the
>    SSG-based approach.
>    - It introduces the complexity of dealing with managed memory sharing.
>    - It introduces the complexity of combining resource requirements from
>    two different levels.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Feb 3, 2021 at 7:50 PM Kezhu Wang <kez...@gmail.com> wrote:
>
> > Hi Till,
> >
> > If I understood correctly, the door is not closed after SSG resource
> > specification. So I hope it could be useful for potential future
> > improvements.
> >
> > Best,
> > Kezhu Wang
> >
> >
> > On February 3, 2021 at 18:07:21, Till Rohrmann (trohrm...@apache.org)
> > wrote:
> >
> > Thanks for sharing your thoughts Kezhu. I like your ideas of how
> > per-operator and SSG requirements can be combined. I've also thought about
> > defining a default resource profile for all tasks which have no resources
> > configured. That way all operators would have resources assigned if the
> > user chooses to use this feature.
> >
> > As Yangze and Xintong have said, we have decided to first only support
> > specifying resources for SSGs as this seems more user friendly. Based on
> > the feedback for this feature one potential development direction might be
> > to allow the resource specification on per-operator basis. Here we could
> > pick up your ideas.
> >
> > Cheers,
> > Till
> >
> > On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > Thanks for your feedback, Kezhu.
> > >
> > > > I think Flink *runtime* already has an ideal granularity for resource
> > > > management: 'task'. If a slot is shared by multiple tasks, that slot's
> > > > resource requirement is simply the sum of all its logical slots. So
> > > > basically, there is no resource requirement for SlotSharingGroup in
> > > > the runtime until now, right?
> > >
> > > That is a half-baked implementation, left over from previous attempts
> > > (years ago) to deliver the fine-grained resource management feature,
> > > and it was never really put into use.
> > >
> > > > From the FLIP and discussion, I assume that SSG resource specification
> > > > will override operator-level resource specification if both are
> > > > specified?
> > > >
> > > Actually, I think we should use the finer-grained resources (i.e.
> > > operator level) if both are specified. And more importantly, that is
> > > based on the assumption that we do need two different levels of
> > > interfaces.
> > >
> > > > So, I wonder whether we could interpret SSG resource specification as
> > > > an "add" rather than a "set" on the resource requirement?
> > > >
> > > IIUC, this is the core idea behind your proposal. I think it provides an
> > > interesting idea of how we combine operator-level and SSG-level
> > > resources, *if we allow configuring resources at both levels*. However,
> > > I'm not sure whether configuring resources on the operator level is
> > > indeed needed. Therefore, as a first step, this FLIP proposes to only
> > > introduce the SSG-level interfaces. As listed in the future plan, we
> > > would consider allowing operator-level resource configuration later if
> > > we do see a need for it. At that time, we definitely should discuss what
> > > to do if resources are configured at both levels.
> > >
> > > > * Could an SSG express a negative resource requirement?
> > > >
> > > No.
> > >
> > > > Is there a concrete barrier that prevents partially configured
> > > > resources from working? I saw that it fails job submission in
> > > > Dispatcher.submitJob.
> > > >
> > > With the SSG-based approach, this should no longer be needed. The
> > > constraint was introduced because we can neither properly define the
> > > resource of a task chained from an operator with specified resources
> > > and another with unspecified resources, nor that of a slot shared by a
> > > task with specified resources and another with unspecified resources.
> > > With the SSG-based approach, we no longer have those problems.
> > >
> > > > An option (cluster/job level) to force slot sharing in the scheduler?
> > > > This could be useful when migrating from FLIP-156 to a future approach.
> > > >
> > > I think this is exactly what we are trying to avoid, requiring the
> > > scheduler to enforce slot sharing.
> > >
> > > > An option (cluster level) to ignore resource specifications (allowing
> > > > a job with specified resources to run in an out-of-the-box environment)
> > > > for non-production usage?
> > > >
> > > That's possible. Actually, we are planning to introduce an option for
> > > activating the fine-grained resource management, for development
> > > purposes. We might consider keeping that option after the feature is
> > > completed, to allow disabling the feature without having to touch the
> > > job code.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <kez...@gmail.com> wrote:
> > >
> > > > Hi all, sorry for joining the discussion after voting has started.
> > > >
> > > > I want to share my thoughts on this after reading above discussions.
> > > >
> > > > I think Flink *runtime* already has an ideal granularity for resource
> > > > management: 'task'. If a slot is shared by multiple tasks, that slot's
> > > > resource requirement is simply the sum of all its logical slots. So
> > > > basically, there is no resource requirement for SlotSharingGroup in
> > > > the runtime until now, right?
> > > >
> > > > As in the discussion, we already agree that: "If all operators have
> > > > their resources properly specified, then slot sharing is no longer
> > > > needed."
> > > >
> > > > So it seems to me that, naturally, what we would discuss is: how to
> > > > bridge the impractical operator-level resource specification to the
> > > > runtime task-level resource requirement? This is actually a pure API
> > > > thing, as Chesnay has pointed out.
> > > >
> > > > But FLIP-156 brings another direction to the table: how about using
> > > > SSG for both API and runtime resource specification?
> > > >
> > > > From the FLIP and discussion, I assume that SSG resource specification
> > > > will override operator-level resource specification if both are
> > > > specified?
> > > >
> > > > So, I wonder whether we could interpret SSG resource specification as
> > > > an "add" rather than a "set" on the resource requirement?
> > > >
> > > > The semantics is that SSG resource specification adds additional
> > > > resources to the shared slot, to express concerns about possible high
> > > > throughput and resource requirements for tasks in one physical slot.
> > > >
> > > > The result is that if the scheduler indeed respects slot sharing, the
> > > > allocated slot will gain the extra resources specified for that SSG.
> > > >
> > > > I think one coding barrier for the "add" approach is
> > > > ResourceSpec.UNKNOWN, which doesn't support the 'merge' operation. I
> > > > tend to use ResourceSpec.ZERO as the default; the task executor should
> > > > be aware of this.
> > > >
> > > > @Chesnay
> > > > > My main worry is that if we wire the runtime to work on SSGs it's
> > > > > gonna be difficult to implement more fine-grained approaches, which
> > > > > would not be the case if, for the runtime, they are always defined
> > > > > on an operator-level.
> > > >
> > > > An "add" operation should be less invasive and set a low barrier for
> > > > future fine-grained approaches.
> > > >
> > > > @Stephan
> > > > > - Users can define different slot sharing groups for operators like
> > > > > they do now, with the exception that you cannot mix operators that
> > > > > have a resource profile and operators that have no resource profile.
> > > >
> > > > @Till
> > > > > This effectively means that all unspecified operators
> > > > > will implicitly have a zero resource requirement.
> > > > > I am wondering whether this wouldn't lead to a surprising behaviour
> > > > > for the user. If the user specifies the resource requirements for a
> > > > > single operator, then he probably will assume that the other
> > > > > operators will get the default share of resources and not nothing.
> > > >
> > > > I think this is inherent, due to the fact that we cannot define
> > > > ResourceSpec.ONE, i.e. the resource requirement for exactly one
> > > > default slot, with concrete numbers? I tend to squash out the
> > > > unspecified ones if there are operators in the chain with explicit
> > > > resource specifications. Otherwise, the protocol tends to be verbose,
> > > > as if to say "give me this much resource and a default". I think if
> > > > we have explicit resource specifications for only some operators, it
> > > > is just saying "I don't care about the other operators that much,
> > > > just get them places to run". Most likely these are stateless
> > > > filter/map or other less resource-consuming operators. If there is
> > > > indeed a problem, I think clients can specify a global default (or a
> > > > default at another level in future). In the job graph generation
> > > > phase, we could take that default into account for unspecified
> > > > operators.
> > > >
> > > > @FLIP-156
> > > > > Expose operator chaining. (Cons of task level resource specifying)
> > > >
> > > > Isn't this inherent to all group-level resource specification? They will
> > > > either break chaining or obey it, or even be unable to work with it.
> > > >
> > > > To sum up the above, my suggestions are:
> > > >
> > > > On the API side:
> > > > * StreamExecutionEnvironment: A global default (ResourceSpec.ZERO if
> > > > unspecified).
> > > > * Operator: ResourceSpec.ZERO (unspecified) as default.
> > > > * Task: sum of requirements from specified operators + global default
> > > > (if there are any unspecified operators)
> > > > * SSG: additional resource to physical slot.
> > > >
> > > > On the runtime side:
> > > > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > > > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> > > >
> > > > A physical slot gets the summed-up resources from its logical slots
> > > > and SSG; if it gets ResourceSpec.ZERO, it is just a default-sized
> > > > slot.
> > > >
> > > > In short, turn SSG resource specification into an "add" and drop
> > > > ResourceSpec.UNKNOWN.
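
The "add" semantics summarized above can be sketched roughly as follows. This is only an illustration, not Flink code: `SketchResource` is a hypothetical stand-in for `ResourceSpec` that models just CPU and memory, and `AddSemanticsSketch`, `physicalSlot` and `resolve` are names made up for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for Flink's ResourceSpec; only CPU and memory are modeled. */
final class SketchResource {
    static final SketchResource ZERO = new SketchResource(0.0, 0);

    final double cpuCores;
    final int memoryMb;

    SketchResource(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** Component-wise sum; ZERO is the identity, so no UNKNOWN value is needed. */
    SketchResource add(SketchResource other) {
        return new SketchResource(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }

    boolean isZero() {
        return cpuCores == 0.0 && memoryMb == 0;
    }
}

final class AddSemanticsSketch {
    /** Physical slot = sum of its tasks' requirements plus the SSG's extra resources. */
    static SketchResource physicalSlot(List<SketchResource> tasks, SketchResource ssgExtra) {
        SketchResource total = ssgExtra;
        for (SketchResource task : tasks) {
            total = total.add(task);
        }
        return total;
    }

    /** A ZERO total means nothing was specified: fall back to a default-sized slot. */
    static SketchResource resolve(SketchResource total, SketchResource defaultSlot) {
        return total.isZero() ? defaultSlot : total;
    }

    public static void main(String[] args) {
        // One task with explicit resources, one unspecified (ZERO), plus an SSG "add".
        SketchResource total = physicalSlot(
                Arrays.asList(new SketchResource(1.0, 512), SketchResource.ZERO),
                new SketchResource(0.5, 256));
        System.out.println(total.cpuCores + " cores, " + total.memoryMb + " MB");
    }
}
```

Because ZERO is the identity of `add`, an unspecified task or SSG simply contributes nothing, which is the property that makes a separate UNKNOWN value unnecessary in this sketch.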
> > > >
> > > >
> > > > Questions/Issues:
> > > > * Could an SSG express a negative resource requirement?
> > > > * Is there a concrete barrier that prevents partially configured
> > > > resources from working? I saw that it fails job submission in
> > > > Dispatcher.submitJob.
> > > > * An option (cluster/job level) to force slot sharing in the scheduler?
> > > > This could be useful when migrating from FLIP-156 to a future approach.
> > > > * An option (cluster level) to ignore resource specifications
> > > > (allowing a job with specified resources to run in an out-of-the-box
> > > > environment) for non-production usage?
> > > >
> > > >
> > > >
> > > > On February 1, 2021 at 11:54:10, Yangze Guo (karma...@gmail.com) wrote:
> > > >
> > > > Thanks for the replies, Till and Xintong!
> > > >
> > > > I updated the FLIP, including:
> > > > - Edit the JavaDoc of the proposed
> > > > StreamGraphGenerator#setSlotSharingGroupResource.
> > > > - Add "Future Plan" section, which contains the potential follow-up
> > > > issues and the limitations to be documented when fine-grained
> > > > resource management is exposed to users.
> > > >
> > > > I'll start a vote in another thread.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > >
> > > > > Thanks for summarizing the discussion, Yangze. I agree that setting
> > > > > resource requirements per operator is not very user friendly.
> > > > > Moreover, I couldn't come up with a different proposal which would
> > > > > be as easy to use and wouldn't expose internal scheduling details.
> > > > > In fact, following this argument then we shouldn't have exposed the
> > > > > slot sharing groups in the first place.
> > > > >
> > > > > What is important for the user is that we properly document the
> > > > > limitations and constraints the fine grained resource specification
> > > > > has. For example, we should explain how optimizations like chaining
> > > > > are affected by it and how different execution modes (batch vs.
> > > > > streaming) affect the execution of operators which have specified
> > > > > resources. These things shouldn't become part of the contract of
> > > > > this feature and are more caused by internal implementation details
> > > > > but it will be important to understand these things properly in
> > > > > order to use this feature effectively.
> > > > >
> > > > > Hence, +1 for starting the vote for this FLIP.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the summary, Yangze.
> > > > > >
> > > > > > The changes and follow-up issues LGTM. Let's wait for responses from
> > > > > > the others before starting a vote.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > > > > summarize the current convergence in the discussion. Please let me
> > > > > > > know if I got things wrong or missed something crucial here.
> > > > > > >
> > > > > > > Change of this FLIP:
> > > > > > > - Treat the SSG resource requirements as a hint instead of a
> > > > > > > restriction for the runtime. That should be explicitly explained in
> > > > > > > the JavaDocs.
> > > > > > >
> > > > > > > Potential follow-up issues if needed:
> > > > > > > - Provide operator-level resource configuration interface.
> > > > > > > - Provide multiple options for deciding resources for SSGs whose
> > > > > > > requirement is not specified:
> > > > > > > ** Default slot resource.
> > > > > > > ** Default operator resource times number of operators.
> > > > > > >
> > > > > > > If there are no other issues, I'll update the FLIP accordingly and
> > > > > > > start a vote thread. Thanks all for the valuable feedback again.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > FGRuntimeInterface.png
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> I think Chesnay's proposal could actually work. IIUC, the key point
> > > > > > > >> is to derive operator requirements from SSG requirements on the API
> > > > > > > >> side, so that the runtime only deals with operator requirements. It's
> > > > > > > >> debatable how the deriving should be done though. E.g., an alternative
> > > > > > > >> could be to evenly divide the SSG requirement into requirements of
> > > > > > > >> operators in the group.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> However, I'm not entirely sure which option is more desired.
> > > > > > > >> Illustrating my understanding in the following figure, in which on the
> > > > > > > >> top is Chesnay's proposal and on the bottom is the SSG-based proposal
> > > > > > > >> in this FLIP.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I think the major difference between the two approaches is where
> > > > > > > >> deriving operator requirements from SSG requirements happens.
> > > > > > > >>
> > > > > > > >> - Chesnay's proposal simplifies the runtime logic and the interface to
> > > > > > > >> expose, at the price of moving more complexity (i.e. the deriving) to
> > > > > > > >> the API side. The question is, where do we prefer to keep the
> > > > > > > >> complexity? I'm slightly leaning towards having a thin API and keeping
> > > > > > > >> the complexity in runtime if possible.
> > > > > > > >>
> > > > > > > >> - Notice that the dashed arrows represent optional steps that are
> > > > > > > >> needed only for schedulers that do not respect SSGs, which we don't
> > > > > > > >> have at the moment. If we only look at the solid arrows, then the
> > > > > > > >> SSG-based approach is much simpler, without needing to derive and
> > > > > > > >> aggregate the requirements back and forth. I'm not sure about
> > > > > > > >> complicating the current design only for the potential future needs.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Thank you~
> > > > > > > >>
> > > > > > > >> Xintong Song
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > > > >>>
> > > > > > > >>> You're raising a good point, but I think I can rectify that with a
> > > > > > > >>> minor adjustment.
> > > > > > > >>>
> > > > > > > >>> Default requirements are whatever the default requirements are,
> > > > > > > >>> setting the requirements for one operator has no effect on other
> > > > > > > >>> operators.
> > > > > > > >>>
> > > > > > > >>> With these rules, and some API enhancements, the following mockup
> > > > > > > >>> would replicate the SSG-based behavior:
> > > > > > > >>>
> > > > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > > > >>> }
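
For readers who want to run the mockup above, here is a minimal Java rendering. It is only a sketch under simplifying assumptions: groups and vertices are modeled as plain strings, a requirement is a single integer (memory in MB), and `MockupSketch` and `derive` are hypothetical names, not Flink APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MockupSketch {
    /** Hypothetical requirement value; ZERO means the vertex needs nothing of its own. */
    static final int ZERO = 0;

    /**
     * For each group, assigns the group's requirement to its first vertex and ZERO to
     * the remaining ones, so the per-group sum still equals the SSG requirement.
     *
     * @param groups       group id -> ordered vertex names (simplified model)
     * @param requirements group id -> requirement of the whole group
     * @return vertex name -> derived per-vertex requirement
     */
    static Map<String, Integer> derive(
            Map<String, List<String>> groups, Map<String, Integer> requirements) {
        Map<String, Integer> perVertex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            List<String> vertices = group.getValue();
            for (int i = 0; i < vertices.size(); i++) {
                // Groups without an entry are treated as ZERO in this sketch.
                int requirement =
                        (i == 0) ? requirements.getOrDefault(group.getKey(), ZERO) : ZERO;
                perVertex.put(vertices.get(i), requirement);
            }
        }
        return perVertex;
    }
}
```

Summing the derived values per group gives back the original SSG requirement, which is why this replicates the SSG-based behavior as long as the group is actually scheduled into one slot.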
> > > > > > > >>>
> > > > > > > >>> We could even allow setting requirements on slotsharing-groups
> > > > > > > >>> colocation-groups and internally translate them accordingly.
> > > > > > > >>> I can't help but feel this is a plain API issue.
> > > > > > > >>>
> > > > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > > > >>> > If I understand you correctly Chesnay, then you want to decouple the
> > > > > > > >>> > resource requirement specification from the slot sharing group
> > > > > > > >>> > assignment. Hence, per default all operators would be in the same
> > > > > > > >>> > slot sharing group. If there is no operator with a resource
> > > > > > > >>> > specification, then the system would allocate a default slot for it.
> > > > > > > >>> > If there is at least one operator, then the system would sum up all
> > > > > > > >>> > the specified resources and allocate a slot of this size. This
> > > > > > > >>> > effectively means that all unspecified operators will implicitly
> > > > > > > >>> > have a zero resource requirement. Did I understand your idea
> > > > > > > >>> > correctly?
> > > > > > > >>> >
> > > > > > > >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> > > > > > > >>> > for the user. If the user specifies the resource requirements for a
> > > > > > > >>> > single operator, then he probably will assume that the other
> > > > > > > >>> > operators will get the default share of resources and not nothing.
> > > > > > > >>> >
> > > > > > > >>> > Cheers,
> > > > > > > >>> > Till
> > > > > > > >>> >
> > > > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > > > >>> >
> > > > > > > >>> > Is there even a functional difference between specifying the
> > > > > > > >>> > requirements for an SSG vs specifying the same requirements on a
> > > > > > > >>> > single operator within that group (ideally a colocation group to
> > > > > > > >>> > avoid this whole hint business)?
> > > > > > > >>> >
> > > > > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > > > > >>> >
> > > > > > > >>> > Users can take shortcuts to define shared requirements,
> > > > > > > >>> > but refine them further as needed on a per-operator basis,
> > > > > > > >>> > without changing semantics of slotsharing groups
> > > > > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > > > > >>> >
> > > > > > > >>> > (And before anyone argues what happens if slotsharing groups change
> > > > > > > >>> > or whatnot, that's a plain API issue that we could surely solve. (A
> > > > > > > >>> > plain iteration over slotsharing groups and therein contained
> > > > > > > >>> > operators would suffice)).
> > > > > > > >>> >
> > > > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > > > > >>> > > Maybe a different minor idea: Would it be possible to treat the SSG
> > > > > > > >>> > > resource requirements as a hint for the runtime similar to how slot
> > > > > > > >>> > > sharing groups are designed at the moment? Meaning that we don't
> > > > > > > >>> > > give the guarantee that Flink will always deploy this set of tasks
> > > > > > > >>> > > together no matter what comes. If, for example, the runtime can
> > > > > > > >>> > > derive by some means the resource requirements for each task based
> > > > > > > >>> > > on the requirements for the SSG, this could be possible. One easy
> > > > > > > >>> > > strategy would be to give every task the same resources as the
> > > > > > > >>> > > whole slot sharing group. Another one could be distributing the
> > > > > > > >>> > > resources equally among the tasks. This does not even have to be
> > > > > > > >>> > > implemented but we would give ourselves the freedom to change
> > > > > > > >>> > > scheduling if need should arise.
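
The two derivation strategies mentioned above (every task gets the whole group's resources, or the group's resources are split equally) can be sketched as below. This is an illustration only: the resource is reduced to a single integer (memory in MB), and `SsgHintStrategies`, `sameAsGroup` and `equalSplit` are hypothetical names that do not exist in Flink.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SsgHintStrategies {
    /** Strategy 1: every task is given the same resources as the whole group. */
    static List<Integer> sameAsGroup(int groupMemoryMb, int taskCount) {
        return new ArrayList<>(Collections.nCopies(taskCount, groupMemoryMb));
    }

    /**
     * Strategy 2: split the group's resources equally among the tasks. Any remainder
     * is handed out one MB at a time to the first tasks so that nothing is lost.
     */
    static List<Integer> equalSplit(int groupMemoryMb, int taskCount) {
        if (taskCount <= 0) {
            throw new IllegalArgumentException("taskCount must be positive");
        }
        List<Integer> shares = new ArrayList<>();
        int base = groupMemoryMb / taskCount;
        int remainder = groupMemoryMb % taskCount;
        for (int i = 0; i < taskCount; i++) {
            shares.add(base + (i < remainder ? 1 : 0));
        }
        return shares;
    }
}
```

Strategy 1 over-provisions when tasks end up in separate slots, while strategy 2 preserves the total; which trade-off is preferable is exactly the kind of decision the hint semantics leaves open.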
> > > > > > > >>> > >
> > > > > > > >>> > > Cheers,
> > > > > > > >>> > > Till
> > > > > > > >>> > >
> > > > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > > > >>> > >
> > > > > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > > > > >>> > >>
> > > > > > > >>> > >> I second Xintong's comment that the SSG-based runtime interface will
> > > > > > > >>> > >> give us the flexibility to achieve an op/task-based approach. That's
> > > > > > > >>> > >> one of the most important reasons for our design choice.
> > > > > > > >>> > >>
> > > > > > > >>> > >> Some cents regarding the default operator resource:
> > > > > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > > > > >>> > >> ** For light-weight operators, the accumulated configuration error
> > > > > > > >>> > >> will not be significant. Then, the resource used by a task is
> > > > > > > >>> > >> proportional to the number of operators it contains.
> > > > > > > >>> > >> ** For heavy operators like join and window, or operators using
> > > > > > > >>> > >> external resources, users will turn to the fine-grained resource
> > > > > > > >>> > >> configuration.
> > > > > > > >>> > >> - It can increase the stability for standalone clusters where the
> > > > > > > >>> > >> registered task executors are heterogeneous (with different default
> > > > > > > >>> > >> slot resources).
> > > > > > > >>> > >> - It might not be good for SQL users. The operators that SQL is
> > > > > > > >>> > >> translated into are a black box to the user. We also do not
> > > > > > > >>> > >> guarantee cross-version consistency of the transformation so far.
> > > > > > > >>> > >>
> > > > > > > >>> > >> I think it can be treated as follow-up work when the fine-grained
> > > > > > > >>> > >> resource management is end-to-end ready.
> > > > > > > >>> > >>
> > > > > > > >>> > >> Best,
> > > > > > > >>> > >> Yangze Guo
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > >>> > >>> Thanks for the feedback, Till.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> ## I feel that what you proposed (operator-based + default value)
> > > > > > > >>> > >>> might be subsumed by the SSG-based approach.
> > > > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > > > > > >>> > >>> categorized by whether the resource requirements are known to the
> > > > > > > >>> > >>> users.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no reason to put
> > > > > > > >>> > >>>    multiple operators whose individual resource requirements are
> > > > > > > >>> > >>>    already known into the same group in fine-grained resource
> > > > > > > >>> > >>>    management. And if op_1 and op_2 are in different groups, there
> > > > > > > >>> > >>>    should be no problem switching data exchange mode from pipelined
> > > > > > > >>> > >>>    to blocking. This is equivalent to specifying operator resource
> > > > > > > >>> > >>>    requirements in your proposal.
> > > > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is
> > > > > > > >>> > >>>    in a SSG whose resource is not specified thus would have the
> > > > > > > >>> > >>>    default slot resource. This is equivalent to having default
> > > > > > > >>> > >>>    operator resources in your proposal.
> > > > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and op_2 to the
> > > > > > > >>> > >>>    same SSG or separate SSGs.
> > > > > > > >>> > >>>    - If op_1 and op_2 are in the same SSG, it will be equivalent to
> > > > > > > >>> > >>>      the coarse-grained resource management, where op_1 and op_2
> > > > > > > >>> > >>>      share a default size slot no matter which data exchange mode
> > > > > > > >>> > >>>      is used.
> > > > > > > >>> > >>>    - If op_1 and op_2 are in different SSGs, then each of them will
> > > > > > > >>> > >>>      use a default size slot. This is equivalent to setting them
> > > > > > > >>> > >>>      with default operator resources in your proposal.
> > > > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> > > > > > > >>> > >>>    - It is possible that the user learns the total / max resource
> > > > > > > >>> > >>>      requirement from executing and monitoring the job, while not
> > > > > > > >>> > >>>      being aware of individual operator requirements.
> > > > > > > >>> > >>>    - I believe this is the case your proposal does not cover. And
> > > > > > > >>> > >>>      TBH, this is probably how most users learn the resource
> > > > > > > >>> > >>>      requirements, according to my experiences.
> > > > > > > >>> > >>>    - In this case, the user might need to specify different
> > > > > > > >>> > >>>      resources if he wants to switch the execution mode, which
> > > > > > > >>> > >>>      should not be worse than not being able to use fine-grained
> > > > > > > >>> > >>>      resource management.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > > > > >>> > >>> We may provide multiple options for deciding
> resources
> > > for
> > > > > > > >>> > SSGs whose
> > > > > > > >>> > >>> requirement is not specified, if needed.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> - Default slot resource (current design)
> > > > > > > >>> > >>> - Default operator resource times number of operators
> > > > > > > >>> > (equivalent to
> > > > > > > >>> > >>> your proposal)
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> ## Exposing internal runtime strategies
> > > > > > > > >>> > >>> Theoretically, yes. Tied to the SSGs, the resource
> > > > > > > >>> > requirements might be
> > > > > > > >>> > >>> affected if how SSGs are internally handled changes
> in
> > > > > > > future.
> > > > > > > >>> > >> Practically,
> > > > > > > >>> > >>> I do not concretely see at the moment what kind of
> > > changes
> > > > > > we
> > > > > > > >>> > may want in
> > > > > > > >>> > >>> future that might conflict with this FLIP proposal,
> as
> > > the
> > > > > > > >>> > question of
> > > > > > > >>> > >>> switching data exchange mode answered above. I'd
> > suggest
> > > to
> > > > > > > >>> > not give up
> > > > > > > >>> > >> the
> > > > > > > >>> > >>> user friendliness we may gain now for the future
> > problems
> > > > > > > that
> > > > > > > >>> > may or may
> > > > > > > >>> > >>> not exist.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility
> to
> > > > > > > >>> > achieve the
> > > > > > > >>> > >>> equivalent behavior as the operator-based approach,
> if
> > we
> > > > > > > set each
> > > > > > > >>> > >> operator
> > > > > > > >>> > >>> (or task) to a separate SSG. We can even provide a
> > > shortcut
> > > > > > > >>> > option to
> > > > > > > >>> > >>> automatically do that for users, if needed.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> Thank you~
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> Xintong Song
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > > > > >>> > <trohrm...@apache.org <mailto:trohrm...@apache.org>>
> > > > > > > >>> > >> wrote:
> > > > > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> I agree that being able to define the resource
> > > > > > requirements
> > > > > > > for a
> > > > > > > >>> > >> group of
> > > > > > > >>> > >>>> operators is more user friendly. However, my concern
> > is
> > > > > > that
> > > > > > > >>> > we are
> > > > > > > >>> > >>>> exposing thereby internal runtime strategies which
> > might
> > > > > > > >>> > limit our
> > > > > > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > > > > > semantics
> > > > > > > of
> > > > > > > >>> > >> configuring
> > > > > > > >>> > >>>> resource requirements for SSGs could break if
> > switching
> > > > > > from
> > > > > > > >>> > streaming
> > > > > > > >>> > >> to
> > > > > > > >>> > >>>> batch execution. If one defines the resource
> > > requirements
> > > > > > > for
> > > > > > > >>> > op_1 ->
> > > > > > > >>> > >> op_2
> > > > > > > >>> > >>>> which run in pipelined mode when using the streaming
> > > > > > > >>> > execution, then
> > > > > > > >>> > >> how do
> > > > > > > >>> > >>>> we interpret these requirements when op_1 -> op_2
> are
> > > > > > > >>> > executed with a
> > > > > > > >>> > >>>> blocking data exchange in batch execution mode?
> > > > > > > Consequently,
> > > > > > > >>> > I am
> > > > > > > >>> > >> still
> > > > > > > >>> > >>>> leaning towards Stephan's proposal to set the
> resource
> > > > > > > >>> > requirements per
> > > > > > > >>> > >>>> operator.
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > > > > easier:
> > > > > > > >>> > If the
> > > > > > > >>> > >> user
> > > > > > > >>> > >>>> wants to use fine-grained resource requirements,
> then
> > > she
> > > > > > > >>> > needs to
> > > > > > > >>> > >> specify
> > > > > > > >>> > >>>> the default size which is used for operators which
> > have
> > > no
> > > > > > > >>> > explicit
> > > > > > > >>> > >>>> resource annotation. If this holds true, then every
> > > > > > operator
> > > > > > > >>> > would
> > > > > > > >>> > >> have a
> > > > > > > >>> > >>>> resource requirement and the system can try to
> execute
> > > the
> > > > > > > >>> > operators
> > > > > > > >>> > >> in the
> > > > > > > >>> > >>>> best possible manner w/o being constrained by how
> the
> > > user
> > > > > > > >>> > set the SSG
> > > > > > > >>> > >>>> requirements.
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> Cheers,
> > > > > > > >>> > >>>> Till
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > > > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>>
> > > > > > > >>> > >>>> wrote:
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Actually, your proposal has also come to my mind at
> > > some
> > > > > > > >>> > point. And I
> > > > > > > >>> > >>>> have
> > > > > > > >>> > >>>>> some concerns about it.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> 1. It does not give users the same control as the
> > > > > > SSG-based
> > > > > > > >>> > approach.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> While both approaches do not require specifying for
> > > each
> > > > > > > >>> > operator,
> > > > > > > >>> > >>>>> SSG-based approach supports the semantic that "some
> > > > > > > operators
> > > > > > > >>> > >> together
> > > > > > > >>> > >>>> use
> > > > > > > >>> > >>>>> this much resource" while the operator-based
> approach
> > > > > > > doesn't.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1,
> o_2,
> > > ...,
> > > > > > > >>> > o_m), and
> > > > > > > >>> > >> at
> > > > > > > >>> > >>>> some
> > > > > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which
> > > significantly
> > > > > > > >>> > reduces the
> > > > > > > >>> > >> data
> > > > > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups
> > > SSG_1
> > > > > > > >>> > (o_1, ...,
> > > > > > > >>> > >> o_n)
> > > > > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring
> much
> > > > > > higher
> > > > > > > >>> > >> parallelisms
> > > > > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2
> > > won't
> > > > > > > >>> > lead to too
> > > > > > > >>> > >> much
> > > > > > > >>> > >>>>> wasting of resources. If the two SSGs end up
> needing
> > > > > > > different
> > > > > > > >>> > >> resources,
> > > > > > > >>> > >>>>> with the SSG-based approach one can directly
> specify
> > > > > > > >>> > resources for
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>> two
> > > > > > > >>> > >>>>> groups. However, with the operator-based approach,
> > the
> > > > > > > user will
> > > > > > > >>> > >> have to
> > > > > > > >>> > >>>>> specify resources for each operator in one of the
> two
> > > > > > > >>> > groups, and
> > > > > > > >>> > >> tune
> > > > > > > >>> > >>>> the
> > > > > > > >>> > >>>>> default slot resource via configurations to fit the
> > > other
> > > > > > > group.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> 2. It increases the chance of breaking operator
> > chains.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > > >>> > >>>>> Setting chainable operators into different slot
> > > sharing
> > > > > > > >>> > groups will
> > > > > > > >>> > >>>>> prevent them from being chained. In the current
> > > > > > > implementation,
> > > > > > > >>> > >>>> downstream
> > > > > > > >>> > >>>>> operators, if SSG not explicitly specified, will be
> > set
> > > > > > to
> > > > > > > >>> > the same
> > > > > > > >>> > >> group
> > > > > > > >>> > >>>>> as the chainable upstream operators (unless
> multiple
> > > > > > > upstream
> > > > > > > >>> > >> operators
> > > > > > > >>> > >>>> in
> > > > > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > > > > chains.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3
> ->
> > > o_4,
> > > > > > > >>> > deciding
> > > > > > > >>> > >> SSGs
> > > > > > > >>> > >>>>> based on whether resource is specified we will
> easily
> > > get
> > > > > > > >>> > groups like
> > > > > > > >>> > >>>> (o_1,
> > > > > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can
> be
> > > > > > > >>> > chained. This
> > > > > > > >>> > >> is
> > > > > > > >>> > >>>> also
> > > > > > > >>> > >>>>> possible for the SSG-based approach, but I believe
> > the
> > > > > > > >>> > chance is much
> > > > > > > >>> > >>>>> smaller because there's no strong reason for users
> to
> > > > > > > >>> > specify the
> > > > > > > >>> > >> groups
> > > > > > > >>> > >>>>> with alternate operators like that. We are more
> > likely
> > > to
> > > > > > > >>> > get groups
> > > > > > > >>> > >> like
> > > > > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks
> only
> > > > > > > between
> > > > > > > >>> > o_2 and
> > > > > > > >>> > >> o_3.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> 3. It complicates the system by having two
> different
> > > > > > > >>> > mechanisms for
> > > > > > > >>> > >>>> sharing
> > > > > > > >>> > >>>>> managed memory in a slot.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > > > > memory
> > > > > > > >>> > sharing
> > > > > > > >>> > >>>>> mechanism, where managed memory is first
> distributed
> > > > > > > >>> > according to the
> > > > > > > >>> > >>>>> consumer type, then further distributed across
> > > operators
> > > > > > > of that
> > > > > > > >>> > >> consumer
> > > > > > > >>> > >>>>> type.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> - With the operator-based approach, managed memory
> > size
> > > > > > > >>> > specified
> > > > > > > >>> > >> for an
> > > > > > > >>> > >>>>> operator should account for all the consumer types
> of
> > > > > > that
> > > > > > > >>> > operator.
> > > > > > > >>> > >> That
> > > > > > > >>> > >>>>> means the managed memory is first distributed
> across
> > > > > > > >>> > operators, then
> > > > > > > >>> > >>>>> distributed to different consumer types of each
> > > operator.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Unfortunately, the different order of the two
> > > calculation
> > > > > > > >>> > steps can
> > > > > > > >>> > >> lead
> > > > > > > >>> > >>>> to
> > > > > > > >>> > >>>>> different results. To be specific, the semantic of
> > the
> > > > > > > >>> > configuration
> > > > > > > >>> > >>>> option
> > > > > > > >>> > >>>>> `consumer-weights` changed (within a slot vs.
> within
> > an
> > > > > > > >>> > operator).
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> To sum up things:
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> While (3) might be a bit more implementation
> related,
> > I
> > > > > > > >>> > think (1)
> > > > > > > >>> > >> and (2)
> > > > > > > >>> > >>>>> somehow suggest that, the price for the proposed
> > > approach
> > > > > > > to
> > > > > > > >>> > avoid
> > > > > > > >>> > >>>>> specifying resource for every operator is that it's
> > not
> > > > > > as
> > > > > > > >>> > >> independent
> > > > > > > >>> > >>>> from
> > > > > > > >>> > >>>>> operator chaining and slot sharing as the
> > > operator-based
> > > > > > > >>> > approach
> > > > > > > >>> > >>>> discussed
> > > > > > > >>> > >>>>> in the FLIP.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Thank you~
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Xintong Song
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > > > > >>> > <se...@apache.org <mailto:se...@apache.org>>
> > > > > > > >>> > >> wrote:
> > > > > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> I want to say, first of all, that this is super
> well
> > > > > > > >>> > written. And
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > > > > >>> > configuration to
> > > > > > > >>> > >>>> users
> > > > > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > > > > >>> > >>>>>> So good job here!
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> About how to let users specify the resource
> > profiles.
> > > > > > If I
> > > > > > > >>> > can sum
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>> FLIP
> > > > > > > >>> > >>>>>> and previous discussion up in my own words, the
> > > problem
> > > > > > > is the
> > > > > > > >>> > >>>> following:
> > > > > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > > > > cleanest
> > > > > > > >>> > approach,
> > > > > > > >>> > >>>>> because
> > > > > > > >>> > >>>>>>> it avoids mixing operator configuration
> (resource)
> > > and
> > > > > > > >>> > >> scheduling. No
> > > > > > > >>> > >>>>>>> matter what other parameters change (chaining,
> slot
> > > > > > > sharing,
> > > > > > > >>> > >>>> switching
> > > > > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource
> > > profiles
> > > > > > > >>> > stay the
> > > > > > > >>> > >>>> same.
> > > > > > > >>> > >>>>>>> But it would require that a user specifies
> > resources
> > > on
> > > > > > > all
> > > > > > > >>> > >>>> operators,
> > > > > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > > > > suggests
> > > > > > > going
> > > > > > > >>> > >> with
> > > > > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> I think both thoughts are important, so can we
> find
> > a
> > > > > > > solution
> > > > > > > >>> > >> where
> > > > > > > >>> > >>>> the
> > > > > > > >>> > >>>>>> Resource Profiles are specified on an Operator,
> but
> > we
> > > > > > > >>> > still avoid
> > > > > > > >>> > >> that
> > > > > > > >>> > >>>>> we
> > > > > > > >>> > >>>>>> need to specify a resource profile on every
> > operator?
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> What do you think about something like the
> > following:
> > > > > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > > > > level.
> > > > > > > >>> > >>>>>> - Not all operators need profiles
> > > > > > > > >>> > >>>>>> - All Operators without a Resource Profile end
> up
> > > > > > in
> > > > > > > the
> > > > > > > >>> > >> default
> > > > > > > >>> > >>>> slot
> > > > > > > >>> > >>>>>> sharing group with a default profile (will get a
> > > default
> > > > > > > slot).
> > > > > > > >>> > >>>>>> - All Operators with a Resource Profile will go
> into
> > > > > > > >>> > another slot
> > > > > > > >>> > >>>>> sharing
> > > > > > > >>> > >>>>>> group (the resource-specified-group).
> > > > > > > >>> > >>>>>> - Users can define different slot sharing groups
> for
> > > > > > > >>> > operators
> > > > > > > >>> > >> like
> > > > > > > >>> > >>>>> they
> > > > > > > >>> > >>>>>> do now, with the exception that you cannot mix
> > > operators
> > > > > > > >>> > that have
> > > > > > > >>> > >> a
> > > > > > > >>> > >>>>>> resource profile and operators that have no
> resource
> > > > > > > profile.
> > > > > > > >>> > >>>>>> - The default case where no operator has a
> resource
> > > > > > > >>> > profile is
> > > > > > > >>> > >> just a
> > > > > > > >>> > >>>>>> special case of this model
> > > > > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > > > > operator,
> > > > > > > >>> > like it
> > > > > > > >>> > >> does
> > > > > > > >>> > >>>>> now,
> > > > > > > >>> > >>>>>> and the scheduler sums up the profiles of the
> tasks
> > > that
> > > > > > > it
> > > > > > > >>> > >> schedules
> > > > > > > >>> > >>>>>> together.
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> There is another question about reactive scaling
> > > raised
> > > > > > > in the
> > > > > > > >>> > >> FLIP. I
> > > > > > > >>> > >>>>> need
> > > > > > > >>> > >>>>>> to think a bit about that. That is indeed a bit
> more
> > > > > > > tricky
> > > > > > > >>> > once we
> > > > > > > >>> > >>>> have
> > > > > > > >>> > >>>>>> slots of different sizes.
> > > > > > > >>> > >>>>>> It is not clear then which of the different slot
> > > > > > requests
> > > > > > > the
> > > > > > > >>> > >>>>>> ResourceManager should fulfill when new resources
> > > (TMs)
> > > > > > > >>> > show up,
> > > > > > > >>> > >> or how
> > > > > > > >>> > >>>>> the
> > > > > > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > > > > > resources
> > > > > > > >>> > (TMs)
> > > > > > > >>> > >>>>> disappear
> > > > > > > >>> > >>>>>> This question is pretty orthogonal, though, to the
> > > "how
> > > > > > to
> > > > > > > >>> > specify
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>>> resources".
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> Best,
> > > > > > > >>> > >>>>>> Stephan
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > > > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>
> > > > > > > >>> > >>>>> wrote:
> > > > > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > > > > discussion,
> > > > > > > >>> > Yangze.
> > > > > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> @Till,
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> I agree that specifying requirements for SSGs
> means
> > > > > > that
> > > > > > > SSGs
> > > > > > > >>> > >> need to
> > > > > > > >>> > >>>>> be
> > > > > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > > > > otherwise
> > > > > > > each
> > > > > > > >>> > >>>> operator
> > > > > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > > > > However,
> > > > > > > I
> > > > > > > >>> > cannot
> > > > > > > >>> > >>>> think
> > > > > > > >>> > >>>>>> of
> > > > > > > >>> > >>>>>>> a strong reason for not supporting SSGs in
> > > fine-grained
> > > > > > > >>> > resource
> > > > > > > >>> > >>>>>>> management.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>>> Interestingly, if all operators have their
> > resources
> > > > > > > properly
> > > > > > > >>> > >>>>>> specified,
> > > > > > > >>> > >>>>>>>> then slot sharing is no longer needed because
> > Flink
> > > > > > > could
> > > > > > > >>> > >> slice off
> > > > > > > >>> > >>>>> the
> > > > > > > >>> > >>>>>>>> appropriately sized slots for every Task
> > > individually.
> > > > > > > >>> > >>>>>>>>
> > > > > > > >>> > >>>>>>> So for example, if we have a job consisting of
> two
> > > > > > > > >>> > operators op_1
> > > > > > > >>> > >> and
> > > > > > > >>> > >>>>> op_2
> > > > > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would
> > then
> > > > > > say
> > > > > > > that
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>> slot
> > > > > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If
> we
> > > > > > have
> > > > > > > a
> > > > > > > >>> > >> cluster
> > > > > > > >>> > >>>>> with
> > > > > > > >>> > >>>>>> 2
> > > > > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the
> system
> > > > > > > cannot run
> > > > > > > >>> > >> this
> > > > > > > >>> > >>>>> job.
> > > > > > > >>> > >>>>>> If
> > > > > > > >>> > >>>>>>>> the resources were specified on an operator
> level,
> > > > > > then
> > > > > > > the
> > > > > > > >>> > >> system
> > > > > > > >>> > >>>>>> could
> > > > > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1
> and
> > > > > > op_2
> > > > > > > to
> > > > > > > >>> > >> TM_2.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Couldn't agree more that if all operators'
> > > requirements
> > > > > > > are
> > > > > > > >>> > >> properly
> > > > > > > >>> > >>>>>>> specified, slot sharing should be no longer
> needed.
> > I
> > > > > > > >>> > think this
> > > > > > > >>> > >>>>> exactly
> > > > > > > >>> > >>>>>>> disproves the example. If we already know op_1
> and
> > > op_2
> > > > > > > each
> > > > > > > > >>> > >> need
> > > > > > > >>> > >>>> 100
> > > > > > > >>> > >>>>> MB
> > > > > > > >>> > >>>>>>> of memory, why would we put them in the same
> group?
> > > If
> > > > > > > >>> > they are
> > > > > > > >>> > >> in
> > > > > > > >>> > >>>>>> separate
> > > > > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > > > > freely
> > > > > > > >>> > deploy
> > > > > > > >>> > >> them
> > > > > > > >>> > >>>> to
> > > > > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Moreover, the precondition for not needing slot
> > > sharing
> > > > > > > is
> > > > > > > >>> > having
> > > > > > > >>> > >>>>>> resource
> > > > > > > >>> > >>>>>>> requirements properly specified for all
> operators.
> > > This
> > > > > > > is not
> > > > > > > >>> > >> always
> > > > > > > >>> > >>>>>>> possible, and usually requires tremendous
> efforts.
> > > One
> > > > > > > of the
> > > > > > > >>> > >>>> benefits
> > > > > > > >>> > >>>>>> for
> > > > > > > >>> > >>>>>>> SSG-based requirements is that it allows the user
> > to
> > > > > > > freely
> > > > > > > >>> > >> decide
> > > > > > > >>> > >>>> the
> > > > > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I
> would
> > > > > > > >>> > consider SSG
> > > > > > > >>> > >> in
> > > > > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > > > > operators
> > > > > > > >>> > that the
> > > > > > > >>> > >>>> user
> > > > > > > >>> > >>>>>>> would like to specify the total resource for.
> There
> > > can
> > > > > > > be
> > > > > > > >>> > only
> > > > > > > >>> > >> one
> > > > > > > >>> > >>>>> group
> > > > > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a
> few
> > > > > > major
> > > > > > > >>> > parts,
> > > > > > > >>> > >> or as
> > > > > > > >>> > >>>>>> many
> > > > > > > >>> > >>>>>>> groups as the number of tasks/operators,
> depending
> > on
> > > > > > how
> > > > > > > >>> > >>>> fine-grained
> > > > > > > >>> > >>>>>> the
> > > > > > > >>> > >>>>>>> user is able to specify the resources.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But
> > > given
> > > > > > > >>> > that all
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>>>> current scheduler implementations already support
> > > > > > SSGs, I
> > > > > > > >>> > tend to
> > > > > > > >>> > >>>> think
> > > > > > > >>> > >>>>>>> that as an acceptable price for the above
> discussed
> > > > > > > >>> > usability and
> > > > > > > >>> > >>>>>>> flexibility.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> @Chesnay
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Will declaring them on slot sharing groups not
> also
> > > > > > waste
> > > > > > > >>> > >> resources
> > > > > > > >>> > >>>> if
> > > > > > > >>> > >>>>>> the
> > > > > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > > > > different?
> > > > > > > >>> > >>>>>>>>
> > > > > > > >>> > >>>>>>> Yes. It's a trade-off between usability and
> > resource
> > > > > > > >>> > >> utilization. To
> > > > > > > >>> > >>>>>> avoid
> > > > > > > >>> > >>>>>>> such wasting, the user can define more groups, so
> > > that
> > > > > > > >>> > each group
> > > > > > > >>> > >>>>>> contains
> > > > > > > > >>> > >>>>>>> fewer operators and the chance of having operators
> > > with
> > > > > > > >>> > different
> > > > > > > >>> > >>>>>>> parallelism will be reduced. The price is to have
> > > more
> > > > > > > >>> > resource
> > > > > > > >>> > >>>>>>> requirements to specify.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> It also seems like quite a hassle for users
> having
> > to
> > > > > > > >>> > >> recalculate the
> > > > > > > >>> > >>>>>>>> resource requirements if they change the slot
> > > sharing.
> > > > > > > >>> > >>>>>>>> I'd think that it's not really workable for
> users
> > > that
> > > > > > > create
> > > > > > > >>> > >> a set
> > > > > > > >>> > >>>>> of
> > > > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched
> in
> > > > > > their
> > > > > > > >>> > >>>>> applications;
> > > > > > > >>> > >>>>>>>> managing the resources requirements in such a
> > > setting
> > > > > > > >>> > would be
> > > > > > > >>> > >> a
> > > > > > > >>> > >>>>>>>> nightmare, and in the end would require
> > > operator-level
> > > > > > > >>> > >> requirements
> > > > > > > >>> > >>>>> any
> > > > > > > >>> > >>>>>>>> way.
> > > > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it
> really
> > > > > > > increases
> > > > > > > >>> > >>>>> usability.
> > > > > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > > > > there's no
> > > > > > > >>> > >> reason to
> > > > > > > >>> > >>>>> put
> > > > > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > > > > >>> > requirements are
> > > > > > > >>> > >>>>> already
> > > > > > > >>> > >>>>>>> known
> > > > > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > > > > management.
> > > > > > > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > > > > > multiple
> > > > > > > >>> > >>>>> applications,
> > > > > > > >>> > >>>>>>> it does not guarantee the same resource
> > > > > > requirements.
> > > > > > > >>> > During
> > > > > > > >>> > >> our
> > > > > > > >>> > >>>>> years
> > > > > > > >>> > >>>>>>> of
> > > > > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > > > > requirements
> > > > > > > >>> > >> specified for
> > > > > > > >>> > >>>>>>> Blink's
> > > > > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > > > > >>> > (including
> > > > > > > >>> > >> our
> > > > > > > >>> > >>>>>>> specialists
> > > > > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are
> as
> > > > > > > >>> > >> experienced as
> > > > > > > >>> > >>>>> to
> > > > > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > > > > >>> > >> requirements.
> > > > > > > >>> > >>>> Most
> > > > > > > >>> > >>>>>>> people
> > > > > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > > > > delay, cpu
> > > > > > > >>> > >> load,
> > > > > > > >>> > >>>>>> memory
> > > > > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > > > > specification.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> To sum up:
> > > > > > > >>> > >>>>>>> If the user is capable of providing proper
> resource
> > > > > > > >>> > requirements
> > > > > > > >>> > >> for
> > > > > > > >>> > >>>>>> every
> > > > > > > >>> > >>>>>>> operator, that's definitely a good thing and we
> > would
> > > > > > not
> > > > > > > >>> > need to
> > > > > > > >>> > >>>> rely
> > > > > > > >>> > >>>>> on
> > > > > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for
> > the
> > > > > > > >>> > >> fine-grained
> > > > > > > >>> > >>>>>> resource
> > > > > > > >>> > >>>>>>> management to work. For those users who are
> capable
> > > and
> > > > > > > do not
> > > > > > > >>> > >> like
> > > > > > > >>> > >>>>>> having
> > > > > > > >>> > >>>>>>> to set each operator to a separate SSG, I would
> be
> > ok
> > > > > > to
> > > > > > > have
> > > > > > > >>> > >> both
> > > > > > > >>> > >>>>>>> SSG-based and operator-based runtime
