Thanks for the summary, Yangze.

The changes and follow-up issues LGTM. Let's wait for responses from the
others before starting a vote.

Thank you~

Xintong Song



On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <karma...@gmail.com> wrote:

> Thanks everyone for the lively discussion. I'd like to try to
> summarize the current convergence in the discussion. Please let me
> know if I got things wrong or missed something crucial here.
>
> Change of this FLIP:
> - Treat the SSG resource requirements as a hint instead of a
> restriction for the runtime. That should be explicitly explained in
> the JavaDocs.
>
> Potential follow-up issues if needed:
> - Provide operator-level resource configuration interface.
> - Provide multiple options for deciding resources for SSGs whose
> requirement is not specified:
>     ** Default slot resource.
>     ** Default operator resource times number of operators.
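
The two options listed above for SSGs without a specified requirement can be sketched as follows. This is a minimal illustration in plain Python, not Flink code: `Resource`, `resolve_ssg_resource`, and the strategy names `"default_slot"` and `"per_operator"` are hypothetical and only mirror the wording of the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource profile with two dimensions only."""
    cpu_cores: float
    memory_mb: int

    def times(self, n: int) -> "Resource":
        # Scale every dimension by the number of operators.
        return Resource(self.cpu_cores * n, self.memory_mb * n)


def resolve_ssg_resource(
    specified: Optional[Resource],
    num_operators: int,
    default_slot: Resource,
    default_operator: Resource,
    strategy: str = "default_slot",
) -> Resource:
    """Decide the resource of one slot sharing group (SSG)."""
    if specified is not None:
        # An explicit requirement always wins.
        return specified
    if strategy == "default_slot":
        # Option 1: fall back to the default slot resource.
        return default_slot
    if strategy == "per_operator":
        # Option 2: default operator resource times number of operators.
        return default_operator.times(num_operators)
    raise ValueError(f"unknown strategy: {strategy}")
```

With a default slot of 1 core / 1024 MB and a default operator of 0.5 cores / 256 MB, an unspecified group of three operators would get the default slot under the first option and 1.5 cores / 768 MB under the second.
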
>
> If there are no other issues, I'll update the FLIP accordingly and
> start a vote thread. Thanks all for the valuable feedback again.
>
> Best,
> Yangze Guo
>
>
> On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <tonysong...@gmail.com>
> wrote:
> >
> >
> >  FGRuntimeInterface.png
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong...@gmail.com>
> wrote:
> >>
> >> I think Chesnay's proposal could actually work. IIUC, the key point is
> to derive operator requirements from SSG requirements on the API side, so
> that the runtime only deals with operator requirements. It's debatable how
> the deriving should be done though. E.g., an alternative could be to evenly
> divide the SSG requirement into requirements of operators in the group.
> >>
> >>
> >> However, I'm not entirely sure which option is more desired.
> Illustrating my understanding in the following figure, in which on the top
> is Chesnay's proposal and on the bottom is the SSG-based proposal in this
> FLIP.
> >>
> >>
> >>
> >> I think the major difference between the two approaches is where
> deriving operator requirements from SSG requirements happens.
> >>
> >> - Chesnay's proposal simplifies the runtime logic and the interface to
> expose, at the price of moving more complexity (i.e. the deriving) to the
> API side. The question is, where do we prefer to keep the complexity? I'm
> slightly leaning towards having a thin API and keeping the complexity in
> runtime if possible.
> >>
> >> - Notice that the dashed-line arrows represent optional steps that are
> needed only for schedulers that do not respect SSGs, which we don't have at
> the moment. If we only look at the solid line arrows, then the SSG-based
> approach is much simpler, without needing to derive and aggregate the
> requirements back and forth. I'm not sure about complicating the current
> design only for the potential future needs.
> >>
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >>
> >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ches...@apache.org>
> wrote:
> >>>
> >>> You're raising a good point, but I think I can rectify that with a
> minor
> >>> adjustment.
> >>>
> >>> Default requirements are whatever the default requirements are, setting
> >>> the requirements for one operator has no effect on other operators.
> >>>
> >>> With these rules, and some API enhancements, the following mockup would
> >>> replicate the SSG-based behavior:
> >>>
> >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> >>> for slotSharingGroup in env.getSlotSharingGroups() {
> >>>      vertices = slotSharingGroup.getVertices()
> >>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> >>>      vertices.remaining().setRequirements(ZERO)
> >>> }
> >>>
> >>> We could even allow setting requirements on slotsharing-groups and
> >>> colocation-groups and internally translate them accordingly.
> >>> I can't help but feel this is a plain API issue.
> >>>
> >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> >>> > If I understand you correctly Chesnay, then you want to decouple the
> >>> > resource requirement specification from the slot sharing group
> >>> > assignment. Hence, per default all operators would be in the same
> slot
> >>> > sharing group. If there is no operator with a resource specification,
> >>> > then the system would allocate a default slot for it. If there is at
> >>> > least one operator, then the system would sum up all the specified
> >>> > resources and allocate a slot of this size. This effectively means
> >>> > that all unspecified operators will implicitly have a zero resource
> >>> > requirement. Did I understand your idea correctly?
> >>> >
> >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> >>> > for the user. If the user specifies the resource requirements for a
> >>> > single operator, then he probably will assume that the other
> operators
> >>> > will get the default share of resources and not nothing.
> >>> >
> >>> > Cheers,
> >>> > Till
> >>> >
> >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <ches...@apache.org>
> >>> > wrote:
> >>> >
> >>> >     Is there even a functional difference between specifying the
> >>> >     requirements for an SSG vs specifying the same requirements on a
> >>> >     single
> >>> >     operator within that group (ideally a colocation group to avoid
> this
> >>> >     whole hint business)?
> >>> >
> >>> >     Wouldn't we get the best of both worlds in the latter case?
> >>> >
> >>> >     Users can take shortcuts to define shared requirements,
> >>> >     but refine them further as needed on a per-operator basis,
> >>> >     without changing semantics of slotsharing groups
> >>> >     nor the runtime being locked into SSG-based requirements.
> >>> >
> >>> >     (And before anyone argues what happens if slotsharing groups
> >>> >     change or
> >>> >     whatnot, that's a plain API issue that we could surely solve. (A
> >>> >     plain
> >>> >     iteration over slotsharing groups and therein contained operators
> >>> >     would
> >>> >     suffice)).
> >>> >
> >>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> >>> >     > Maybe a different minor idea: Would it be possible to treat
> the SSG
> >>> >     > resource requirements as a hint for the runtime similar to how
> >>> >     slot sharing
> >>> >     > groups are designed at the moment? Meaning that we don't give
> >>> >     the guarantee
> >>> >     > that Flink will always deploy this set of tasks together no
> >>> >     matter what
> >>> >     > comes. If, for example, the runtime can derive by some means
> the
> >>> >     resource
> >>> >     > requirements for each task based on the requirements for the
> >>> >     SSG, this
> >>> >     > could be possible. One easy strategy would be to give every
> task
> >>> >     the same
> >>> >     > resources as the whole slot sharing group. Another one could be
> >>> >     > distributing the resources equally among the tasks. This does
> >>> >     not even have
> >>> >     > to be implemented but we would give ourselves the freedom to
> change
> >>> >     > scheduling if need should arise.
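
The two derivation strategies mentioned above (every task gets the whole group's resources, or the group's resources are divided equally) can be sketched like this. It is a hedged illustration in plain Python with a single memory dimension; `derive_task_requirements` and the strategy names are hypothetical and do not correspond to any Flink API.

```python
from typing import Dict, List


def derive_task_requirements(
    ssg_memory_mb: float, tasks: List[str], strategy: str
) -> Dict[str, float]:
    """Derive a per-task memory requirement from an SSG-level requirement."""
    if not tasks:
        raise ValueError("a slot sharing group needs at least one task")
    if strategy == "same_as_group":
        # Every task is given the same resources as the whole group.
        return {task: ssg_memory_mb for task in tasks}
    if strategy == "even_split":
        # The group's resources are distributed equally among the tasks.
        share = ssg_memory_mb / len(tasks)
        return {task: share for task in tasks}
    raise ValueError(f"unknown strategy: {strategy}")
```

Under the even split the derived values add up to the group's requirement again, whereas the same-as-group strategy over-provisions whenever the tasks are not deployed together.
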
> >>> >     >
> >>> >     > Cheers,
> >>> >     > Till
> >>> >     >
> >>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karma...@gmail.com>
> >>> >     > wrote:
> >>> >     >
> >>> >     >> Thanks for the responses, Till and Xintong.
> >>> >     >>
> >>> >     >> I second Xintong's comment that SSG-based runtime interface
> >>> >     will give
> >>> >     >> us the flexibility to achieve op/task-based approach. That's
> one of
> >>> >     >> the most important reasons for our design choice.
> >>> >     >>
> >>> >     >> Some cents regarding the default operator resource:
> >>> >     >> - It might be good for the scenario of DataStream jobs.
> >>> >     >>     ** For light-weight operators, the accumulative
> >>> >     configuration error
> >>> >     >> will not be significant. Then, the resource of a task used is
> >>> >     >> proportional to the number of operators it contains.
> >>> >     >>     ** For heavy operators like join and window or operators
> >>> >     using the
> >>> >     >> external resources, user will turn to the fine-grained
> resource
> >>> >     >> configuration.
> >>> >     >> - It can increase the stability for the standalone cluster
> >>> >     where task
> >>> >     >> executors registered are heterogeneous (with different default
> slot
> >>> >     >> resources).
> >>> >     >> - It might not be good for SQL users. The operators that SQL will
> >>> >     >> be translated to are a black box to the user. We also do not
> >>> >     >> guarantee the cross-version consistency of the transformation so far.
> >>> >     >>
> >>> >     >> I think it can be treated as a follow-up work when the
> fine-grained
> >>> >     >> resource management is end-to-end ready.
> >>> >     >>
> >>> >     >> Best,
> >>> >     >> Yangze Guo
> >>> >     >>
> >>> >     >>
> >>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong...@gmail.com>
> >>> >     >> wrote:
> >>> >     >>> Thanks for the feedback, Till.
> >>> >     >>>
> >>> >     >>> ## I feel that what you proposed (operator-based + default
> >>> >     value) might
> >>> >     >> be
> >>> >     >>> subsumed by the SSG-based approach.
> >>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> >>> >     categorized by
> >>> >     >>> whether the resource requirements are known to the users.
> >>> >     >>>
> >>> >     >>>     1. *Both known.* As previously mentioned, there's no
> >>> >     reason to put
> >>> >     >>>     multiple operators whose individual resource requirements
> >>> >     are already
> >>> >     >> known
> >>> >     >>>     into the same group in fine-grained resource management.
> >>> >     And if op_1
> >>> >     >> and
> >>> >     >>>     op_2 are in different groups, there should be no problem
> >>> >     switching
> >>> >     >> data
> >>> >     >>>     exchange mode from pipelined to blocking. This is
> >>> >     equivalent to
> >>> >     >> specifying
> >>> >     >>>     operator resource requirements in your proposal.
> >>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
> >>> >     op_2 is in a
> >>> >     >>>     SSG whose resource is not specified thus would have the
> >>> >     default slot
> >>> >     >>>     resource. This is equivalent to having default operator
> >>> >     resources in
> >>> >     >> your
> >>> >     >>>     proposal.
> >>> >     >>>     3. *Both unknown*. The user can either set op_1 and op_2
> >>> >     to the same
> >>> >     >> SSG
> >>> >     >>>     or separate SSGs.
> >>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
> >>> >     equivalent to
> >>> >     >> the
> >>> >     >>>        coarse-grained resource management, where op_1 and
> op_2
> >>> >     share a
> >>> >     >> default
> >>> >     >>>        size slot no matter which data exchange mode is used.
> >>> >     >>>        - If op_1 and op_2 are in different SSGs, then each of
> >>> >     them will
> >>> >     >> use
> >>> >     >>>        a default size slot. This is equivalent to setting
> them
> >>> >     with
> >>> >     >> default
> >>> >     >>>        operator resources in your proposal.
> >>> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2
> is
> >>> >     known.*
> >>> >     >>>        - It is possible that the user learns the total / max
> >>> >     resource
> >>> >     >>>        requirement from executing and monitoring the job,
> >>> >     while not
> >>> >     >>> being aware of
> >>> >     >>>        individual operator requirements.
> >>> >     >>>        - I believe this is the case your proposal does not
> >>> >     cover. And TBH,
> >>> >     >>>        this is probably how most users learn the resource
> >>> >     requirements,
> >>> >     >>> according
> >>> >     >>>        to my experiences.
> >>> >     >>>        - In this case, the user might need to specify
> >>> >     different resources
> >>> >     >> if
> >>> >     >>>        he wants to switch the execution mode, which should
> not
> >>> >     be worse
> >>> >     >> than not
> >>> >     >>>        being able to use fine-grained resource management.
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> ## An additional idea inspired by your proposal.
> >>> >     >>> We may provide multiple options for deciding resources for
> >>> >     SSGs whose
> >>> >     >>> requirement is not specified, if needed.
> >>> >     >>>
> >>> >     >>>     - Default slot resource (current design)
> >>> >     >>>     - Default operator resource times number of operators
> >>> >     (equivalent to
> >>> >     >>>     your proposal)
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> ## Exposing internal runtime strategies
> >>> >     >>> Theoretically, yes. Tying to the SSGs, the resource
> >>> >     requirements might be
> >>> >     >>> affected if how SSGs are internally handled changes in
> future.
> >>> >     >> Practically,
> >>> >     >>> I do not concretely see at the moment what kind of changes we
> >>> >     may want in
> >>> >     >>> future that might conflict with this FLIP proposal, as the
> >>> >     question of
> >>> >     >>> switching data exchange mode answered above. I'd suggest to
> >>> >     not give up
> >>> >     >> the
> >>> >     >>> user friendliness we may gain now for the future problems
> that
> >>> >     may or may
> >>> >     >>> not exist.
> >>> >     >>>
> >>> >     >>> Moreover, the SSG-based approach has the flexibility to
> >>> >     achieve the
> >>> >     >>> equivalent behavior as the operator-based approach, if we
> set each
> >>> >     >> operator
> >>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
> >>> >     option to
> >>> >     >>> automatically do that for users, if needed.
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> Thank you~
> >>> >     >>>
> >>> >     >>> Xintong Song
> >>> >     >>>
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrm...@apache.org>
> >>> >     >>> wrote:
> >>> >     >>>> Thanks for the responses Xintong and Stephan,
> >>> >     >>>>
> >>> >     >>>> I agree that being able to define the resource requirements
> for a
> >>> >     >> group of
> >>> >     >>>> operators is more user friendly. However, my concern is that
> >>> >     we are
> >>> >     >>>> exposing thereby internal runtime strategies which might
> >>> >     limit our
> >>> >     >>>> flexibility to execute a given job. Moreover, the semantics
> of
> >>> >     >> configuring
> >>> >     >>>> resource requirements for SSGs could break if switching from
> >>> >     streaming
> >>> >     >> to
> >>> >     >>>> batch execution. If one defines the resource requirements
> for
> >>> >     op_1 ->
> >>> >     >> op_2
> >>> >     >>>> which run in pipelined mode when using the streaming
> >>> >     execution, then
> >>> >     >> how do
> >>> >     >>>> we interpret these requirements when op_1 -> op_2 are
> >>> >     executed with a
> >>> >     >>>> blocking data exchange in batch execution mode?
> Consequently,
> >>> >     I am
> >>> >     >> still
> >>> >     >>>> leaning towards Stephan's proposal to set the resource
> >>> >     requirements per
> >>> >     >>>> operator.
> >>> >     >>>>
> >>> >     >>>> Maybe the following proposal makes the configuration easier:
> >>> >     If the
> >>> >     >> user
> >>> >     >>>> wants to use fine-grained resource requirements, then she
> >>> >     needs to
> >>> >     >> specify
> >>> >     >>>> the default size which is used for operators which have no
> >>> >     explicit
> >>> >     >>>> resource annotation. If this holds true, then every operator
> >>> >     would
> >>> >     >> have a
> >>> >     >>>> resource requirement and the system can try to execute the
> >>> >     operators
> >>> >     >> in the
> >>> >     >>>> best possible manner w/o being constrained by how the user
> >>> >     set the SSG
> >>> >     >>>> requirements.
> >>> >     >>>>
> >>> >     >>>> Cheers,
> >>> >     >>>> Till
> >>> >     >>>>
> >>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong...@gmail.com>
> >>> >     >>>> wrote:
> >>> >     >>>>
> >>> >     >>>>> Thanks for the feedback, Stephan.
> >>> >     >>>>>
> >>> >     >>>>> Actually, your proposal has also come to my mind at some
> >>> >     point. And I
> >>> >     >>>> have
> >>> >     >>>>> some concerns about it.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> 1. It does not give users the same control as the SSG-based
> >>> >     approach.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> While both approaches do not require specifying for each
> >>> >     operator,
> >>> >     >>>>> SSG-based approach supports the semantic that "some
> operators
> >>> >     >> together
> >>> >     >>>> use
> >>> >     >>>>> this much resource" while the operator-based approach
> doesn't.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> >>> >     o_m), and
> >>> >     >> at
> >>> >     >>>> some
> >>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
> >>> >     reduces the
> >>> >     >> data
> >>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> >>> >     (o_1, ...,
> >>> >     >> o_n)
> >>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> >>> >     >> parallelisms
> >>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> >>> >     lead to too
> >>> >     >> much
> >>> >     >>>>> wasting of resources. If the two SSGs end up needing
> different
> >>> >     >> resources,
> >>> >     >>>>> with the SSG-based approach one can directly specify
> >>> >     resources for
> >>> >     >> the
> >>> >     >>>> two
> >>> >     >>>>> groups. However, with the operator-based approach, the
> user will
> >>> >     >> have to
> >>> >     >>>>> specify resources for each operator in one of the two
> >>> >     groups, and
> >>> >     >> tune
> >>> >     >>>> the
> >>> >     >>>>> default slot resource via configurations to fit the other
> group.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> 2. It increases the chance of breaking operator chains.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Setting chainable operators into different slot sharing
> >>> >     groups will
> >>> >     >>>>> prevent them from being chained. In the current
> implementation,
> >>> >     >>>> downstream
> >>> >     >>>>> operators, if SSG not explicitly specified, will be set to
> >>> >     the same
> >>> >     >> group
> >>> >     >>>>> as the chainable upstream operators (unless multiple
> upstream
> >>> >     >> operators
> >>> >     >>>> in
> >>> >     >>>>> different groups), to reduce the chance of breaking chains.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> >>> >     deciding
> >>> >     >> SSGs
> >>> >     >>>>> based on whether resource is specified we will easily get
> >>> >     groups like
> >>> >     >>>> (o_1,
> >>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
> >>> >     chained. This
> >>> >     >> is
> >>> >     >>>> also
> >>> >     >>>>> possible for the SSG-based approach, but I believe the
> >>> >     chance is much
> >>> >     >>>>> smaller because there's no strong reason for users to
> >>> >     specify the
> >>> >     >> groups
> >>> >     >>>>> with alternate operators like that. We are more likely to
> >>> >     get groups
> >>> >     >> like
> >>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> between
> >>> >     o_2 and
> >>> >     >> o_3.
> >>> >     >>>>>
> >>> >     >>>>> 3. It complicates the system by having two different
> >>> >     mechanisms for
> >>> >     >>>> sharing
> >>> >     >>>>> managed memory in a slot.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
> >>> >     sharing
> >>> >     >>>>> mechanism, where managed memory is first distributed
> >>> >     according to the
> >>> >     >>>>> consumer type, then further distributed across operators
> of that
> >>> >     >> consumer
> >>> >     >>>>> type.
> >>> >     >>>>>
> >>> >     >>>>> - With the operator-based approach, managed memory size
> >>> >     specified
> >>> >     >> for an
> >>> >     >>>>> operator should account for all the consumer types of that
> >>> >     operator.
> >>> >     >> That
> >>> >     >>>>> means the managed memory is first distributed across
> >>> >     operators, then
> >>> >     >>>>> distributed to different consumer types of each operator.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Unfortunately, the different order of the two calculation
> >>> >     steps can
> >>> >     >> lead
> >>> >     >>>> to
> >>> >     >>>>> different results. To be specific, the semantic of the
> >>> >     configuration
> >>> >     >>>> option
> >>> >     >>>>> `consumer-weights` changed (within a slot vs. within an
> >>> >     operator).
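
A small worked example may help show why the order of the two calculation steps matters. The sketch below is plain Python, not Flink code, and rests on stated assumptions: the weights `DATAPROC: 70` and `PYTHON: 30` are illustrative values, memory of one consumer type is split equally among the operators using it (a simplification of the FLIP-141 mechanism), and `slot_first` / `operator_first` are hypothetical names.

```python
from typing import Dict, Set

Allocation = Dict[str, Dict[str, float]]


def slot_first(
    slot_mb: float, weights: Dict[str, int], operators: Dict[str, Set[str]]
) -> Allocation:
    """Split the slot by consumer type first, then across operators."""
    present = set().union(*operators.values())
    total_weight = sum(weights[c] for c in present)
    result: Allocation = {op: {} for op in operators}
    for consumer in present:
        pool = slot_mb * weights[consumer] / total_weight
        users = [op for op, used in operators.items() if consumer in used]
        for op in users:
            # Assumption: equal share among operators of this consumer type.
            result[op][consumer] = pool / len(users)
    return result


def operator_first(
    op_mb: Dict[str, float], weights: Dict[str, int], operators: Dict[str, Set[str]]
) -> Allocation:
    """Take a per-operator size, then split it by that operator's consumer types."""
    result: Allocation = {}
    for op, used in operators.items():
        total_weight = sum(weights[c] for c in used)
        result[op] = {c: op_mb[op] * weights[c] / total_weight for c in used}
    return result


weights = {"DATAPROC": 70, "PYTHON": 30}
operators = {"A": {"DATAPROC"}, "B": {"DATAPROC", "PYTHON"}}

by_slot = slot_first(100, weights, operators)
by_operator = operator_first({"A": 50, "B": 50}, weights, operators)
```

With the same 100 MB in total, the Python consumer ends up with 30 MB when the weights apply within the slot but only 15 MB when they apply within operator B, which is the change in the meaning of `consumer-weights` described above.
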
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> To sum up things:
> >>> >     >>>>>
> >>> >     >>>>> While (3) might be a bit more implementation related, I
> >>> >     think (1)
> >>> >     >> and (2)
> >>> >     >>>>> somehow suggest that, the price for the proposed approach
> to
> >>> >     avoid
> >>> >     >>>>> specifying resource for every operator is that it's not as
> >>> >     >> independent
> >>> >     >>>> from
> >>> >     >>>>> operator chaining and slot sharing as the operator-based
> >>> >     approach
> >>> >     >>>> discussed
> >>> >     >>>>> in the FLIP.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Thank you~
> >>> >     >>>>>
> >>> >     >>>>> Xintong Song
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
> >>> >     >>>>> wrote:
> >>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> >>> >     >>>>>>
> >>> >     >>>>>> I want to say, first of all, that this is super well
> >>> >     written. And
> >>> >     >> the
> >>> >     >>>>>> points that the FLIP makes about how to expose the
> >>> >     configuration to
> >>> >     >>>> users
> >>> >     >>>>>> is exactly the right thing to figure out first.
> >>> >     >>>>>> So good job here!
> >>> >     >>>>>>
> >>> >     >>>>>> About how to let users specify the resource profiles. If I
> >>> >     can sum
> >>> >     >> the
> >>> >     >>>>> FLIP
> >>> >     >>>>>> and previous discussion up in my own words, the problem
> is the
> >>> >     >>>> following:
> >>> >     >>>>>> Operator-level specification is the simplest and cleanest
> >>> >     approach,
> >>> >     >>>>> because
> >>> >     >>>>>>> it avoids mixing operator configuration (resource) and
> >>> >     >> scheduling. No
> >>> >     >>>>>>> matter what other parameters change (chaining, slot
> sharing,
> >>> >     >>>> switching
> >>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
> >>> >     stay the
> >>> >     >>>> same.
> >>> >     >>>>>>> But it would require that a user specifies resources on
> all
> >>> >     >>>> operators,
> >>> >     >>>>>>> which makes it hard to use. That's why the FLIP suggests
> going
> >>> >     >> with
> >>> >     >>>>>>> specifying resources on a Sharing-Group.
> >>> >     >>>>>>
> >>> >     >>>>>> I think both thoughts are important, so can we find a
> solution
> >>> >     >> where
> >>> >     >>>> the
> >>> >     >>>>>> Resource Profiles are specified on an Operator, but we
> >>> >     still avoid
> >>> >     >> that
> >>> >     >>>>> we
> >>> >     >>>>>> need to specify a resource profile on every operator?
> >>> >     >>>>>>
> >>> >     >>>>>> What do you think about something like the following:
> >>> >     >>>>>>    - Resource Profiles are specified on an operator level.
> >>> >     >>>>>>    - Not all operators need profiles
> >>> >     >>>>>>    - All Operators without a Resource Profile end up in
> the
> >>> >     >> default
> >>> >     >>>> slot
> >>> >     >>>>>> sharing group with a default profile (will get a default
> slot).
> >>> >     >>>>>>    - All Operators with a Resource Profile will go into
> >>> >     another slot
> >>> >     >>>>> sharing
> >>> >     >>>>>> group (the resource-specified-group).
> >>> >     >>>>>>    - Users can define different slot sharing groups for
> >>> >     operators
> >>> >     >> like
> >>> >     >>>>> they
> >>> >     >>>>>> do now, with the exception that you cannot mix operators
> >>> >     that have
> >>> >     >> a
> >>> >     >>>>>> resource profile and operators that have no resource
> profile.
> >>> >     >>>>>>    - The default case where no operator has a resource
> >>> >     profile is
> >>> >     >> just a
> >>> >     >>>>>> special case of this model
> >>> >     >>>>>>    - The chaining logic sums up the profiles per operator,
> >>> >     like it
> >>> >     >> does
> >>> >     >>>>> now,
> >>> >     >>>>>> and the scheduler sums up the profiles of the tasks that
> it
> >>> >     >> schedules
> >>> >     >>>>>> together.
> >>> >     >>>>>>
> >>> >     >>>>>>
> >>> >     >>>>>> There is another question about reactive scaling raised
> in the
> >>> >     >> FLIP. I
> >>> >     >>>>> need
> >>> >     >>>>>> to think a bit about that. That is indeed a bit more
> tricky
> >>> >     once we
> >>> >     >>>> have
> >>> >     >>>>>> slots of different sizes.
> >>> >     >>>>>> It is not clear then which of the different slot requests
> the
> >>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
> >>> >     show up,
> >>> >     >> or how
> >>> >     >>>>> the
> >>> >     >>>>>> JobManager redistributes the slots resources when
> resources
> >>> >     (TMs)
> >>> >     >>>>> disappear
> >>> >     >>>>>> This question is pretty orthogonal, though, to the "how to
> >>> >     specify
> >>> >     >> the
> >>> >     >>>>>> resources".
> >>> >     >>>>>>
> >>> >     >>>>>>
> >>> >     >>>>>> Best,
> >>> >     >>>>>> Stephan
> >>> >     >>>>>>
> >>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong...@gmail.com>
> >>> >     >>>>>> wrote:
> >>> >     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
> >>> >     Yangze.
> >>> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
> >>> >     >>>>>>>
> >>> >     >>>>>>> @Till,
> >>> >     >>>>>>>
> >>> >     >>>>>>> I agree that specifying requirements for SSGs means that
> SSGs
> >>> >     >> need to
> >>> >     >>>>> be
> >>> >     >>>>>>> supported in fine-grained resource management, otherwise
> each
> >>> >     >>>> operator
> >>> >     >>>>>>> might use as many resources as the whole group. However,
> I
> >>> >     cannot
> >>> >     >>>> think
> >>> >     >>>>>> of
> >>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
> >>> >     resource
> >>> >     >>>>>>> management.
> >>> >     >>>>>>>
> >>> >     >>>>>>>
> >>> >     >>>>>>>> Interestingly, if all operators have their resources
> properly
> >>> >     >>>>>> specified,
> >>> >     >>>>>>>> then slot sharing is no longer needed because Flink
> could
> >>> >     >> slice off
> >>> >     >>>>> the
> >>> >     >>>>>>>> appropriately sized slots for every Task individually.
> >>> >     >>>>>>>>
> >>> >     >>>>>>>> So for example, if we have a job consisting of two operators
> >>> >     >>>>>>>> op_1 and op_2
> >>> >     >>>>>>>> where each op needs 100 MB of memory, we would then say
> that
> >>> >     >> the
> >>> >     >>>> slot
> >>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we have
> a
> >>> >     >> cluster
> >>> >     >>>>> with
> >>> >     >>>>>> 2
> >>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system
> cannot run
> >>> >     >> this
> >>> >     >>>>> job.
> >>> >     >>>>>> If
> >>> >     >>>>>>>> the resources were specified on an operator level, then
> the
> >>> >     >> system
> >>> >     >>>>>> could
> >>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2
> to
> >>> >     >> TM_2.
> >>> >     >>>>>>>
> >>> >     >>>>>>> Couldn't agree more that if all operators' requirements
> are
> >>> >     >> properly
> >>> >     >>>>>>> specified, slot sharing should be no longer needed. I
> >>> >     think this
> >>> >     >>>>> exactly
> >>> >     >>>>>>> disproves the example. If we already know op_1 and op_2
> each
> >>> >     >> needs
> >>> >     >>>> 100
> >>> >     >>>>> MB
> >>> >     >>>>>>> of memory, why would we put them in the same group? If
> >>> >     they are
> >>> >     >> in
> >>> >     >>>>>> separate
> >>> >     >>>>>>> groups, with the proposed approach the system can freely
> >>> >     deploy
> >>> >     >> them
> >>> >     >>>> to
> >>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
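
The 100 MB example can be checked with a small placement sketch. It is written in plain Python under explicit assumptions: each TM can slice slots of any size out of its free memory, placement is a greedy first-fit over requests sorted by size (a heuristic, not Flink's slot matching), and `can_deploy` is a hypothetical helper.

```python
from typing import List


def can_deploy(requests_mb: List[int], tm_free_mb: List[int]) -> bool:
    """Greedy first-fit check: does every slot request fit on some TM?"""
    free = list(tm_free_mb)  # copy so the caller's list is not modified
    for request in sorted(requests_mb, reverse=True):
        for i, capacity in enumerate(free):
            if request <= capacity:
                free[i] = capacity - request
                break
        else:
            # No TM has enough free memory left for this request.
            return False
    return True


# op_1 and op_2 in one group: a single 200 MB request.
one_group_on_two_small_tms = can_deploy([200], [100, 100])
# op_1 and op_2 in separate groups: two 100 MB requests.
two_groups_on_two_small_tms = can_deploy([100, 100], [100, 100])
two_groups_on_one_large_tm = can_deploy([100, 100], [200])
```

Only the single 200 MB request fails on two 100 MB TMs; with separate groups the same operators fit on either cluster layout.
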
> >>> >     >>>>>>>
> >>> >     >>>>>>> Moreover, the precondition for not needing slot sharing
> is
> >>> >     having
> >>> >     >>>>>> resource
> >>> >     >>>>>>> requirements properly specified for all operators. This
> is not
> >>> >     >> always
> >>> >     >>>>>>> possible, and usually requires tremendous efforts. One
> of the
> >>> >     >>>> benefits
> >>> >     >>>>>> for
> >>> >     >>>>>>> SSG-based requirements is that it allows the user to
> freely
> >>> >     >> decide
> >>> >     >>>> the
> >>> >     >>>>>>> granularity, and thus the effort they want to invest. I would
> >>> >     consider SSG
> >>> >     >> in
> >>> >     >>>>>>> fine-grained resource management as a group of operators
> >>> >     that the
> >>> >     >>>> user
> >>> >     >>>>>>> would like to specify the total resource for. There can
> be
> >>> >     only
> >>> >     >> one
> >>> >     >>>>> group
> >>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few major
> >>> >     parts,
> >>> >     >> or as
> >>> >     >>>>>> many
> >>> >     >>>>>>> groups as the number of tasks/operators, depending on how
> >>> >     >>>> fine-grained
> >>> >     >>>>>> the
> >>> >     >>>>>>> user is able to specify the resources.
> >>> >     >>>>>>>
> >>> >     >>>>>>> Having to support SSGs might be a constraint. But given
> >>> >     that all
> >>> >     >> the
> >>> >     >>>>>>> current scheduler implementations already support SSGs, I
> >>> >     tend to
> >>> >     >>>> think
> >>> >     >>>>>>> that as an acceptable price for the above discussed
> >>> >     usability and
> >>> >     >>>>>>> flexibility.
> >>> >     >>>>>>>
> >>> >     >>>>>>> @Chesnay
> >>> >     >>>>>>>
> >>> >     >>>>>>>> Will declaring them on slot sharing groups not also waste
> >>> >     >>>>>>>> resources if the parallelism of operators within that group
> >>> >     >>>>>>>> is different?
> >>> >     >>>>>>>>
> >>> >     >>>>>>> Yes. It's a trade-off between usability and resource
> >>> >     >> utilization. To
> >>> >     >>>>>> avoid
> >>> >     >>>>>>> such wasting, the user can define more groups, so that
> >>> >     each group
> >>> >     >>>>>> contains
> >>> >     >>>>>>> fewer operators and the chance of having operators with
> >>> >     different
> >>> >     >>>>>>> parallelism will be reduced. The price is to have more
> >>> >     resource
> >>> >     >>>>>>> requirements to specify.
> >>> >     >>>>>>>
> >>> >     >>>>>>>> It also seems like quite a hassle for users having to
> >>> >     >>>>>>>> recalculate the resource requirements if they change the
> >>> >     >>>>>>>> slot sharing.
> >>> >     >>>>>>>> I'd think that it's not really workable for users that
> create
> >>> >     >> a set
> >>> >     >>>>> of
> >>> >     >>>>>>>> re-usable operators which are mixed and matched in their
> >>> >     >>>>> applications;
> >>> >     >>>>>>>> managing the resources requirements in such a setting
> >>> >     would be
> >>> >     >> a
> >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> >>> >     >> requirements
> >>> >     >>>>> any
> >>> >     >>>>>>>> way.
> >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> increases
> >>> >     >>>>> usability.
> >>> >     >>>>>>>     - As mentioned in my reply to Till's comment,
> there's no
> >>> >     >> reason to
> >>> >     >>>>> put
> >>> >     >>>>>>>     multiple operators whose individual resource
> >>> >     requirements are
> >>> >     >>>>> already
> >>> >     >>>>>>> known
> >>> >     >>>>>>>     into the same group in fine-grained resource
> management.
> >>> >     >>>>>>>     - Even if an operator implementation is reused for
> multiple
> >>> >     >>>>> applications,
> >>> >     >>>>>>>     it does not guarantee the same resource requirements.
> >>> >     During
> >>> >     >> our
> >>> >     >>>>> years
> >>> >     >>>>>>> of
> >>> >     >>>>>>>     practice at Alibaba, with per-operator requirements
> >>> >     >> specified for
> >>> >     >>>>>>> Blink's
> >>> >     >>>>>>>     fine-grained resource management, very few users
> >>> >     (including
> >>> >     >> our
> >>> >     >>>>>>> specialists
> >>> >     >>>>>>>     who are dedicated to supporting Blink users) are
> >>> >     >> experienced enough
> >>> >     >>>>> to
> >>> >     >>>>>>>     accurately predict/estimate the operator resource
> >>> >     >> requirements.
> >>> >     >>>> Most
> >>> >     >>>>>>> people
> >>> >     >>>>>>>     rely on the execution-time metrics (throughput,
> delay, cpu
> >>> >     >> load,
> >>> >     >>>>>> memory
> >>> >     >>>>>>>     usage, GC pressure, etc.) to improve the
> specification.
> >>> >     >>>>>>>
> >>> >     >>>>>>> To sum up:
> >>> >     >>>>>>> If the user is capable of providing proper resource
> >>> >     requirements
> >>> >     >> for
> >>> >     >>>>>> every
> >>> >     >>>>>>> operator, that's definitely a good thing and we would not
> >>> >     need to
> >>> >     >>>> rely
> >>> >     >>>>> on
> >>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> >>> >     >> fine-grained
> >>> >     >>>>>> resource
> >>> >     >>>>>>> management to work. For those users who are capable and
> do not
> >>> >     >> like
> >>> >     >>>>>> having
> >>> >     >>>>>>> to set each operator to a separate SSG, I would be ok to
> have
> >>> >     >> both
> >>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to
> only
> >>> >     >> fall back
> >>> >     >>>> to
> >>> >     >>>>>> the
> >>> >     >>>>>>> SSG requirements when the operator requirements are not
> >>> >     >> specified.
> >>> >     >>>>>> However,
> >>> >     >>>>>>> as the first step, I think we should prioritise the use
> cases
> >>> >     >> where
> >>> >     >>>>> users
> >>> >     >>>>>>> are not that experienced.
> >>> >     >>>>>>>
> >>> >     >>>>>>> Thank you~
> >>> >     >>>>>>>
> >>> >     >>>>>>> Xintong Song
> >>> >     >>>>>>>
> >>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> >>> >     >> ches...@apache.org>
> >>> >     >>>>>>> wrote:
> >>> >     >>>>>>>
> >>> >     >>>>>>>> Will declaring them on slot sharing groups not also
> waste
> >>> >     >> resources
> >>> >     >>>>> if
> >>> >     >>>>>>>> the parallelism of operators within that group are
> different?
> >>> >     >>>>>>>>
> >>> >     >>>>>>>> It also seems like quite a hassle for users having to
> >>> >     >> recalculate
> >>> >     >>>> the
> >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> >>> >     >>>>>>>> I'd think that it's not really workable for users that
> create
> >>> >     >> a set
> >>> >     >>>>> of
> >>> >     >>>>>>>> re-usable operators which are mixed and matched in their
> >>> >     >>>>> applications;
> >>> >     >>>>>>>> managing the resources requirements in such a setting
> >>> >     would be
> >>> >     >> a
> >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> >>> >     >> requirements
> >>> >     >>>>> any
> >>> >     >>>>>>>> way.
> >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> increases
> >>> >     >>>>> usability.
> >>> >     >>>>>>>> My main worry is that if we wire the runtime to work
> >>> >     on SSGs
> >>> >     >>>> it's
> >>> >     >>>>>>>> gonna be difficult to implement more fine-grained
> approaches,
> >>> >     >> which
> >>> >     >>>>>>>> would not be the case if, for the runtime, they are
> always
> >>> >     >> defined
> >>> >     >>>> on
> >>> >     >>>>>> an
> >>> >     >>>>>>>> operator-level.
> >>> >     >>>>>>>>
> >>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> >>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this
> discussion
> >>> >     >>>> Yangze.
> >>> >     >>>>>>>>> I like that defining resource requirements on a slot
> sharing
> >>> >     >>>> group
> >>> >     >>>>>>> makes
> >>> >     >>>>>>>>> the overall setup easier and improves usability of
> resource
> >>> >     >>>>>>> requirements.
> >>> >     >>>>>>>>> What I do not like about it is that it changes slot
> sharing
> >>> >     >>>> groups
> >>> >     >>>>>> from
> >>> >     >>>>>>>>> being a scheduling hint to something which needs to be
> >>> >     >> supported
> >>> >     >>>> in
> >>> >     >>>>>>> order
> >>> >     >>>>>>>>> to support fine grained resource requirements. So far,
> the
> >>> >     >> idea
> >>> >     >>>> of
> >>> >     >>>>>> slot
> >>> >     >>>>>>>>> sharing groups was that it tells the system that a set
> of
> >>> >     >>>> operators
> >>> >     >>>>>> can
> >>> >     >>>>>>>> be
> >>> >     >>>>>>>>> deployed in the same slot. But the system still had the
> >>> >     >> freedom
> >>> >     >>>> to
> >>> >     >>>>>> say
> >>> >     >>>>>>>> that
> >>> >     >>>>>>>>> it would rather place these tasks in different slots
> if it
> >>> >     >>>> wanted.
> >>> >     >>>>> If
> >>> >     >>>>>>> we
> >>> >     >>>>>>>>> now specify resource requirements per slot sharing
> >>> >     >> group,
> >>> >     >>>> then
> >>> >     >>>>>> the
> >>> >     >>>>>>>>> only option for a scheduler which does not support slot
> >>> >     >> sharing
> >>> >     >>>>>> groups
> >>> >     >>>>>>> is
> >>> >     >>>>>>>>> to say that every operator in this slot sharing group
> >>> >     needs a
> >>> >     >>>> slot
> >>> >     >>>>>> with
> >>> >     >>>>>>>> the
> >>> >     >>>>>>>>> same resources as the whole group.
> >>> >     >>>>>>>>>
> >>> >     >>>>>>>>> So for example, if we have a job consisting of two
> operators
> >>> >     >> op_1
> >>> >     >>>>> and
> >>> >     >>>>>>> op_2
> >>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then
> say that
> >>> >     >> the
> >>> >     >>>>> slot
> >>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> have a
> >>> >     >> cluster
> >>> >     >>>>>> with
> >>> >     >>>>>>> 2
> >>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system
> cannot run
> >>> >     >> this
> >>> >     >>>>>> job.
> >>> >     >>>>>>> If
> >>> >     >>>>>>>>> the resources were specified on an operator level,
> then the
> >>> >     >>>> system
> >>> >     >>>>>>> could
> >>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> op_2 to
> >>> >     >> TM_2.
> >>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing
> groups
> >>> >     >> was
> >>> >     >>>> to
> >>> >     >>>>>> make
> >>> >     >>>>>>>> it
> >>> >     >>>>>>>>> easier for the user to reason about how many slots a
> job
> >>> >     >> needs
> >>> >     >>>>>>>> independent
> >>> >     >>>>>>>>> of the actual number of operators in the job.
> Interestingly,
> >>> >     >> if
> >>> >     >>>> all
> >>> >     >>>>>>>>> operators have their resources properly specified,
> then slot
> >>> >     >>>>> sharing
> >>> >     >>>>>> is
> >>> >     >>>>>>>> no
> >>> >     >>>>>>>>> longer needed because Flink could slice off the
> >>> >     appropriately
> >>> >     >>>> sized
> >>> >     >>>>>>> slots
> >>> >     >>>>>>>>> for every Task individually. What matters is whether
> the
> >>> >     >> whole
> >>> >     >>>>>> cluster
> >>> >     >>>>>>>> has
> >>> >     >>>>>>>>> enough resources to run all tasks or not.
> >>> >     >>>>>>>>>
> >>> >     >>>>>>>>> Cheers,
> >>> >     >>>>>>>>> Till
> >>> >     >>>>>>>>>
> >>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> >>> >     >> karma...@gmail.com>
> >>> >     >>>>>> wrote:
> >>> >     >>>>>>>>>> Hi, there,
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> We would like to start a discussion thread on
> "FLIP-156:
> >>> >     >> Runtime
> >>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
> >>> >     >> where we
> >>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> interfaces
> >>> >     >> for
> >>> >     >>>>>>>>>> specifying fine-grained resource requirements.
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> In this FLIP:
> >>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
> >>> >     >> management.
> >>> >     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
> >>> >     >> resource
> >>> >     >>>>>>>>>> requirements.
> >>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
> >>> >     >> granularities
> >>> >     >>>>> for
> >>> >     >>>>>>>>>> specifying the resource requirements (op, task and
> slot
> >>> >     >> sharing
> >>> >     >>>>>> group)
> >>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> Please find more details in the FLIP wiki document
> [1].
> >>> >     >> Looking
> >>> >     >>>>>>>>>> forward to your feedback.
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> [1]
> >>> >     >>>>>>>>>>
> >>> >     >>
> >>> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >>> >     >>>>>>>>>> Best,
> >>> >     >>>>>>>>>> Yangze Guo
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>
> >>> >
> >>>
>
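
Till's two-operator example from earlier in the thread (op_1 and op_2
needing 100 MB each, on a cluster of two TMs with one 100 MB slot each)
can be sketched in a few lines of Python. This is only an illustration
and not Flink code: the function names, the dict of per-operator memory,
and the simplification that each operator gets its own slot in the
operator-level case are all assumptions made for the sketch.

```python
from itertools import permutations


def fits_as_group(op_mem_mb, slot_mem_mb):
    """SSG requirement treated as a hard restriction (hypothetical helper).

    The whole group must fit into a single slot, so the requirement is the
    sum of the operators' memory.
    """
    group_requirement = sum(op_mem_mb.values())
    return any(slot >= group_requirement for slot in slot_mem_mb)


def fits_per_operator(op_mem_mb, slot_mem_mb):
    """Operator-level requirements (hypothetical helper).

    Simplifying assumption: every operator is placed in its own slot, so we
    only look for an assignment of distinct slots that are each big enough.
    """
    needs = list(op_mem_mb.values())
    if len(needs) > len(slot_mem_mb):
        return False
    return any(
        all(need <= slot for need, slot in zip(needs, chosen))
        for chosen in permutations(slot_mem_mb, len(needs))
    )


# Numbers taken from the example in the thread.
operators = {"op_1": 100, "op_2": 100}
slots = [100, 100]  # two TMs, one 100 MB slot each

print("SSG-level restriction schedulable:", fits_as_group(operators, slots))
print("operator-level schedulable:", fits_per_operator(operators, slots))
```

With these numbers the group-level check fails, because no single slot
offers 200 MB, while the operator-level check succeeds by placing op_1
and op_2 in different slots. That gap is what treating the SSG
requirement as a hint rather than a restriction is meant to avoid.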
