Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Yangze Guo Wed, 20 Jan 2021 18:18:04 -0800

@Till

Also +1 to treating the SSG resource requirements as a hint instead of a
restriction. We can treat it as a follow-up effort and make it clear in
the JavaDocs as the first step.
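
To make the "hint" semantic a bit more concrete, here is a rough sketch of
the two derivation strategies Till mentions in the quoted message (give every
task the same resources as the whole group, or split the group's resources
equally). It is only an illustration: the class and method names are made up
and are not part of the FLIP or of Flink, and plain memory in MB stands in
for a full resource spec.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink API: every name here is invented for illustration.
final class SsgHintSketch {

    /** Strategy A: every task gets the same memory as the whole group. */
    static List<Integer> sameAsGroup(int groupMemoryMb, int numTasks) {
        List<Integer> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(groupMemoryMb);
        }
        return perTask;
    }

    /** Strategy B: split the group memory evenly; any remainder goes to the first tasks. */
    static List<Integer> splitEvenly(int groupMemoryMb, int numTasks) {
        List<Integer> perTask = new ArrayList<>();
        int base = groupMemoryMb / numTasks;
        int remainder = groupMemoryMb % numTasks;
        for (int i = 0; i < numTasks; i++) {
            perTask.add(base + (i < remainder ? 1 : 0));
        }
        return perTask;
    }

    public static void main(String[] args) {
        // A 200 MB group with two tasks, as in the op_1 / op_2 example of this thread.
        System.out.println(sameAsGroup(200, 2));
        System.out.println(splitEvenly(200, 2));
    }
}
```

Strategy A never under-provisions a task but can request far more than the
group was given; strategy B keeps the total equal to the group requirement.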


Best,
Yangze Guo

On Thu, Jan 21, 2021 at 10:00 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> I think this makes sense.
>
> The semantic of a SSG is that operators in the group *can* be scheduled
> together in a slot, which is not a *must*. Specifying resources for SSGs
> should not change that semantic. If the need arises to schedule the
> operators into different slots, it makes sense for the runtime to
> derive the finer-grained resource requirements, if not provided.
>
> We may not need to implement this at the moment since currently SSGs are
> always respected, but we should make that semantic explicit in the JavaDocs
> for the interfaces and in the user documentation when the user APIs are
> exposed.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Jan 21, 2021 at 1:55 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > Maybe a different minor idea: Would it be possible to treat the SSG
> > resource requirements as a hint for the runtime similar to how slot sharing
> > groups are designed at the moment? Meaning that we don't give the guarantee
> > that Flink will always deploy this set of tasks together no matter what
> > comes. If, for example, the runtime can derive by some means the resource
> > requirements for each task based on the requirements for the SSG, this
> > could be possible. One easy strategy would be to give every task the same
> > resources as the whole slot sharing group. Another one could be
> > distributing the resources equally among the tasks. This does not even have
> > to be implemented but we would give ourselves the freedom to change
> > scheduling if need should arise.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karma...@gmail.com> wrote:
> >
> > > Thanks for the responses, Till and Xintong.
> > >
> > > I second Xintong's comment that SSG-based runtime interface will give
> > > us the flexibility to achieve op/task-based approach. That's one of
> > > the most important reasons for our design choice.
> > >
> > > My two cents regarding the default operator resource:
> > > - It might be good for the scenario of DataStream jobs.
> > >    ** For light-weight operators, the accumulated configuration error
> > > will not be significant. Then, the resource a task uses is
> > > proportional to the number of operators it contains.
> > >    ** For heavy operators like join and window, or operators using
> > > external resources, users will turn to the fine-grained resource
> > > configuration.
> > > - It can increase stability for standalone clusters where the
> > > registered task executors are heterogeneous (with different default slot
> > > resources).
> > > - It might not be good for SQL users. The operators that SQL will be
> > > translated to are a black box to the user. We also do not guarantee
> > > cross-version consistency of the transformation so far.
> > >
> > > I think it can be treated as a follow-up work when the fine-grained
> > > resource management is end-to-end ready.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > >
> > > On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > > >
> > > > Thanks for the feedback, Till.
> > > >
> > > > ## I feel that what you proposed (operator-based + default value) might
> > > be
> > > > subsumed by the SSG-based approach.
> > > > Thinking of op_1 -> op_2, there are the following 4 cases, categorized
> > by
> > > > whether the resource requirements are known to the users.
> > > >
> > > >    1. *Both known.* As previously mentioned, there's no reason to put
> > > >    multiple operators whose individual resource requirements are
> > already
> > > known
> > > >    into the same group in fine-grained resource management. And if op_1
> > > and
> > > >    op_2 are in different groups, there should be no problem switching
> > > data
> > > >    exchange mode from pipelined to blocking. This is equivalent to
> > > specifying
> > > >    operator resource requirements in your proposal.
> > > >    2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is
> > in a
> > > >    SSG whose resource is not specified thus would have the default slot
> > > >    resource. This is equivalent to having default operator resources in
> > > your
> > > >    proposal.
> > > >    3. *Both unknown*. The user can either set op_1 and op_2 to the same
> > > SSG
> > > >    or separate SSGs.
> > > >       - If op_1 and op_2 are in the same SSG, it will be equivalent to
> > > the
> > > >       coarse-grained resource management, where op_1 and op_2 share a
> > > default
> > > >       size slot no matter which data exchange mode is used.
> > > >       - If op_1 and op_2 are in different SSGs, then each of them will
> > > use
> > > >       a default size slot. This is equivalent to setting them with
> > > default
> > > >       operator resources in your proposal.
> > > >    4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> > > >       - It is possible that the user learns the total / max resource
> > > >       requirement from executing and monitoring the job, while not
> > > > being aware of
> > > >       individual operator requirements.
> > > >       - I believe this is the case your proposal does not cover. And
> > TBH,
> > > >       this is probably how most users learn the resource requirements,
> > > > according
> > > >       to my experiences.
> > > >       - In this case, the user might need to specify different
> > resources
> > > if
> > > >       he wants to switch the execution mode, which should not be worse
> > > than not
> > > >       being able to use fine-grained resource management.
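
Case 4 above can be made concrete with a small sketch: with a pipelined
exchange op_1 and op_2 run at the same time, so the group needs the total of
the two; with a blocking exchange they run one after the other, so the max is
enough. The class, enum and method names are hypothetical, and memory in MB
is only a stand-in for a full resource spec.

```java
// Hypothetical sketch, not Flink API: names are invented for illustration.
final class ExchangeModeSketch {

    enum Mode { PIPELINED, BLOCKING }

    /** Group requirement: total of the operators if pipelined, max if blocking. */
    static int groupMemoryMb(Mode mode, int... operatorMemoryMb) {
        int total = 0;
        int max = 0;
        for (int mb : operatorMemoryMb) {
            total += mb;
            max = Math.max(max, mb);
        }
        return mode == Mode.PIPELINED ? total : max;
    }

    public static void main(String[] args) {
        // op_1 needs 100 MB and op_2 needs 300 MB (made-up numbers).
        System.out.println(groupMemoryMb(Mode.PIPELINED, 100, 300));
        System.out.println(groupMemoryMb(Mode.BLOCKING, 100, 300));
    }
}
```

A user who only knows the measured total (or max) can specify that one number
on the SSG, and would need to specify a different one after switching the
exchange mode.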
> > > >
> > > >
> > > > ## An additional idea inspired by your proposal.
> > > > We may provide multiple options for deciding resources for SSGs whose
> > > > requirement is not specified, if needed.
> > > >
> > > >    - Default slot resource (current design)
> > > >    - Default operator resource times number of operators (equivalent to
> > > >    your proposal)
> > > >
> > > >
> > > > ## Exposing internal runtime strategies
> > > > Theoretically, yes. Tying to the SSGs, the resource requirements might
> > be
> > > > affected if how SSGs are internally handled changes in future.
> > > Practically,
> > > > I do not concretely see at the moment what kind of changes we may want
> > in
> > > > future that might conflict with this FLIP proposal, as the question of
> > > > switching data exchange mode answered above. I'd suggest to not give up
> > > the
> > > > user friendliness we may gain now for the future problems that may or
> > may
> > > > not exist.
> > > >
> > > > Moreover, the SSG-based approach has the flexibility to achieve the
> > > > equivalent behavior as the operator-based approach, if we set each
> > > operator
> > > > (or task) to a separate SSG. We can even provide a shortcut option to
> > > > automatically do that for users, if needed.
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > > >
> > > > > Thanks for the responses Xintong and Stephan,
> > > > >
> > > > > I agree that being able to define the resource requirements for a
> > > group of
> > > > > operators is more user friendly. However, my concern is that we are
> > > > > exposing thereby internal runtime strategies which might limit our
> > > > > flexibility to execute a given job. Moreover, the semantics of
> > > configuring
> > > > > resource requirements for SSGs could break if switching from
> > streaming
> > > to
> > > > > batch execution. If one defines the resource requirements for op_1 ->
> > > op_2
> > > > > which run in pipelined mode when using the streaming execution, then
> > > how do
> > > > > we interpret these requirements when op_1 -> op_2 are executed with a
> > > > > blocking data exchange in batch execution mode? Consequently, I am
> > > still
> > > > > leaning towards Stephan's proposal to set the resource requirements
> > per
> > > > > operator.
> > > > >
> > > > > Maybe the following proposal makes the configuration easier: If the
> > > user
> > > > > wants to use fine-grained resource requirements, then she needs to
> > > specify
> > > > > the default size which is used for operators which have no explicit
> > > > > resource annotation. If this holds true, then every operator would
> > > have a
> > > > > resource requirement and the system can try to execute the operators
> > > in the
> > > > > best possible manner w/o being constrained by how the user set the
> > SSG
> > > > > requirements.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the feedback, Stephan.
> > > > > >
> > > > > > Actually, your proposal has also come to my mind at some point.
> > And I
> > > > > have
> > > > > > some concerns about it.
> > > > > >
> > > > > >
> > > > > > 1. It does not give users the same control as the SSG-based
> > approach.
> > > > > >
> > > > > >
> > > > > > While both approaches do not require specifying for each operator,
> > > > > > SSG-based approach supports the semantic that "some operators
> > > together
> > > > > use
> > > > > > this much resource" while the operator-based approach doesn't.
> > > > > >
> > > > > >
> > > > > > Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
> > > at
> > > > > some
> > > > > > point there's an agg o_n (1 < n < m) which significantly reduces
> > the
> > > data
> > > > > > amount. One can separate the pipeline into 2 groups SSG_1 (o_1,
> > ...,
> > > o_n)
> > > > > > and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> > > parallelisms
> > > > > > for operators in SSG_1 than for operators in SSG_2 won't lead to
> > too
> > > much
> > > > > > wasting of resources. If the two SSGs end up needing different
> > > resources,
> > > > > > with the SSG-based approach one can directly specify resources for
> > > the
> > > > > two
> > > > > > groups. However, with the operator-based approach, the user will
> > > have to
> > > > > > specify resources for each operator in one of the two groups, and
> > > tune
> > > > > the
> > > > > > default slot resource via configurations to fit the other group.
> > > > > >
> > > > > >
> > > > > > 2. It increases the chance of breaking operator chains.
> > > > > >
> > > > > >
> > > > > > > Setting chainable operators into different slot sharing groups
> > will
> > > > > > prevent them from being chained. In the current implementation,
> > > > > downstream
> > > > > > operators, if SSG not explicitly specified, will be set to the same
> > > group
> > > > > > as the chainable upstream operators (unless multiple upstream
> > > operators
> > > > > in
> > > > > > different groups), to reduce the chance of breaking chains.
> > > > > >
> > > > > >
> > > > > > > Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> > > SSGs
> > > > > > based on whether resource is specified we will easily get groups
> > like
> > > > > (o_1,
> > > > > > o_3) & (o_2, o_4), where none of the operators can be chained. This
> > > is
> > > > > also
> > > > > > possible for the SSG-based approach, but I believe the chance is
> > much
> > > > > > smaller because there's no strong reason for users to specify the
> > > groups
> > > > > > with alternate operators like that. We are more likely to get
> > groups
> > > like
> > > > > > (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2
> > and
> > > o_3.
> > > > > >
> > > > > >
> > > > > > 3. It complicates the system by having two different mechanisms for
> > > > > sharing
> > > > > > > managed memory in a slot.
> > > > > >
> > > > > >
> > > > > > - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > > > mechanism, where managed memory is first distributed according to
> > the
> > > > > > consumer type, then further distributed across operators of that
> > > consumer
> > > > > > type.
> > > > > >
> > > > > > - With the operator-based approach, managed memory size specified
> > > for an
> > > > > > operator should account for all the consumer types of that
> > operator.
> > > That
> > > > > > means the managed memory is first distributed across operators,
> > then
> > > > > > distributed to different consumer types of each operator.
> > > > > >
> > > > > >
> > > > > > Unfortunately, the different order of the two calculation steps can
> > > lead
> > > > > to
> > > > > > different results. To be specific, the semantic of the
> > configuration
> > > > > option
> > > > > > `consumer-weights` changed (within a slot vs. within an operator).
> > > > > >
> > > > > >
> > > > > >
> > > > > > To sum up things:
> > > > > >
> > > > > > While (3) might be a bit more implementation related, I think (1)
> > > and (2)
> > > > > > somehow suggest that, the price for the proposed approach to avoid
> > > > > > specifying resource for every operator is that it's not as
> > > independent
> > > > > from
> > > > > > operator chaining and slot sharing as the operator-based approach
> > > > > discussed
> > > > > > in the FLIP.
> > > > > >
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > > >
> > > > > > > I want to say, first of all, that this is super well written. And
> > > the
> > > > > > > points that the FLIP makes about how to expose the configuration
> > to
> > > > > users
> > > > > > > is exactly the right thing to figure out first.
> > > > > > > So good job here!
> > > > > > >
> > > > > > > About how to let users specify the resource profiles. If I can
> > sum
> > > the
> > > > > > FLIP
> > > > > > > and previous discussion up in my own words, the problem is the
> > > > > following:
> > > > > > >
> > > > > > > Operator-level specification is the simplest and cleanest
> > approach,
> > > > > > because
> > > > > > > > it avoids mixing operator configuration (resource) and
> > > scheduling. No
> > > > > > > > matter what other parameters change (chaining, slot sharing,
> > > > > switching
> > > > > > > > pipelined and blocking shuffles), the resource profiles stay
> > the
> > > > > same.
> > > > > > > > But it would require that a user specifies resources on all
> > > > > operators,
> > > > > > > > which makes it hard to use. That's why the FLIP suggests going
> > > with
> > > > > > > > specifying resources on a Sharing-Group.
> > > > > > >
> > > > > > >
> > > > > > > I think both thoughts are important, so can we find a solution
> > > where
> > > > > the
> > > > > > > Resource Profiles are specified on an Operator, but we still
> > avoid
> > > that
> > > > > > we
> > > > > > > need to specify a resource profile on every operator?
> > > > > > >
> > > > > > > What do you think about something like the following:
> > > > > > >   - Resource Profiles are specified on an operator level.
> > > > > > >   - Not all operators need profiles
> > > > > > > >   - All Operators without a Resource Profile end up in the
> > > default
> > > > > slot
> > > > > > > sharing group with a default profile (will get a default slot).
> > > > > > >   - All Operators with a Resource Profile will go into another
> > slot
> > > > > > sharing
> > > > > > > group (the resource-specified-group).
> > > > > > >   - Users can define different slot sharing groups for operators
> > > like
> > > > > > they
> > > > > > > do now, with the exception that you cannot mix operators that
> > have
> > > a
> > > > > > > resource profile and operators that have no resource profile.
> > > > > > >   - The default case where no operator has a resource profile is
> > > just a
> > > > > > > special case of this model
> > > > > > >   - The chaining logic sums up the profiles per operator, like it
> > > does
> > > > > > now,
> > > > > > > and the scheduler sums up the profiles of the tasks that it
> > > schedules
> > > > > > > together.
> > > > > > >
> > > > > > >
> > > > > > > There is another question about reactive scaling raised in the
> > > FLIP. I
> > > > > > need
> > > > > > > to think a bit about that. That is indeed a bit more tricky once
> > we
> > > > > have
> > > > > > > slots of different sizes.
> > > > > > > It is not clear then which of the different slot requests the
> > > > > > > ResourceManager should fulfill when new resources (TMs) show up,
> > > or how
> > > > > > the
> > > > > > > JobManager redistributes the slots resources when resources (TMs)
> > > > > > > > disappear.
> > > > > > > This question is pretty orthogonal, though, to the "how to
> > specify
> > > the
> > > > > > > resources".
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <
> > tonysong...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for drafting the FLIP and driving the discussion,
> > Yangze.
> > > > > > > > And Thanks for the feedback, Till and Chesnay.
> > > > > > > >
> > > > > > > > @Till,
> > > > > > > >
> > > > > > > > I agree that specifying requirements for SSGs means that SSGs
> > > need to
> > > > > > be
> > > > > > > > supported in fine-grained resource management, otherwise each
> > > > > operator
> > > > > > > > might use as many resources as the whole group. However, I
> > cannot
> > > > > think
> > > > > > > of
> > > > > > > > a strong reason for not supporting SSGs in fine-grained
> > resource
> > > > > > > > management.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Interestingly, if all operators have their resources properly
> > > > > > > specified,
> > > > > > > > > then slot sharing is no longer needed because Flink could
> > > slice off
> > > > > > the
> > > > > > > > > appropriately sized slots for every Task individually.
> > > > > > > > >
> > > > > > > >
> > > > > > > > > So for example, if we have a job consisting of two operators
> > op_1
> > > and
> > > > > > op_2
> > > > > > > > > where each op needs 100 MB of memory, we would then say that
> > > the
> > > > > slot
> > > > > > > > > sharing group needs 200 MB of memory to run. If we have a
> > > cluster
> > > > > > with
> > > > > > > 2
> > > > > > > > > TMs with one slot of 100 MB each, then the system cannot run
> > > this
> > > > > > job.
> > > > > > > If
> > > > > > > > > the resources were specified on an operator level, then the
> > > system
> > > > > > > could
> > > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to
> > > TM_2.
> > > > > > > >
> > > > > > > >
> > > > > > > > Couldn't agree more that if all operators' requirements are
> > > properly
> > > > > > > > specified, slot sharing should be no longer needed. I think
> > this
> > > > > > exactly
> > > > > > > > disproves the example. If we already know op_1 and op_2 each
> > > needs
> > > > > 100
> > > > > > MB
> > > > > > > > of memory, why would we put them in the same group? If they are
> > > in
> > > > > > > separate
> > > > > > > > groups, with the proposed approach the system can freely deploy
> > > them
> > > > > to
> > > > > > > > either a 200 MB TM or two 100 MB TMs.
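
The 100 MB / 200 MB example can be checked with a toy placement sketch. It
assumes each TM is a pool of free memory from which slots are sliced on
demand, and uses a simple largest-first, first-fit heuristic (not an exact
bin-packing solver). All names are hypothetical and not Flink API.

```java
import java.util.Arrays;

// Hypothetical sketch, not Flink API: names are invented for illustration.
final class PlacementSketch {

    /** Returns true if the heuristic can place every request on some TM. */
    static boolean fits(int[] requestsMb, int[] tmFreeMb) {
        int[] free = tmFreeMb.clone();
        int[] reqs = requestsMb.clone();
        Arrays.sort(reqs); // ascending; iterate from the end to place largest first
        for (int i = reqs.length - 1; i >= 0; i--) {
            boolean placed = false;
            for (int t = 0; t < free.length && !placed; t++) {
                if (free[t] >= reqs[i]) {
                    free[t] -= reqs[i]; // slice a slot of the requested size off this TM
                    placed = true;
                }
            }
            if (!placed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One 200 MB group on two 100 MB TMs.
        System.out.println(fits(new int[] {200}, new int[] {100, 100}));
        // Two 100 MB groups on two 100 MB TMs, then on one 200 MB TM.
        System.out.println(fits(new int[] {100, 100}, new int[] {100, 100}));
        System.out.println(fits(new int[] {100, 100}, new int[] {200}));
    }
}
```

A single 200 MB group does not fit on two 100 MB TMs, while two separate
100 MB groups fit on either layout, which is the point of putting op_1 and
op_2 into separate groups when their individual requirements are known.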
> > > > > > > >
> > > > > > > > Moreover, the precondition for not needing slot sharing is
> > having
> > > > > > > resource
> > > > > > > > requirements properly specified for all operators. This is not
> > > always
> > > > > > > > possible, and usually requires tremendous efforts. One of the
> > > > > benefits
> > > > > > > for
> > > > > > > > SSG-based requirements is that it allows the user to freely
> > > decide
> > > > > the
> > > > > > > > granularity, thus efforts they want to pay. I would consider
> > SSG
> > > in
> > > > > > > > fine-grained resource management as a group of operators that
> > the
> > > > > user
> > > > > > > > would like to specify the total resource for. There can be only
> > > one
> > > > > > group
> > > > > > > > in the job, 2~3 groups dividing the job into a few major parts,
> > > or as
> > > > > > > many
> > > > > > > > groups as the number of tasks/operators, depending on how
> > > > > fine-grained
> > > > > > > the
> > > > > > > > user is able to specify the resources.
> > > > > > > >
> > > > > > > > Having to support SSGs might be a constraint. But given that
> > all
> > > the
> > > > > > > > current scheduler implementations already support SSGs, I tend
> > to
> > > > > think
> > > > > > > > that as an acceptable price for the above discussed usability
> > and
> > > > > > > > flexibility.
> > > > > > > >
> > > > > > > > @Chesnay
> > > > > > > >
> > > > > > > > Will declaring them on slot sharing groups not also waste
> > > resources
> > > > > if
> > > > > > > the
> > > > > > > > > parallelism of operators within that group are different?
> > > > > > > > >
> > > > > > > > Yes. It's a trade-off between usability and resource
> > > utilization. To
> > > > > > > avoid
> > > > > > > > such wasting, the user can define more groups, so that each
> > group
> > > > > > > contains
> > > > > > > > > fewer operators and the chance of having operators with
> > different
> > > > > > > > parallelism will be reduced. The price is to have more resource
> > > > > > > > requirements to specify.
> > > > > > > >
> > > > > > > > It also seems like quite a hassle for users having to
> > > recalculate the
> > > > > > > > > resource requirements if they change the slot sharing.
> > > > > > > > > I'd think that it's not really workable for users that create
> > > a set
> > > > > > of
> > > > > > > > > re-usable operators which are mixed and matched in their
> > > > > > applications;
> > > > > > > > > managing the resources requirements in such a setting would
> > be
> > > a
> > > > > > > > > nightmare, and in the end would require operator-level
> > > requirements
> > > > > > any
> > > > > > > > > way.
> > > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > > usability.
> > > > > > > > >
> > > > > > > >
> > > > > > > >    - As mentioned in my reply to Till's comment, there's no
> > > reason to
> > > > > > put
> > > > > > > >    multiple operators whose individual resource requirements
> > are
> > > > > > already
> > > > > > > > known
> > > > > > > >    into the same group in fine-grained resource management.
> > > > > > > > >    - Even if an operator implementation is reused for multiple
> > > > > > applications,
> > > > > > > >    it does not guarantee the same resource requirements. During
> > > our
> > > > > > years
> > > > > > > > of
> > > > > > > >    practices in Alibaba, with per-operator requirements
> > > specified for
> > > > > > > > Blink's
> > > > > > > >    fine-grained resource management, very few users (including
> > > our
> > > > > > > > specialists
> > > > > > > >    who are dedicated to supporting Blink users) are as
> > > experienced as
> > > > > > to
> > > > > > > >    accurately predict/estimate the operator resource
> > > requirements.
> > > > > Most
> > > > > > > > people
> > > > > > > >    rely on the execution-time metrics (throughput, delay, cpu
> > > load,
> > > > > > > memory
> > > > > > > >    usage, GC pressure, etc.) to improve the specification.
> > > > > > > >
> > > > > > > > To sum up:
> > > > > > > > If the user is capable of providing proper resource
> > requirements
> > > for
> > > > > > > every
> > > > > > > > operator, that's definitely a good thing and we would not need
> > to
> > > > > rely
> > > > > > on
> > > > > > > > the SSGs. However, that shouldn't be a *must* for the
> > > fine-grained
> > > > > > > resource
> > > > > > > > management to work. For those users who are capable and do not
> > > like
> > > > > > > having
> > > > > > > > to set each operator to a separate SSG, I would be ok to have
> > > both
> > > > > > > > SSG-based and operator-based runtime interfaces and to only
> > > fallback
> > > > > to
> > > > > > > the
> > > > > > > > SSG requirements when the operator requirements are not
> > > specified.
> > > > > > > However,
> > > > > > > > as the first step, I think we should prioritise the use cases
> > > where
> > > > > > users
> > > > > > > > are not that experienced.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > ches...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Will declaring them on slot sharing groups not also waste
> > > resources
> > > > > > if
> > > > > > > > > the parallelism of operators within that group are different?
> > > > > > > > >
> > > > > > > > > It also seems like quite a hassle for users having to
> > > recalculate
> > > > > the
> > > > > > > > > resource requirements if they change the slot sharing.
> > > > > > > > > I'd think that it's not really workable for users that create
> > > a set
> > > > > > of
> > > > > > > > > re-usable operators which are mixed and matched in their
> > > > > > applications;
> > > > > > > > > managing the resources requirements in such a setting would
> > be
> > > a
> > > > > > > > > nightmare, and in the end would require operator-level
> > > requirements
> > > > > > any
> > > > > > > > > way.
> > > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > > usability.
> > > > > > > > >
> > > > > > > > > > My main worry is that if we wire the runtime to work on
> > SSGs
> > > > > it's
> > > > > > > > > gonna be difficult to implement more fine-grained approaches,
> > > which
> > > > > > > > > would not be the case if, for the runtime, they are always
> > > defined
> > > > > on
> > > > > > > an
> > > > > > > > > operator-level.
> > > > > > > > >
> > > > > > > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > > > > > Thanks for drafting this FLIP and starting this discussion
> > > > > Yangze.
> > > > > > > > > >
> > > > > > > > > > I like that defining resource requirements on a slot
> > sharing
> > > > > group
> > > > > > > > makes
> > > > > > > > > > the overall setup easier and improves usability of resource
> > > > > > > > requirements.
> > > > > > > > > >
> > > > > > > > > > What I do not like about it is that it changes slot sharing
> > > > > groups
> > > > > > > from
> > > > > > > > > > being a scheduling hint to something which needs to be
> > > supported
> > > > > in
> > > > > > > > order
> > > > > > > > > > to support fine grained resource requirements. So far, the
> > > idea
> > > > > of
> > > > > > > slot
> > > > > > > > > > sharing groups was that it tells the system that a set of
> > > > > operators
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > deployed in the same slot. But the system still had the
> > > freedom
> > > > > to
> > > > > > > say
> > > > > > > > > that
> > > > > > > > > > it would rather place these tasks in different slots if it
> > > > > wanted.
> > > > > > If
> > > > > > > > we
> > > > > > > > > > now specify resource requirements on a per slot sharing
> > > group,
> > > > > then
> > > > > > > the
> > > > > > > > > > only option for a scheduler which does not support slot
> > > sharing
> > > > > > > groups
> > > > > > > > is
> > > > > > > > > > to say that every operator in this slot sharing group
> > needs a
> > > > > slot
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > same resources as the whole group.
> > > > > > > > > >
> > > > > > > > > > > So for example, if we have a job consisting of two operators
> > > op_1
> > > > > > and
> > > > > > > > op_2
> > > > > > > > > > where each op needs 100 MB of memory, we would then say
> > that
> > > the
> > > > > > slot
> > > > > > > > > > sharing group needs 200 MB of memory to run. If we have a
> > > cluster
> > > > > > > with
> > > > > > > > 2
> > > > > > > > > > TMs with one slot of 100 MB each, then the system cannot
> > run
> > > this
> > > > > > > job.
> > > > > > > > If
> > > > > > > > > > the resources were specified on an operator level, then the
> > > > > system
> > > > > > > > could
> > > > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to
> > > TM_2.
> > > > > > > > > >
> > > > > > > > > > Originally, one of the primary goals of slot sharing groups
> > > was
> > > > > to
> > > > > > > make
> > > > > > > > > it
> > > > > > > > > > easier for the user to reason about how many slots a job
> > > needs
> > > > > > > > > independent
> > > > > > > > > > of the actual number of operators in the job.
> > Interestingly,
> > > if
> > > > > all
> > > > > > > > > > operators have their resources properly specified, then
> > slot
> > > > > > sharing
> > > > > > > is
> > > > > > > > > no
> > > > > > > > > > longer needed because Flink could slice off the
> > appropriately
> > > > > sized
> > > > > > > > slots
> > > > > > > > > > for every Task individually. What matters is whether the
> > > whole
> > > > > > > cluster
> > > > > > > > > has
> > > > > > > > > > enough resources to run all tasks or not.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > karma...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi, there,
> > > > > > > > > >>
> > > > > > > > > >> We would like to start a discussion thread on "FLIP-156:
> > > Runtime
> > > > > > > > > >> Interfaces for Fine-Grained Resource Requirements"[1],
> > > where we
> > > > > > > > > >> propose Slot Sharing Group (SSG) based runtime interfaces
> > > for
> > > > > > > > > >> specifying fine-grained resource requirements.
> > > > > > > > > >>
> > > > > > > > > >> In this FLIP:
> > > > > > > > > >> - Expound the user story of fine-grained resource
> > > management.
> > > > > > > > > >> - Propose runtime interfaces for specifying SSG-based
> > > resource
> > > > > > > > > >> requirements.
> > > > > > > > > >> - Discuss the pros and cons of the three potential
> > > granularities
> > > > > > for
> > > > > > > > > >> specifying the resource requirements (op, task and slot
> > > sharing
> > > > > > > group)
> > > > > > > > > >> and explain why we choose the slot sharing group.
> > > > > > > > > >>
> > > > > > > > > >> Please find more details in the FLIP wiki document [1].
> > > Looking
> > > > > > > > > >> forward to your feedback.
> > > > > > > > > >>
> > > > > > > > > >> [1]
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Yangze Guo
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
