Thanks for updating the FLIP, Xintong. It looks good to me. One minor comment: we could name the configuration parameter taskmanager.memory.managed.consumer-weights, which might be a bit more expressive of what this option does.
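For illustration, such an option could look like this in flink-conf.yaml. Note this is only a sketch of the idea: the exact option key, the consumer names, and the value syntax are assumptions, not a settled API.

```yaml
# Hypothetical example -- option name, consumer keys, and syntax are not final.
# Weights are relative: below, operator (state backend / batch algorithm)
# memory and Python UDF memory would share managed memory 70/30.
taskmanager.memory.managed.consumer-weights: OPERATOR:70,PYTHON:30
```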
Cheers,
Till

On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks all for the feedback.
>
> FYI, I've opened a voting thread[1] on this.
>
> Thank you~
>
> Xintong Song
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html
>
> On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <reed...@gmail.com> wrote:
>
> > Thanks for proposing this improvement! @Xintong
> > The proposal looks good to me. Agreed that we should make it as simple as
> > possible for users to understand.
> >
> > Thanks,
> > Zhu
> >
> > On Thu, Sep 3, 2020 at 2:11 PM, Dian Fu <dian0511...@gmail.com> wrote:
> >
> > > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> > >
> > > > On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > Thanks for the input, Yu.
> > > >
> > > > I believe the current proposal should work with RocksDB, or any other
> > > > state backend, using memory at either the slot or the operator scope.
> > > > With the proposed approach, all we need is an indicator (e.g., a
> > > > configuration option) telling us which scope we should calculate the
> > > > fractions for.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
> > > >
> > > >> Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
> > > >>
> > > >> Just one supplement for the RocksDB state backend part:
> > > >>
> > > >> It's true that currently we're using managed memory at the slot scope.
> > > >> However, IMHO, we may support setting weights for different stateful
> > > >> operators (for advanced usage) in the future. For example, users may
> > > >> choose to set higher weights for the join operator over the aggregation
> > > >> operator, to give more memory to those with bigger states. In this
> > > >> case, we may also use managed memory at the operator scope for state
> > > >> backends.
> > > >> And if I understand correctly, the current design could cover this
> > > >> case well.
> > > >>
> > > >> Best Regards,
> > > >> Yu
> > > >>
> > > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
> > > >>
> > > >>> Thanks all for the feedback and discussion.
> > > >>>
> > > >>> I have updated the FLIP, with the following changes.
> > > >>>
> > > >>> - Choose the main proposal over the alternative approach
> > > >>> - Combine weights of RocksDB and batch operators
> > > >>> - Expose weights through configuration options, rather than via
> > > >>>   ExecutionConfig
> > > >>> - Add implementation plan
> > > >>>
> > > >>> Please help take another look.
> > > >>>
> > > >>> Thank you~
> > > >>>
> > > >>> Xintong Song
> > > >>>
> > > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>
> > > >>>> Thanks for the input, Aljoscha & Till.
> > > >>>>
> > > >>>> # Weight Configuration
> > > >>>>
> > > >>>> I think exposing the knobs incrementally is a good idea. However,
> > > >>>> I'm not sure about non-configurable weights as the first step.
> > > >>>>
> > > >>>> Currently, users can tune memory for RocksDB
> > > >>>> ('taskmanager.memory.managed.size') and Python
> > > >>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
> > > >>>> which practically means any combination of RocksDB and Python memory
> > > >>>> sizes. If we switch to non-configurable weights, that would be a
> > > >>>> regression compared to 1.11.
> > > >>>>
> > > >>>> Therefore, I think exposing the weights via configuration options
> > > >>>> might be a good first step. We can discuss exposing them via
> > > >>>> ExecutionConfig if we later see that requirement.
> > > >>>>
> > > >>>> # Naming of Weights
> > > >>>>
> > > >>>> I'm neutral on "Flink/Internal memory".
> > > >>>>
> > > >>>> I think the reason we can combine the weights for batch algorithms
> > > >>>> and state backends is that they are never mixed together. My only
> > > >>>> concern with "Flink/Internal memory", which might not be a problem at
> > > >>>> the moment, is: what if new memory use cases appear in the future that
> > > >>>> can also be described as "Flink/Internal memory", but are not
> > > >>>> guaranteed never to be mixed with batch algorithms or state backends?
> > > >>>>
> > > >>>> Anyway, I think the naming should not block this FLIP, as long as we
> > > >>>> have consensus on combining the two weights for RocksDB and batch
> > > >>>> algorithms. We can keep the naming discussion open until the
> > > >>>> implementation phase.
> > > >>>>
> > > >>>> Thank you~
> > > >>>>
> > > >>>> Xintong Song
> > > >>>>
> > > >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > >>>>
> > > >>>>> Thanks for creating this FLIP, Xintong.
> > > >>>>>
> > > >>>>> I agree with the previous comments that the memory configuration
> > > >>>>> should be as easy as possible. Every new knob has the potential to
> > > >>>>> confuse users and/or allow them to shoot themselves in the foot.
> > > >>>>> Consequently, I am +1 for the first proposal in the FLIP since it is
> > > >>>>> simpler.
> > > >>>>>
> > > >>>>> Also +1 for Stephan's proposal to combine the batch operators' and
> > > >>>>> RocksDB's memory usage into one weight.
> > > >>>>>
> > > >>>>> Concerning the names for the two weights, I fear that we are facing
> > > >>>>> one of the two hard things in computer science.
> > > >>>>> To add another idea, we could name them "Flink memory"/"Internal
> > > >>>>> memory" and "Python memory".
> > > >>>>>
> > > >>>>> For the sake of keeping the scope of the FLIP as small as possible
> > > >>>>> and developing the feature incrementally, I think Aljoscha's proposal
> > > >>>>> to make the weights non-configurable as a first step sounds like a
> > > >>>>> good idea. As a next step (and if we see the need), we can make the
> > > >>>>> memory weights configurable via the configuration. And last, we could
> > > >>>>> expose them via the ExecutionConfig if that is required.
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>> Till
> > > >>>>>
> > > >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> Playing devil's advocate here: should we even make the memory
> > > >>>>>> weights configurable? We could go with weights that should make
> > > >>>>>> sense for most cases in the first version and only introduce
> > > >>>>>> configurable weights when (if) users need them.
> > > >>>>>>
> > > >>>>>> Regarding where/how things are configured, I think most things
> > > >>>>>> should be a ConfigOption first (thanks for cc'ing me, Stephan!).
> > > >>>>>> This makes them configurable via flink-conf.yaml and via command
> > > >>>>>> line parameters, for example "bin/flink run -D memory.foo=bla ...".
> > > >>>>>> We can think about offering a programmatic API for cases where it
> > > >>>>>> makes sense, of course.
> > > >>>>>>
> > > >>>>>> Regarding naming one of the configurable weights
> > > >>>>>> "StateBackend-BatchAlgorithm": I think it's not a good idea to be
> > > >>>>>> that specific, because the option will not age well.
> > > >>>>>> For example, when we want to change which group of memory
> > > >>>>>> consumers is configured together, or when we add something new.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Aljoscha
> > > >>>>>>
> > > >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> > > >>>>>>> Thanks for the feedback, @Stephan
> > > >>>>>>>
> > > >>>>>>>> - There is a push to make as much as possible configurable via
> > > >>>>>>>> the main configuration, and not only in code. Specifically values
> > > >>>>>>>> for operations and tuning.
> > > >>>>>>>> I think it would be more important to have such memory weights
> > > >>>>>>>> in the config, compared to in the program API. /cc Aljoscha
> > > >>>>>>>
> > > >>>>>>> I can see the benefit that having memory weights in the main
> > > >>>>>>> configuration makes tuning easier, which makes great sense to me.
> > > >>>>>>> On the other hand, what we lose is the flexibility to have
> > > >>>>>>> different weights for jobs running in the same Flink cluster. It
> > > >>>>>>> seems to me the problem is that we don't have an easy way to
> > > >>>>>>> overwrite job-specific configurations without touching the code.
> > > >>>>>>>
> > > >>>>>>> Given the current status, what if we make the memory weights
> > > >>>>>>> configurable through both the main configuration and the
> > > >>>>>>> programming API? The main configuration should take effect iff the
> > > >>>>>>> weights are not explicitly specified through the programming API.
> > > >>>>>>> In this way, job cluster users can easily tune the weights
> > > >>>>>>> through the main configuration, while session cluster users, if
> > > >>>>>>> they want to have different weights for jobs, can still overwrite
> > > >>>>>>> the weights through execution configs.
> > > >>>>>>>
> > > >>>>>>>> - My recommendation would be to keep this as simple as possible.
> > > >>>>>>>> This will make a lot of configuration code harder, and make it
> > > >>>>>>>> harder for users to understand Flink's memory model.
> > > >>>>>>>> Making things as easy for users to understand is very important
> > > >>>>>>>> in my opinion. In that regard, the main proposal in the FLIP seems
> > > >>>>>>>> better than the alternative proposal listed at the end of the FLIP
> > > >>>>>>>> page.
> > > >>>>>>>
> > > >>>>>>> +1 from my side.
> > > >>>>>>>
> > > >>>>>>>> - For simplicity, we could go even further and simply have two
> > > >>>>>>>> memory users at the moment: the operator algorithm/data-structure
> > > >>>>>>>> and the external language process (Python for now).
> > > >>>>>>>> We never have batch algos and RocksDB mixed; having these as
> > > >>>>>>>> separate options is confusing, as it suggests they can be combined
> > > >>>>>>>> arbitrarily. I also think that a slim possibility that we may ever
> > > >>>>>>>> combine them in the future is not enough reason to make it more
> > > >>>>>>>> complex/confusing.
> > > >>>>>>>
> > > >>>>>>> Good point. +1 for combining the batch/RocksDB weights, for
> > > >>>>>>> they're never mixed together. We can even just name it
> > > >>>>>>> "StateBackend-BatchAlgorithm" to be explicit.
> > > >>>>>>>
> > > >>>>>>> For "external language process", I'm not entirely sure. Future
> > > >>>>>>> external languages might be mixed with Python processes. To avoid
> > > >>>>>>> later having to figure out how to share external language memory
> > > >>>>>>> across different languages, I would suggest presenting the concept
> > > >>>>>>> as "Python memory" rather than "external language process memory".
> > > >>>>>>>
> > > >>>>>>> Thank you~
> > > >>>>>>>
> > > >>>>>>> Xintong Song
> > > >>>>>>>
> > > >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks for driving this proposal. A few thoughts on the current
> > > >>>>>>>> design:
> > > >>>>>>>>
> > > >>>>>>>> - There is a push to make as much as possible configurable via
> > > >>>>>>>> the main configuration, and not only in code. Specifically values
> > > >>>>>>>> for operations and tuning.
> > > >>>>>>>> I think it would be more important to have such memory weights in
> > > >>>>>>>> the config, compared to in the program API. /cc Aljoscha
> > > >>>>>>>>
> > > >>>>>>>> - My recommendation would be to keep this as simple as possible.
> > > >>>>>>>> This will make a lot of configuration code harder, and make it
> > > >>>>>>>> harder for users to understand Flink's memory model.
> > > >>>>>>>> Making things as easy for users to understand is very important
> > > >>>>>>>> in my opinion. In that regard, the main proposal in the FLIP seems
> > > >>>>>>>> better than the alternative proposal listed at the end of the FLIP
> > > >>>>>>>> page.
> > > >>>>>>>>
> > > >>>>>>>> - For simplicity, we could go even further and simply have two
> > > >>>>>>>> memory users at the moment: the operator algorithm/data-structure
> > > >>>>>>>> and the external language process (Python for now).
> > > >>>>>>>> We never have batch algos and RocksDB mixed; having these as
> > > >>>>>>>> separate options is confusing, as it suggests they can be combined
> > > >>>>>>>> arbitrarily. I also think that a slim possibility that we may ever
> > > >>>>>>>> combine them in the future is not enough reason to make it more
> > > >>>>>>>> complex/confusing.
> > > >>>>>>>>
> > > >>>>>>>> - I am also not aware of any plans to combine the network and
> > > >>>>>>>> operator memory. Not that it would be infeasible to do, but I
> > > >>>>>>>> think this would be orthogonal to this change, and I am not sure
> > > >>>>>>>> it would be solved with static weights. So trying to get network
> > > >>>>>>>> memory into this proposal seems premature to me.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Stephan
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> > A quick question: is network memory treated as managed memory
> > > >>>>>>>>> > now? Or in the future?
> > > >>>>>>>>>
> > > >>>>>>>>> No, network memory is independent of managed memory ATM. And I'm
> > > >>>>>>>>> not aware of any plan to combine these two.
> > > >>>>>>>>>
> > > >>>>>>>>> Any insights there?
> > > >>>>>>>>>
> > > >>>>>>>>> Thank you~
> > > >>>>>>>>>
> > > >>>>>>>>> Xintong Song
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> A quick question: is network memory treated as managed memory
> > > >>>>>>>>>> now? Or in the future?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Kurt
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi devs,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I'd like to bring up the discussion of FLIP-141[1], which
> > > >>>>>>>>>>> proposes how managed memory should be shared by various use
> > > >>>>>>>>>>> cases within a slot. This is an extension to FLIP-53[2], where
> > > >>>>>>>>>>> we assumed that the RocksDB state backend and batch operators
> > > >>>>>>>>>>> are the only use cases of managed memory for streaming and
> > > >>>>>>>>>>> batch jobs respectively, which is no longer true with the
> > > >>>>>>>>>>> introduction of Python UDFs.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Please note that we have not reached consensus between two
> > > >>>>>>>>>>> different designs. The major part of this FLIP describes one of
> > > >>>>>>>>>>> the candidates, while the alternative is discussed in the
> > > >>>>>>>>>>> section "Rejected Alternatives". We are hoping to draw on the
> > > >>>>>>>>>>> community's intelligence to help us resolve the disagreement.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Any feedback would be appreciated.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thank you~
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Xintong Song
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [1]
> > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [2]
> > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
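For illustration, the weight-to-fraction logic discussed in this thread can be sketched as follows. This is a minimal sketch of the FLIP-141 idea, not Flink code; the consumer names OPERATOR and PYTHON, and the function itself, are assumptions made for the example. The key point from the discussion is that a consumer absent from the job gets no share, and the remaining weights are re-normalized over the consumers actually present in the slot (or operator) scope.

```python
def managed_fractions(weights, present):
    """Derive each consumer's fraction of a slot's managed memory.

    weights: dict of consumer name -> non-negative relative weight
    present: set of consumers actually used by the job in this scope
    """
    # Only consumers that are present and have a positive weight count.
    active = {c: w for c, w in weights.items() if c in present and w > 0}
    total = sum(active.values())
    if total == 0:
        # No active consumer: nobody gets a share of managed memory.
        return {c: 0.0 for c in weights}
    # Re-normalize the active weights so the fractions sum to 1.
    return {c: active.get(c, 0) / total for c in weights}


# Hypothetical weights: operators (state backend / batch algorithms) vs. Python UDFs.
weights = {"OPERATOR": 70, "PYTHON": 30}

# A streaming job with RocksDB but no Python UDFs: the operator consumer
# receives all managed memory, rather than leaving 30% unused.
print(managed_fractions(weights, {"OPERATOR"}))  # {'OPERATOR': 1.0, 'PYTHON': 0.0}

# A job using both consumers: managed memory is split 70/30.
print(managed_fractions(weights, {"OPERATOR", "PYTHON"}))
```

This also shows why a single combined weight for state backends and batch algorithms is workable: since the two are never active in the same slot, they can share one entry in `weights` without ever competing.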