Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Xintong Song Thu, 03 Sep 2020 18:47:23 -0700

Thanks Till, `taskmanager.memory.managed.consumer-weights` sounds good to
me.


Thank you~

Xintong Song



On Thu, Sep 3, 2020 at 8:44 PM Till Rohrmann <[email protected]> wrote:

> Thanks for updating the FLIP Xintong. It looks good to me. One minor
> comment is that we could name the configuration parameter
> also taskmanager.memory.managed.consumer-weights which might be a bit more
> expressive what this option does.
>
> Cheers,
> Till
>
> On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <[email protected]>
> wrote:
>
> > Thanks all for the feedback.
> >
> > FYI, I've opened a voting thread[1] on this.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html
> >
> >
> > On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <[email protected]> wrote:
> >
> > > Thanks for proposing this improvement! @Xintong
> > > The proposal looks good to me. Agreed that we should make it as simple
> as
> > > possible for users to understand.
> > >
> > > Thanks,
> > > Zhu
> > >
> > > Dian Fu <[email protected]> 于2020年9月3日周四 下午2:11写道：
> > >
> > > > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> > > >
> > > > > 在 2020年9月2日，下午6:09，Xintong Song <[email protected]> 写道：
> > > > >
> > > > > Thanks for the input, Yu.
> > > > >
> > > > > I believe the current proposal should work with RocksDB, or any
> other
> > > > state
> > > > > backend, using memory at either the slot or the scope. With the
> > > proposed
> > > > > approach, all we need is an indicator (e.g., a configuration
> option)
> > > > > telling us which scope should we calculate the fractions for.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <[email protected]> wrote:
> > > > >
> > > > >> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
> > > > >>
> > > > >> Just one supplement for the RocksDB state backend part:
> > > > >>
> > > > >> It's true that currently we're using managed memory at the slot
> > scope.
> > > > >> However, IMHO, we may support setting weights for different
> stateful
> > > > >> operators (for advanced usage) in future. For example, users may
> > > choose
> > > > to
> > > > >> set higher weights for join operator over aggregation operator, to
> > > give
> > > > >> more memory to those with bigger states. In this case, we may also
> > use
> > > > >> managed memory at the operator scope for state backends. And if I
> > > > >> understand correctly, the current design could cover this case
> well.
> > > > >>
> > > > >> Best Regards,
> > > > >> Yu
> > > > >>
> > > > >>
> > > > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <[email protected]>
> > > > wrote:
> > > > >>
> > > > >>> Thanks all for the feedback and discussion.
> > > > >>>
> > > > >>> I have updated the FLIP, with the following changes.
> > > > >>>
> > > > >>>   - Choose the main proposal over the alternative approach
> > > > >>>   - Combine weights of RocksDB and batch operators
> > > > >>>   - Expose weights through configuration options, rather than via
> > > > >>>   ExecutionConfig.
> > > > >>>   - Add implementation plan.
> > > > >>>
> > > > >>> Please help take another look.
> > > > >>>
> > > > >>> Thank you~
> > > > >>>
> > > > >>> Xintong Song
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <
> [email protected]
> > >
> > > > >> wrote:
> > > > >>>
> > > > >>>> Thanks for the inputs, Aljoscha & Till.
> > > > >>>>
> > > > >>>>
> > > > >>>> # Weight Configuration
> > > > >>>>
> > > > >>>>
> > > > >>>> I think exposing the knobs incrementally is a good idea.
> However,
> > > I'm
> > > > >> not
> > > > >>>> sure about non-configurable as the first step.
> > > > >>>>
> > > > >>>>
> > > > >>>> Currently, users can tune memory for rocksdb
> > > > >>>> ('taskmanager.memory.managed.size') and python
> > > > >>>> ('python.fn-execution.[framework|buffer].memory.size')
> separately,
> > > > >> which
> > > > >>>> practically means any combination of rocksdb and python memory
> > > sizes.
> > > > >> If
> > > > >>> we
> > > > >>>> switch to non-configurable weights, that will be a regression
> > > compared
> > > > >> to
> > > > >>>> 1.11.
> > > > >>>>
> > > > >>>>
> > > > >>>> Therefore, I think exposing via configuration options might be a
> > > good
> > > > >>>> first step. And we can discuss exposing via ExecutionConfig if
> > later
> > > > we
> > > > >>> see
> > > > >>>> that requirement.
> > > > >>>>
> > > > >>>>
> > > > >>>> # Naming of Weights
> > > > >>>>
> > > > >>>>
> > > > >>>> I'm neutral for "Flink/Internal memory".
> > > > >>>>
> > > > >>>>
> > > > >>>> I think the reason we can combine weights for batch algorithms
> and
> > > > >> state
> > > > >>>> backends is that they are never mixed together. My only concern
> > > > >>>> for "Flink/Internal memory", which might not be a problem at the
> > > > >> moment,
> > > > >>> is
> > > > >>>> that what if new memory use cases appear in the future, which
> can
> > > also
> > > > >> be
> > > > >>>> described by "Flink/Internal memory" but is not guaranteed not
> > mixed
> > > > >> with
> > > > >>>> batch algorithms or state backends?
> > > > >>>>
> > > > >>>>
> > > > >>>> Anyway, I think the naming should not block this FLIP, as long
> as
> > we
> > > > >> have
> > > > >>>> consensus on combining the two weights for rocksdb and batch
> > > > >> algorithms.
> > > > >>> We
> > > > >>>> can keep the naming discussion open until the implementation
> > phase.
> > > > >>>>
> > > > >>>>
> > > > >>>> Thank you~
> > > > >>>>
> > > > >>>> Xintong Song
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <
> > [email protected]
> > > >
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Thanks for creating this FLIP Xintong.
> > > > >>>>>
> > > > >>>>> I agree with the previous comments that the memory
> configuration
> > > > >> should
> > > > >>> be
> > > > >>>>> as easy as possible. Every new knob has the potential to
> confuse
> > > > users
> > > > >>>>> and/or allows him to shoot himself in the foot. Consequently, I
> > am
> > > +1
> > > > >>> for
> > > > >>>>> the first proposal in the FLIP since it is simpler.
> > > > >>>>>
> > > > >>>>> Also +1 for Stephan's proposal to combine batch operator's and
> > > > >>>>> RocksDB's memory usage into one weight.
> > > > >>>>>
> > > > >>>>> Concerning the names for the two weights, I fear that we are
> > facing
> > > > >> one
> > > > >>> of
> > > > >>>>> the two hard things in computer science. To add another idea,
> we
> > > > could
> > > > >>>>> name
> > > > >>>>> them "Flink memory"/"Internal memory" and "Python memory".
> > > > >>>>>
> > > > >>>>> For the sake of making the scope of the FLIP as small as
> possible
> > > and
> > > > >> to
> > > > >>>>> develop the feature incrementally, I think that Aljoscha's
> > proposal
> > > > to
> > > > >>>>> make
> > > > >>>>> it non-configurable for the first step sounds like a good idea.
> > As
> > > a
> > > > >>> next
> > > > >>>>> step (and also if we see need), we can make the memory weights
> > > > >>>>> configurable
> > > > >>>>> via the configuration. And last, we could expose it via the
> > > > >>>>> ExecutionConfig
> > > > >>>>> if it is required.
> > > > >>>>>
> > > > >>>>> Cheers,
> > > > >>>>> Till
> > > > >>>>>
> > > > >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <
> > > [email protected]
> > > > >
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Hi,
> > > > >>>>>>
> > > > >>>>>> playing devils advocate here: should we even make the memory
> > > weights
> > > > >>>>>> configurable? We could go with weights that should make sense
> > for
> > > > >> most
> > > > >>>>>> cases in the first version and only introduce configurable
> > weights
> > > > >>> when
> > > > >>>>>> (if) users need them.
> > > > >>>>>>
> > > > >>>>>> Regarding where/how things are configured, I think that most
> > > things
> > > > >>>>>> should be a ConfigOption first (Thanks cc'in me, Stephan!).
> This
> > > > >> makes
> > > > >>>>>> them configurable via flink-conf.yaml and via command line
> > > > >> parameters,
> > > > >>>>>> for example "bin/flink run -D memory.foo=bla ...". We can
> think
> > > > >> about
> > > > >>>>>> offering programmatic API for cases where it makes sense, of
> > > course.
> > > > >>>>>>
> > > > >>>>>> Regarding naming one of the configurable weights
> > > > >>>>>> "StateBackend-BatchAlgorithm". I think it's not a good idea to
> > be
> > > > >> that
> > > > >>>>>> specific because the option will not age well. For example
> when
> > we
> > > > >>> want
> > > > >>>>>> to change which group of memory consumers are configured
> > together
> > > or
> > > > >>>>>> when we add something new.
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Aljoscha
> > > > >>>>>>
> > > > >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> > > > >>>>>>> Thanks for the feedbacks, @Stephan
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>   - There is a push to make as much as possible configurable
> > via
> > > > >>> the
> > > > >>>>>> main
> > > > >>>>>>>> configuration, and not only in code. Specifically values for
> > > > >>>>> operations
> > > > >>>>>> and
> > > > >>>>>>>> tuning.
> > > > >>>>>>>>     I think it would be more important to have such memory
> > > > >> weights
> > > > >>>>> in
> > > > >>>>>> the
> > > > >>>>>>>> config, compared to in the program API. /cc Aljoscha
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> I can see the benefit that having memory weights in the main
> > > > >>>>>> configuration
> > > > >>>>>>> makes tuning easier, which makes great sense to me. On the
> > other
> > > > >>> hand,
> > > > >>>>>> what
> > > > >>>>>>> we lose is the flexibility to have different weights for jobs
> > > > >>> running
> > > > >>>>> in
> > > > >>>>>>> the same Flink cluster. It seems to me the problem is that we
> > > > >> don't
> > > > >>>>> have
> > > > >>>>>> an
> > > > >>>>>>> easy way to overwrite job-specific configurations without
> > > touching
> > > > >>> the
> > > > >>>>>>> codes.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Given the current status, what if we make the memory weights
> > > > >>>>> configurable
> > > > >>>>>>> through both the main configuration and the programming API?
> > The
> > > > >>> main
> > > > >>>>>>> configuration should take effect iff the weights are not
> > > > >> explicitly
> > > > >>>>>>> specified through the programming API. In this way, job
> cluster
> > > > >>> users
> > > > >>>>> can
> > > > >>>>>>> easily tune the weight through the main configuration, while
> > > > >> session
> > > > >>>>>>> cluster users, if they want to have different weights for
> jobs,
> > > > >> can
> > > > >>>>> still
> > > > >>>>>>> overwrite the weight through execution configs.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>   - My recommendation would be to keep this as simple as
> > > > >> possible.
> > > > >>>>> This
> > > > >>>>>>>> will make a lot of configuration code harder, and make it
> > harder
> > > > >>> for
> > > > >>>>>> users
> > > > >>>>>>>> to understand Flink's memory model.
> > > > >>>>>>>>     Making things as easy for users to understand is very
> > > > >>> important
> > > > >>>>> in
> > > > >>>>>> my
> > > > >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
> > > > >> better
> > > > >>>>> than
> > > > >>>>>>>> the alternative proposal listed at the end of the FLIP page.
> > > > >>>>>>>
> > > > >>>>>>> +1 from my side.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>   - For the simplicity, we could go even further and simply
> > have
> > > > >>> two
> > > > >>>>>> memory
> > > > >>>>>>>> users at the moment: The operator algorithm/data-structure
> and
> > > > >> the
> > > > >>>>>> external
> > > > >>>>>>>> language process (Python for now).
> > > > >>>>>>>>     We never have batch algos and RocksDB mixed, having this
> > as
> > > > >>>>>> separate
> > > > >>>>>>>> options is confusing as it suggests this can be combined
> > > > >>>>> arbitrarily. I
> > > > >>>>>>>> also think that a slim possibility that we may ever combine
> > this
> > > > >> in
> > > > >>>>> the
> > > > >>>>>>>> future is not enough reason to make it more
> complex/confusing.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Good point. +1 for combining batch/rocksdb weights, for
> they're
> > > > >>> never
> > > > >>>>>> mixed
> > > > >>>>>>> together. We can even just name it
> > "StateBackend-BatchAlgorithm"
> > > > >> for
> > > > >>>>>>> explicitly.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> For "external language process", I'm not entirely sure.
> Future
> > > > >>>>> external
> > > > >>>>>>> languages are possibly mixed with python processes. To avoid
> > > later
> > > > >>>>>>> considering how to share external language memory across
> > > different
> > > > >>>>>>> languages, I would suggest to present the concept as "python
> > > > >> memory"
> > > > >>>>>> rather
> > > > >>>>>>> than "external language process memory".
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Thank you~
> > > > >>>>>>>
> > > > >>>>>>> Xintong Song
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <
> > [email protected]>
> > > > >>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Thanks for driving this proposal. A few thoughts on the
> > current
> > > > >>>>> design:
> > > > >>>>>>>>
> > > > >>>>>>>>   - There is a push to make as much as possible configurable
> > via
> > > > >>> the
> > > > >>>>>> main
> > > > >>>>>>>> configuration, and not only in code. Specifically values for
> > > > >>>>> operations
> > > > >>>>>> and
> > > > >>>>>>>> tuning.
> > > > >>>>>>>>     I think it would be more important to have such memory
> > > > >> weights
> > > > >>>>> in
> > > > >>>>>> the
> > > > >>>>>>>> config, compared to in the program API. /cc Aljoscha
> > > > >>>>>>>>
> > > > >>>>>>>>   - My recommendation would be to keep this as simple as
> > > > >> possible.
> > > > >>>>> This
> > > > >>>>>>>> will make a lot of configuration code harder, and make it
> > harder
> > > > >>> for
> > > > >>>>>> users
> > > > >>>>>>>> to understand Flink's memory model.
> > > > >>>>>>>>     Making things as easy for users to understand is very
> > > > >>> important
> > > > >>>>> in
> > > > >>>>>> my
> > > > >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
> > > > >> better
> > > > >>>>> than
> > > > >>>>>>>> the alternative proposal listed at the end of the FLIP page.
> > > > >>>>>>>>
> > > > >>>>>>>>   - For the simplicity, we could go even further and simply
> > have
> > > > >>> two
> > > > >>>>>> memory
> > > > >>>>>>>> users at the moment: The operator algorithm/data-structure
> and
> > > > >> the
> > > > >>>>>> external
> > > > >>>>>>>> language process (Python for now).
> > > > >>>>>>>>     We never have batch algos and RocksDB mixed, having this
> > as
> > > > >>>>>> separate
> > > > >>>>>>>> options is confusing as it suggests this can be combined
> > > > >>>>> arbitrarily. I
> > > > >>>>>>>> also think that a slim possibility that we may ever combine
> > this
> > > > >> in
> > > > >>>>> the
> > > > >>>>>>>> future is not enough reason to make it more
> complex/confusing.
> > > > >>>>>>>>
> > > > >>>>>>>>   - I am also not aware of any plans to combine the network
> > and
> > > > >>>>>> operator
> > > > >>>>>>>> memory. Not that it would be infeasible to do this, but I
> > think
> > > > >>> this
> > > > >>>>>> would
> > > > >>>>>>>> also be orthogonal to this change, and I am not sure this
> > would
> > > > >> be
> > > > >>>>>> solved
> > > > >>>>>>>> with static weights. So trying to get network memory into
> this
> > > > >>>>> proposal
> > > > >>>>>>>> seems pre-mature to me.
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Stephan
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
> > > > >>> [email protected]
> > > > >>>>>>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> A quick question, does network memory treated as managed
> > > memory
> > > > >>>>> now?
> > > > >>>>>> Or
> > > > >>>>>>>>> in
> > > > >>>>>>>>>> the future?
> > > > >>>>>>>>>>
> > > > >>>>>>>>> No, network memory is independent from managed memory ATM.
> > And
> > > > >> I'm
> > > > >>>>> not
> > > > >>>>>>>>> aware of any plan to combine these two.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Any insights there?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thank you~
> > > > >>>>>>>>>
> > > > >>>>>>>>> Xintong Song
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <
> [email protected]
> > >
> > > > >>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> A quick question, does network memory treated as managed
> > > memory
> > > > >>>>> now?
> > > > >>>>>> Or
> > > > >>>>>>>>> in
> > > > >>>>>>>>>> the future?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Kurt
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
> > > > >>>>> [email protected]>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi devs,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I'd like to bring the discussion over FLIP-141[1], which
> > > > >>> proposes
> > > > >>>>> how
> > > > >>>>>>>>>>> managed memory should be shared by various use cases
> > within a
> > > > >>>>> slot.
> > > > >>>>>>>>> This
> > > > >>>>>>>>>> is
> > > > >>>>>>>>>>> an extension to FLIP-53[2], where we assumed that RocksDB
> > > > >> state
> > > > >>>>>>>> backend
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>> batch operators are the only use cases of managed memory
> > for
> > > > >>>>>>>> streaming
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>> batch jobs respectively, which is no longer true with the
> > > > >>>>>>>> introduction
> > > > >>>>>>>>> of
> > > > >>>>>>>>>>> Python UDFs.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Please notice that we have not reached consensus between
> > two
> > > > >>>>>>>> different
> > > > >>>>>>>>>>> designs. The major part of this FLIP describes one of the
> > > > >>>>> candidates,
> > > > >>>>>>>>>> while
> > > > >>>>>>>>>>> the alternative is discussed in the section "Rejected
> > > > >>>>> Alternatives".
> > > > >>>>>>>> We
> > > > >>>>>>>>>> are
> > > > >>>>>>>>>>> hoping to borrow intelligence from the community to help
> us
> > > > >>>>> resolve
> > > > >>>>>>>> the
> > > > >>>>>>>>>>> disagreement.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Any feedback would be appreciated.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thank you~
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Xintong Song
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1]
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [2]
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Reply via email to