Thanks for proposing this improvement, @Xintong! The proposal looks good to me. Agreed that we should make it as simple as possible for users to understand.
Thanks,
Zhu

Dian Fu <dian0511...@gmail.com> wrote on Thu, Sep 3, 2020 at 2:11 PM:

> Thanks for driving this FLIP, Xintong! +1 to the updated version.
>
> > On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
> >
> > Thanks for the input, Yu.
> >
> > I believe the current proposal should work with RocksDB, or any other state backend, using memory at either the slot or the operator scope. With the proposed approach, all we need is an indicator (e.g., a configuration option) telling us which scope we should calculate the fractions for.
> >
> > Thank you~
> >
> > Xintong Song
> >
> > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
> >
> >> Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
> >>
> >> Just one supplement for the RocksDB state backend part:
> >>
> >> It's true that currently we're using managed memory at the slot scope. However, IMHO, we may support setting weights for different stateful operators (for advanced usage) in the future. For example, users may choose to set higher weights for a join operator over an aggregation operator, to give more memory to those with bigger states. In this case, we may also use managed memory at the operator scope for state backends. And if I understand correctly, the current design could cover this case well.
> >>
> >> Best Regards,
> >> Yu
> >>
> >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
> >>
> >>> Thanks all for the feedback and discussion.
> >>>
> >>> I have updated the FLIP, with the following changes:
> >>>
> >>> - Choose the main proposal over the alternative approach.
> >>> - Combine the weights of RocksDB and batch operators.
> >>> - Expose weights through configuration options, rather than via ExecutionConfig.
> >>> - Add an implementation plan.
> >>>
> >>> Please help take another look.
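The scope-based fraction calculation described above can be sketched as follows. This is an illustrative sketch only, not Flink's implementation: the function name, consumer names, and weight values are hypothetical, assuming fractions are obtained by normalizing weights over the consumers actually present in the chosen scope.

```python
# Illustrative sketch only (not Flink code): derive managed-memory fractions
# from consumer weights, normalized over the consumers that are actually
# present in the chosen scope (slot or operator).
# All names and values below are hypothetical.

def compute_fractions(weights, present_consumers):
    """Normalize the weights of the consumers present in the scope into
    fractions that sum to 1.0; return an empty dict if none are present."""
    relevant = {c: weights[c] for c in present_consumers if weights.get(c, 0) > 0}
    total = sum(relevant.values())
    return {c: w / total for c, w in relevant.items()} if total else {}

# Hypothetical weights: combined state-backend/batch weight 70, Python weight 30.
weights = {"OPERATOR": 70, "PYTHON": 30}

# A job using both RocksDB and Python UDFs splits the budget 0.7 / 0.3,
# while a job with only batch operators gets the whole budget (fraction 1.0).
assert compute_fractions(weights, {"OPERATOR", "PYTHON"}) == {"OPERATOR": 0.7, "PYTHON": 0.3}
assert compute_fractions(weights, {"OPERATOR"}) == {"OPERATOR": 1.0}
```

Normalizing over only the present consumers is what makes the weights composable: a use case that is absent from a slot does not strand any of the managed memory budget.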
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> >>>
> >>>> Thanks for the input, Aljoscha & Till.
> >>>>
> >>>> # Weight Configuration
> >>>>
> >>>> I think exposing the knobs incrementally is a good idea. However, I'm not sure about making them non-configurable as the first step.
> >>>>
> >>>> Currently, users can tune memory for RocksDB ('taskmanager.memory.managed.size') and Python ('python.fn-execution.[framework|buffer].memory.size') separately, which practically means any combination of RocksDB and Python memory sizes. If we switch to non-configurable weights, that will be a regression compared to 1.11.
> >>>>
> >>>> Therefore, I think exposing the weights via configuration options might be a good first step. And we can discuss exposing them via ExecutionConfig if we later see that requirement.
> >>>>
> >>>> # Naming of Weights
> >>>>
> >>>> I'm neutral on "Flink/Internal memory".
> >>>>
> >>>> I think the reason we can combine the weights for batch algorithms and state backends is that they are never mixed together. My only concern with "Flink/Internal memory", which might not be a problem at the moment, is: what if new memory use cases appear in the future that can also be described as "Flink/Internal memory" but are not guaranteed not to mix with batch algorithms or state backends?
> >>>>
> >>>> Anyway, I think the naming should not block this FLIP, as long as we have consensus on combining the two weights for RocksDB and batch algorithms. We can keep the naming discussion open until the implementation phase.
> >>>>
> >>>> Thank you~
> >>>>
> >>>> Xintong Song
> >>>>
> >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>
> >>>>> Thanks for creating this FLIP, Xintong.
> >>>>>
> >>>>> I agree with the previous comments that the memory configuration should be as easy as possible. Every new knob has the potential to confuse users and/or allow them to shoot themselves in the foot. Consequently, I am +1 for the first proposal in the FLIP since it is simpler.
> >>>>>
> >>>>> Also +1 for Stephan's proposal to combine the batch operators' and RocksDB's memory usage into one weight.
> >>>>>
> >>>>> Concerning the names for the two weights, I fear that we are facing one of the two hard things in computer science. To add another idea, we could name them "Flink memory"/"Internal memory" and "Python memory".
> >>>>>
> >>>>> For the sake of making the scope of the FLIP as small as possible, and to develop the feature incrementally, I think that Aljoscha's proposal to make the weights non-configurable as a first step sounds like a good idea. As a next step (and if we see the need), we can make the memory weights configurable via the configuration. And last, we could expose them via the ExecutionConfig if that is required.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Playing devil's advocate here: should we even make the memory weights configurable? We could go with weights that should make sense for most cases in the first version and only introduce configurable weights when (if) users need them.
> >>>>>>
> >>>>>> Regarding where/how things are configured, I think that most things should be a ConfigOption first (thanks for cc'ing me, Stephan!). This makes them configurable via flink-conf.yaml and via command line parameters, for example "bin/flink run -D memory.foo=bla ...". We can think about offering a programmatic API for cases where it makes sense, of course.
> >>>>>>
> >>>>>> Regarding naming one of the configurable weights "StateBackend-BatchAlgorithm": I think it's not a good idea to be that specific, because the option will not age well. For example, when we want to change which group of memory consumers is configured together, or when we add something new.
> >>>>>>
> >>>>>> Best,
> >>>>>> Aljoscha
> >>>>>>
> >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> >>>>>>> Thanks for the feedback, @Stephan
> >>>>>>>
> >>>>>>>> - There is a push to make as much as possible configurable via the main configuration, and not only in code. Specifically values for operations and tuning.
> >>>>>>>> I think it would be more important to have such memory weights in the config, compared to in the program API. /cc Aljoscha
> >>>>>>>
> >>>>>>> I can see the benefit that having memory weights in the main configuration makes tuning easier, which makes great sense to me. On the other hand, what we lose is the flexibility to have different weights for jobs running in the same Flink cluster. It seems to me the problem is that we don't have an easy way to overwrite job-specific configurations without touching the code.
> >>>>>>>
> >>>>>>> Given the current status, what if we make the memory weights configurable through both the main configuration and the programming API? The main configuration should take effect iff the weights are not explicitly specified through the programming API. In this way, job cluster users can easily tune the weights through the main configuration, while session cluster users, if they want different weights for different jobs, can still overwrite the weights through execution configs.
> >>>>>>>
> >>>>>>>> - My recommendation would be to keep this as simple as possible. This will make a lot of configuration code harder, and make it harder for users to understand Flink's memory model.
> >>>>>>>> Making things as easy for users to understand is very important in my opinion. In that regard, the main proposal in the FLIP seems better than the alternative proposal listed at the end of the FLIP page.
> >>>>>>>
> >>>>>>> +1 from my side.
> >>>>>>>
> >>>>>>>> - For simplicity, we could go even further and simply have two memory users at the moment: the operator algorithm/data-structure and the external language process (Python for now).
> >>>>>>>> We never have batch algos and RocksDB mixed; having these as separate options is confusing, as it suggests they can be combined arbitrarily. I also think that a slim possibility that we may ever combine them in the future is not enough reason to make this more complex/confusing.
> >>>>>>>
> >>>>>>> Good point. +1 for combining the batch/RocksDB weights, for they're never mixed together.
> >>>>>>> We can even just name it "StateBackend-BatchAlgorithm" to be explicit.
> >>>>>>>
> >>>>>>> For "external language process", I'm not entirely sure. Future external languages could possibly be mixed with Python processes. To avoid later having to consider how to share external language memory across different languages, I would suggest presenting the concept as "Python memory" rather than "external language process memory".
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Thanks for driving this proposal. A few thoughts on the current design:
> >>>>>>>>
> >>>>>>>> - There is a push to make as much as possible configurable via the main configuration, and not only in code. Specifically values for operations and tuning.
> >>>>>>>> I think it would be more important to have such memory weights in the config, compared to in the program API. /cc Aljoscha
> >>>>>>>>
> >>>>>>>> - My recommendation would be to keep this as simple as possible. This will make a lot of configuration code harder, and make it harder for users to understand Flink's memory model.
> >>>>>>>> Making things as easy for users to understand is very important in my opinion. In that regard, the main proposal in the FLIP seems better than the alternative proposal listed at the end of the FLIP page.
> >>>>>>>>
> >>>>>>>> - For simplicity, we could go even further and simply have two memory users at the moment: the operator algorithm/data-structure and the external language process (Python for now).
> >>>>>>>> We never have batch algos and RocksDB mixed; having these as separate options is confusing, as it suggests they can be combined arbitrarily. I also think that a slim possibility that we may ever combine them in the future is not enough reason to make this more complex/confusing.
> >>>>>>>>
> >>>>>>>> - I am also not aware of any plans to combine the network and operator memory. Not that it would be infeasible to do this, but I think this would also be orthogonal to this change, and I am not sure it would be solved with static weights. So trying to get network memory into this proposal seems premature to me.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Stephan
> >>>>>>>>
> >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>>> A quick question: is network memory treated as managed memory now? Or in the future?
> >>>>>>>>>
> >>>>>>>>> No, network memory is independent from managed memory ATM. And I'm not aware of any plan to combine the two.
> >>>>>>>>>
> >>>>>>>>> Any insights there?
> >>>>>>>>>
> >>>>>>>>> Thank you~
> >>>>>>>>>
> >>>>>>>>> Xintong Song
> >>>>>>>>>
> >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> A quick question: is network memory treated as managed memory now? Or in the future?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Kurt
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi devs,
> >>>>>>>>>>>
> >>>>>>>>>>> I'd like to bring up the discussion of FLIP-141 [1], which proposes how managed memory should be shared by various use cases within a slot. This is an extension to FLIP-53 [2], where we assumed that the RocksDB state backend and batch operators are the only use cases of managed memory for streaming and batch jobs respectively, which is no longer true with the introduction of Python UDFs.
> >>>>>>>>>>>
> >>>>>>>>>>> Please notice that we have not reached consensus between two different designs. The major part of this FLIP describes one of the candidates, while the alternative is discussed in the section "Rejected Alternatives". We are hoping to borrow intelligence from the community to help us resolve the disagreement.
> >>>>>>>>>>>
> >>>>>>>>>>> Any feedback would be appreciated.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you~
> >>>>>>>>>>>
> >>>>>>>>>>> Xintong Song
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
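The configuration precedence discussed in this thread (a weight set through the programmatic ExecutionConfig takes effect over the main configuration, which in turn takes effect over built-in defaults) could be resolved as in the following sketch. The API, option names, and default values are all hypothetical, not Flink's actual implementation.

```python
# Illustrative sketch only (hypothetical API, not Flink code): the precedence
# rule proposed in the thread. A weight set through the programmatic
# ExecutionConfig wins; otherwise the value from the main configuration
# (flink-conf.yaml) applies; otherwise a built-in default is used.

DEFAULT_WEIGHTS = {"OPERATOR": 70, "PYTHON": 30}  # assumed defaults

def resolve_weight(consumer, cluster_config, execution_config):
    """Return the effective weight for one managed-memory consumer."""
    if consumer in execution_config:   # job-level override via programming API
        return execution_config[consumer]
    if consumer in cluster_config:     # cluster-wide tuning via flink-conf.yaml
        return cluster_config[consumer]
    return DEFAULT_WEIGHTS[consumer]   # built-in default

cluster = {"PYTHON": 40}   # set in the main configuration
job = {"PYTHON": 60}       # set per job through the execution config

assert resolve_weight("PYTHON", cluster, job) == 60   # job override wins
assert resolve_weight("PYTHON", cluster, {}) == 40    # falls back to config
assert resolve_weight("OPERATOR", cluster, {}) == 70  # built-in default
```

This is the compromise Xintong sketches above: job cluster users tune weights in the main configuration, while session cluster users can still override them per job.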