Thanks all for the feedback. FYI, I've opened a voting thread[1] on this.
Thank you~

Xintong Song

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html

On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <reed...@gmail.com> wrote:

> Thanks for proposing this improvement! @Xintong
> The proposal looks good to me. Agreed that we should make it as simple
> as possible for users to understand.
>
> Thanks,
> Zhu
>
> On Thu, Sep 3, 2020 at 2:11 PM, Dian Fu <dian0511...@gmail.com> wrote:
>
> > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> >
> > > On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > Thanks for the input, Yu.
> > >
> > > I believe the current proposal should work with RocksDB, or any
> > > other state backend, using memory at either the slot or the
> > > operator scope. With the proposed approach, all we need is an
> > > indicator (e.g., a configuration option) telling us which scope we
> > > should calculate the fractions for.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
> > >
> > > > Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
> > > >
> > > > Just one supplement for the RocksDB state backend part:
> > > >
> > > > It's true that currently we're using managed memory at the slot
> > > > scope. However, IMHO, we may support setting weights for
> > > > different stateful operators (for advanced usage) in the future.
> > > > For example, users may choose to set higher weights for a join
> > > > operator over an aggregation operator, to give more memory to
> > > > those with bigger states. In this case, we may also use managed
> > > > memory at the operator scope for state backends. And if I
> > > > understand correctly, the current design could cover this case
> > > > well.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > Thanks all for the feedback and discussion.
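To make the weights-to-fractions idea discussed earlier in the thread concrete, here is a minimal sketch. All names (`STATE_BACKEND`, `PYTHON`, `BATCH_OP`, `managed_fractions`) are illustrative and not Flink's actual API: each managed-memory use case has a weight, and a use case's fraction is its weight divided by the total weight of the use cases actually present in the relevant scope.

```python
# Illustrative sketch of intra-slot managed memory sharing by weights.
# Use-case names and the function are hypothetical, not Flink's API.

def managed_fractions(weights, present_use_cases):
    """Fraction of managed memory for each use case present in the scope."""
    total = sum(weights[u] for u in present_use_cases)
    if total == 0:
        return {u: 0.0 for u in present_use_cases}
    return {u: weights[u] / total for u in present_use_cases}

# Streaming job with RocksDB and Python UDFs: the absent batch-operator
# use case does not dilute the shares of the use cases that are present.
weights = {"STATE_BACKEND": 70, "PYTHON": 30, "BATCH_OP": 70}
fractions = managed_fractions(weights, {"STATE_BACKEND", "PYTHON"})
print(fractions["STATE_BACKEND"], fractions["PYTHON"])  # 0.7 0.3
```

The same function works for either scope: the "indicator" mentioned above would only change which set of use cases is passed as `present_use_cases`.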
> > > > > I have updated the FLIP, with the following changes.
> > > > >
> > > > > - Choose the main proposal over the alternative approach
> > > > > - Combine weights of RocksDB and batch operators
> > > > > - Expose weights through configuration options, rather than via
> > > > >   ExecutionConfig
> > > > > - Add implementation plan
> > > > >
> > > > > Please help take another look.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > > On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the inputs, Aljoscha & Till.
> > > > > >
> > > > > > # Weight Configuration
> > > > > >
> > > > > > I think exposing the knobs incrementally is a good idea.
> > > > > > However, I'm not sure about non-configurable as the first step.
> > > > > >
> > > > > > Currently, users can tune memory for RocksDB
> > > > > > ('taskmanager.memory.managed.size') and Python
> > > > > > ('python.fn-execution.[framework|buffer].memory.size')
> > > > > > separately, which practically means any combination of RocksDB
> > > > > > and Python memory sizes. If we switch to non-configurable
> > > > > > weights, that will be a regression compared to 1.11.
> > > > > >
> > > > > > Therefore, I think exposing via configuration options might be
> > > > > > a good first step. And we can discuss exposing via
> > > > > > ExecutionConfig if later we see that requirement.
> > > > > >
> > > > > > # Naming of Weights
> > > > > >
> > > > > > I'm neutral on "Flink/Internal memory".
> > > > > >
> > > > > > I think the reason we can combine weights for batch algorithms
> > > > > > and state backends is that they are never mixed together.
> > > > > > My only concern for "Flink/Internal memory", which might not
> > > > > > be a problem at the moment, is this: what if new memory use
> > > > > > cases appear in the future that could also be described as
> > > > > > "Flink/Internal memory", but are not guaranteed never to be
> > > > > > mixed with batch algorithms or state backends?
> > > > > >
> > > > > > Anyway, I think the naming should not block this FLIP, as long
> > > > > > as we have consensus on combining the two weights for RocksDB
> > > > > > and batch algorithms. We can keep the naming discussion open
> > > > > > until the implementation phase.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > >
> > > > > > > Thanks for creating this FLIP, Xintong.
> > > > > > >
> > > > > > > I agree with the previous comments that the memory
> > > > > > > configuration should be as easy as possible. Every new knob
> > > > > > > has the potential to confuse users and/or allow them to shoot
> > > > > > > themselves in the foot. Consequently, I am +1 for the first
> > > > > > > proposal in the FLIP since it is simpler.
> > > > > > >
> > > > > > > Also +1 for Stephan's proposal to combine batch operators'
> > > > > > > and RocksDB's memory usage into one weight.
> > > > > > >
> > > > > > > Concerning the names for the two weights, I fear that we are
> > > > > > > facing one of the two hard things in computer science. To add
> > > > > > > another idea, we could name them "Flink memory"/"Internal
> > > > > > > memory" and "Python memory".
> > > > > > >
> > > > > > > For the sake of making the scope of the FLIP as small as
> > > > > > > possible and to develop the feature incrementally, I think
> > > > > > > that Aljoscha's proposal to make it non-configurable for the
> > > > > > > first step sounds like a good idea.
> > > > > > > As a next step (and also if we see the need), we can make
> > > > > > > the memory weights configurable via the configuration. And
> > > > > > > last, we could expose them via the ExecutionConfig if
> > > > > > > required.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > playing devil's advocate here: should we even make the
> > > > > > > > memory weights configurable? We could go with weights that
> > > > > > > > should make sense for most cases in the first version and
> > > > > > > > only introduce configurable weights when (if) users need
> > > > > > > > them.
> > > > > > > >
> > > > > > > > Regarding where/how things are configured, I think that
> > > > > > > > most things should be a ConfigOption first (thanks for
> > > > > > > > cc'ing me, Stephan!). This makes them configurable via
> > > > > > > > flink-conf.yaml and via command line parameters, for
> > > > > > > > example "bin/flink run -D memory.foo=bla ...". We can
> > > > > > > > think about offering a programmatic API for cases where it
> > > > > > > > makes sense, of course.
> > > > > > > >
> > > > > > > > Regarding naming one of the configurable weights
> > > > > > > > "StateBackend-BatchAlgorithm": I think it's not a good
> > > > > > > > idea to be that specific, because the option will not age
> > > > > > > > well, for example when we want to change which group of
> > > > > > > > memory consumers is configured together or when we add
> > > > > > > > something new.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Aljoscha
> > > > > > > >
> > > > > > > > On 31.08.20 08:13, Xintong Song wrote:
> > > > > > > > > Thanks for the feedback, @Stephan
> > > > > > > > >
> > > > > > > > > > - There is a push to make as much as possible
> > > > > > > > > > configurable via the main configuration, and not only
> > > > > > > > > > in code. Specifically values for operations and tuning.
> > > > > > > > > > I think it would be more important to have such
> > > > > > > > > > memory weights in the config, compared to in the
> > > > > > > > > > program API. /cc Aljoscha
> > > > > > > > >
> > > > > > > > > I can see the benefit that having memory weights in the
> > > > > > > > > main configuration makes tuning easier, which makes great
> > > > > > > > > sense to me. On the other hand, what we lose is the
> > > > > > > > > flexibility to have different weights for jobs running in
> > > > > > > > > the same Flink cluster. It seems to me the problem is
> > > > > > > > > that we don't have an easy way to overwrite job-specific
> > > > > > > > > configurations without touching the code.
> > > > > > > > >
> > > > > > > > > Given the current status, what if we make the memory
> > > > > > > > > weights configurable through both the main configuration
> > > > > > > > > and the programming API? The main configuration should
> > > > > > > > > take effect iff the weights are not explicitly specified
> > > > > > > > > through the programming API. In this way, job cluster
> > > > > > > > > users can easily tune the weights through the main
> > > > > > > > > configuration, while session cluster users, if they want
> > > > > > > > > to have different weights for jobs, can still overwrite
> > > > > > > > > the weights through execution configs.
> > > > > > > > >
> > > > > > > > > > - My recommendation would be to keep this as simple as
> > > > > > > > > > possible. This will make a lot of configuration code
> > > > > > > > > > harder, and make it harder for users to understand
> > > > > > > > > > Flink's memory model.
> > > > > > > > > > Making things as easy for users to understand is very
> > > > > > > > > > important in my opinion. In that regard, the main
> > > > > > > > > > proposal in the FLIP seems better than the alternative
> > > > > > > > > > proposal listed at the end of the FLIP page.
> > > > > > > > >
> > > > > > > > > +1 from my side.
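The precedence rule proposed above (the main configuration takes effect iff the weights are not explicitly set through the programming API) can be sketched as follows. Everything here, including the default weight values and all names, is a hypothetical illustration rather than Flink's real configuration surface:

```python
# Hypothetical sketch of weight precedence: per-job API settings win over
# the cluster-wide configuration, which wins over built-in defaults.

DEFAULT_WEIGHTS = {"DATAPROC": 70, "PYTHON": 30}  # illustrative defaults only

def effective_weights(cluster_config, job_api_weights=None):
    """Merge cluster-wide configured weights with per-job API overrides."""
    weights = dict(DEFAULT_WEIGHTS)
    weights.update(cluster_config)   # values from flink-conf.yaml
    if job_api_weights:              # explicit per-job API settings win
        weights.update(job_api_weights)
    return weights

# Session-cluster scenario: one job keeps the cluster-configured weights,
# another overrides the Python weight through the (hypothetical) job API.
cluster = {"PYTHON": 40}
print(effective_weights(cluster))                  # {'DATAPROC': 70, 'PYTHON': 40}
print(effective_weights(cluster, {"PYTHON": 60}))  # {'DATAPROC': 70, 'PYTHON': 60}
```

This mirrors how other Flink options already layer defaults, flink-conf.yaml, and programmatic overrides; the point of the sketch is only the override order, not the concrete keys.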
> > > > > > > > > > - For simplicity, we could go even further and simply
> > > > > > > > > > have two memory users at the moment: the operator
> > > > > > > > > > algorithm/data-structure and the external language
> > > > > > > > > > process (Python for now).
> > > > > > > > > > We never have batch algos and RocksDB mixed; having
> > > > > > > > > > these as separate options is confusing, as it suggests
> > > > > > > > > > they can be combined arbitrarily. I also think that a
> > > > > > > > > > slim possibility that we may ever combine them in the
> > > > > > > > > > future is not enough reason to make this more
> > > > > > > > > > complex/confusing.
> > > > > > > > >
> > > > > > > > > Good point. +1 for combining batch/RocksDB weights, for
> > > > > > > > > they're never mixed together. We can even just name it
> > > > > > > > > "StateBackend-BatchAlgorithm" to be explicit.
> > > > > > > > >
> > > > > > > > > For "external language process", I'm not entirely sure.
> > > > > > > > > Future external languages could be mixed with Python
> > > > > > > > > processes. To avoid later considering how to share
> > > > > > > > > external language memory across different languages, I
> > > > > > > > > would suggest presenting the concept as "Python memory"
> > > > > > > > > rather than "external language process memory".
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for driving this proposal. A few thoughts on the
> > > > > > > > > > current design:
> > > > > > > > > >
> > > > > > > > > > - There is a push to make as much as possible
> > > > > > > > > > configurable via the main configuration, and not only
> > > > > > > > > > in code. Specifically values for operations and tuning.
> > > > > > > > > > I think it would be more important to have such memory
> > > > > > > > > > weights in the config, compared to in the program API.
> > > > > > > > > > /cc Aljoscha
> > > > > > > > > >
> > > > > > > > > > - My recommendation would be to keep this as simple as
> > > > > > > > > > possible. This will make a lot of configuration code
> > > > > > > > > > harder, and make it harder for users to understand
> > > > > > > > > > Flink's memory model.
> > > > > > > > > > Making things as easy for users to understand is very
> > > > > > > > > > important in my opinion. In that regard, the main
> > > > > > > > > > proposal in the FLIP seems better than the alternative
> > > > > > > > > > proposal listed at the end of the FLIP page.
> > > > > > > > > >
> > > > > > > > > > - For simplicity, we could go even further and simply
> > > > > > > > > > have two memory users at the moment: the operator
> > > > > > > > > > algorithm/data-structure and the external language
> > > > > > > > > > process (Python for now).
> > > > > > > > > > We never have batch algos and RocksDB mixed; having
> > > > > > > > > > these as separate options is confusing, as it suggests
> > > > > > > > > > they can be combined arbitrarily. I also think that a
> > > > > > > > > > slim possibility that we may ever combine them in the
> > > > > > > > > > future is not enough reason to make this more
> > > > > > > > > > complex/confusing.
> > > > > > > > > >
> > > > > > > > > > - I am also not aware of any plans to combine the
> > > > > > > > > > network and operator memory. Not that it would be
> > > > > > > > > > infeasible to do this, but I think this would also be
> > > > > > > > > > orthogonal to this change, and I am not sure it would
> > > > > > > > > > be solved with static weights. So trying to get network
> > > > > > > > > > memory into this proposal seems premature to me.
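Stephan's two-consumer simplification above implies that at runtime a slot's managed memory budget splits deterministically between the combined operator/state-backend consumer and the Python process. A rough sketch of that split (names, the 512 MiB budget, and the truncation policy are all illustrative assumptions, not the FLIP's specification):

```python
# Illustrative only: turning per-use-case fractions into bytes. "DATAPROC"
# here stands for the combined batch-algorithm/RocksDB consumer; "PYTHON"
# for the external language process.

def split_slot_memory(slot_managed_bytes, fractions):
    """Bytes of managed memory per use case, truncating fractional bytes."""
    return {use_case: int(slot_managed_bytes * f)
            for use_case, f in fractions.items()}

slot_bytes = 512 * 1024 * 1024  # e.g. a 512 MiB slot budget, for illustration
shares = split_slot_memory(slot_bytes, {"DATAPROC": 0.75, "PYTHON": 0.25})
print(shares)  # {'DATAPROC': 402653184, 'PYTHON': 134217728}
```

Note that network memory stays outside this split entirely, matching the point above that it is independent of managed memory.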
> > > > > > > > > > Best,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > A quick question: is network memory treated as
> > > > > > > > > > > > managed memory now? Or in the future?
> > > > > > > > > > >
> > > > > > > > > > > No, network memory is independent of managed memory
> > > > > > > > > > > ATM. And I'm not aware of any plan to combine these
> > > > > > > > > > > two.
> > > > > > > > > > >
> > > > > > > > > > > Any insights there?
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > A quick question: is network memory treated as
> > > > > > > > > > > > managed memory now? Or in the future?
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Kurt
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi devs,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'd like to bring up the discussion over
> > > > > > > > > > > > > FLIP-141[1], which proposes how managed memory
> > > > > > > > > > > > > should be shared by various use cases within a
> > > > > > > > > > > > > slot. This is an extension to FLIP-53[2], where we
> > > > > > > > > > > > > assumed that the RocksDB state backend and batch
> > > > > > > > > > > > > operators are the only use cases of managed
> > > > > > > > > > > > > memory for streaming and batch jobs respectively,
> > > > > > > > > > > > > which is no longer true with the introduction of
> > > > > > > > > > > > > Python UDFs.
> > > > > > > > > > > > > Please note that we have not reached consensus
> > > > > > > > > > > > > between two different designs. The major part of
> > > > > > > > > > > > > this FLIP describes one of the candidates, while
> > > > > > > > > > > > > the alternative is discussed in the section
> > > > > > > > > > > > > "Rejected Alternatives". We are hoping to draw on
> > > > > > > > > > > > > the community's insights to help us resolve the
> > > > > > > > > > > > > disagreement.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any feedback would be appreciated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > > > > > > > > > > > >
> > > > > > > > > > > > > [2]
> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management