Thanks for creating this FLIP, Xintong. I agree with the previous comments that the memory configuration should be as easy as possible. Every new knob has the potential to confuse users and/or allow them to shoot themselves in the foot. Consequently, I am +1 for the first proposal in the FLIP since it is simpler.
Also +1 for Stephan's proposal to combine batch operators' and RocksDB's memory usage into one weight. Concerning the names for the two weights, I fear that we are facing one of the two hard things in computer science. To add another idea, we could name them "Flink memory"/"Internal memory" and "Python memory". For the sake of making the scope of the FLIP as small as possible and to develop the feature incrementally, I think that Aljoscha's proposal to make it non-configurable for the first step sounds like a good idea. As a next step (and also if we see need), we can make the memory weights configurable via the configuration. Lastly, we could expose it via the ExecutionConfig if it is required. Cheers, Till On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > > playing devil's advocate here: should we even make the memory weights > configurable? We could go with weights that should make sense for most > cases in the first version and only introduce configurable weights when > (if) users need them. > > Regarding where/how things are configured, I think that most things > should be a ConfigOption first (Thanks for cc'ing me, Stephan!). This makes > them configurable via flink-conf.yaml and via command line parameters, > for example "bin/flink run -D memory.foo=bla ...". We can think about > offering a programmatic API for cases where it makes sense, of course. > > Regarding naming one of the configurable weights > "StateBackend-BatchAlgorithm": I think it's not a good idea to be that > specific because the option will not age well. For example when we want > to change which group of memory consumers is configured together or > when we add something new. > > Best, > Aljoscha > > On 31.08.20 08:13, Xintong Song wrote: > > Thanks for the feedback, @Stephan > > > > > > - There is a push to make as much as possible configurable via the > main > >> configuration, and not only in code. Specifically values for operations > and > >> tuning. 
> >> I think it would be more important to have such memory weights in > the > >> config, compared to in the program API. /cc Aljoscha > > > > > > I can see the benefit that having memory weights in the main > configuration > > makes tuning easier, which makes great sense to me. On the other hand, > what > > we lose is the flexibility to have different weights for jobs running in > > the same Flink cluster. It seems to me the problem is that we don't have > an > > easy way to overwrite job-specific configurations without touching the > > code. > > > > > > Given the current status, what if we make the memory weights configurable > > through both the main configuration and the programming API? The main > > configuration should take effect only if the weights are not explicitly > > specified through the programming API. In this way, job cluster users can > > easily tune the weight through the main configuration, while session > > cluster users, if they want to have different weights for jobs, can still > > overwrite the weight through execution configs. > > > > > > - My recommendation would be to keep this as simple as possible. This > >> will make a lot of configuration code harder, and make it harder for > users > >> to understand Flink's memory model. > >> Making things as easy for users to understand is very important in > my > >> opinion. In that regard, the main proposal in the FLIP seems better than > >> the alternative proposal listed at the end of the FLIP page. > > > > +1 from my side. > > > > > > - For simplicity, we could go even further and simply have two > memory > >> users at the moment: The operator algorithm/data-structure and the > external > >> language process (Python for now). > >> We never have batch algos and RocksDB mixed; having these as > separate > >> options is confusing as it suggests this can be combined arbitrarily. 
I > >> also think that a slim possibility that we may ever combine this in the > >> future is not enough reason to make it more complex/confusing. > > > > > > Good point. +1 for combining batch/RocksDB weights, for they're never > mixed > > together. We can even just name it "StateBackend-BatchAlgorithm" to be > > explicit. > > > > > > For "external language process", I'm not entirely sure. Future external > > languages may be mixed with Python processes. To avoid later > > considering how to share external language memory across different > > languages, I would suggest presenting the concept as "python memory" > rather > > than "external language process memory". > > > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote: > > > >> Thanks for driving this proposal. A few thoughts on the current design: > >> > >> - There is a push to make as much as possible configurable via the > main > >> configuration, and not only in code. Specifically values for operations > and > >> tuning. > >> I think it would be more important to have such memory weights in > the > >> config, compared to in the program API. /cc Aljoscha > >> > >> - My recommendation would be to keep this as simple as possible. This > >> will make a lot of configuration code harder, and make it harder for > users > >> to understand Flink's memory model. > >> Making things as easy for users to understand is very important in > my > >> opinion. In that regard, the main proposal in the FLIP seems better than > >> the alternative proposal listed at the end of the FLIP page. > >> > >> - For simplicity, we could go even further and simply have two > memory > >> users at the moment: The operator algorithm/data-structure and the > external > >> language process (Python for now). > >> We never have batch algos and RocksDB mixed; having these as > separate > >> options is confusing as it suggests this can be combined arbitrarily. 
I > >> also think that a slim possibility that we may ever combine this in the > >> future is not enough reason to make it more complex/confusing. > >> > >> - I am also not aware of any plans to combine the network and > operator > >> memory. Not that it would be infeasible to do this, but I think this > would > >> also be orthogonal to this change, and I am not sure this would be > solved > >> with static weights. So trying to get network memory into this proposal > >> seems premature to me. > >> > >> Best, > >> Stephan > >> > >> > >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> > >> wrote: > >> > >>>> > >>>> A quick question, is network memory treated as managed memory now? > Or > >>> in > >>>> the future? > >>>> > >>> No, network memory is independent of managed memory ATM. And I'm not > >>> aware of any plan to combine these two. > >>> > >>> Any insights there? > >>> > >>> Thank you~ > >>> > >>> Xintong Song > >>> > >>> > >>> > >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote: > >>> > >>>> A quick question, is network memory treated as managed memory now? > Or > >>> in > >>>> the future? > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> > >>>> wrote: > >>>> > >>>>> Hi devs, > >>>>> > >>>>> I'd like to bring up the discussion of FLIP-141[1], which proposes how > >>>>> managed memory should be shared by various use cases within a slot. > >>> This > >>>> is > >>>>> an extension to FLIP-53[2], where we assumed that RocksDB state > >> backend > >>>> and > >>>>> batch operators are the only use cases of managed memory for > >> streaming > >>>> and > >>>>> batch jobs respectively, which is no longer true with the > >> introduction > >>> of > >>>>> Python UDFs. > >>>>> > >>>>> Please note that we have not reached consensus between two > >> different > >>>>> designs. 
The major part of this FLIP describes one of the candidates, > >>>> while > >>>>> the alternative is discussed in the section "Rejected Alternatives". > >> We > >>>> are > >>>>> hoping to draw on the community's insight to help us resolve > >> the > >>>>> disagreement. > >>>>> > >>>>> Any feedback would be appreciated. > >>>>> > >>>>> Thank you~ > >>>>> > >>>>> Xintong Song > >>>>> > >>>>> > >>>>> [1] > >>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility > >>>>> > >>>>> [2] > >>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management > >>>>> > >>>> > >>> > >> > > > >
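[Editor's note] The two mechanisms debated in this thread, splitting a slot's managed memory among use cases in proportion to configured weights, and letting a main-configuration value take effect only when no weight was set through the programming API, can be sketched as follows. This is a hypothetical illustration, not Flink's actual FLIP-141 implementation; the class, method, and consumer names are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of intra-slot managed memory sharing by weights.
 * All names are invented for illustration and do not reflect Flink's
 * real FLIP-141 code.
 */
public class ManagedMemoryWeights {

    /** Converts per-consumer integer weights into fractions of the slot's managed memory. */
    static Map<String, Double> fractions(Map<String, Integer> weights) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }

    /**
     * The flink-conf.yaml value takes effect only if no weight was set
     * through the programming API (Xintong's precedence proposal).
     */
    static int effectiveWeight(Integer apiWeight, int configWeight) {
        return apiWeight != null ? apiWeight : configWeight;
    }

    public static void main(String[] args) {
        // Two consumers, following Stephan's suggestion: the combined
        // state backend / batch algorithm use case, and Python processes.
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("STATE_BACKEND_OR_BATCH_ALGORITHM", 70);
        weights.put("PYTHON", 30);

        System.out.println(fractions(weights));
        // {STATE_BACKEND_OR_BATCH_ALGORITHM=0.7, PYTHON=0.3}

        // flink-conf.yaml says 30, but the job explicitly set 50 via the API.
        System.out.println(effectiveWeight(50, 30)); // 50
    }
}
```

With this shape, a session cluster can keep one set of defaults in flink-conf.yaml while an individual job overrides its own weights, which is exactly the flexibility Xintong argues for above.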