Thanks for driving this FLIP, Xintong! +1 to the updated version.
> On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
>
> Thanks for the input, Yu.
>
> I believe the current proposal should work with RocksDB, or any other state
> backend, using memory at either the slot or the operator scope. With the
> proposed approach, all we need is an indicator (e.g., a configuration option)
> telling us which scope we should calculate the fractions for.
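>
> To make that concrete, here is a minimal sketch of the fraction calculation I
> have in mind (names are illustrative only, not the final API):
>
> import java.util.Map;
>
> public class ManagedMemoryFractions {
>     // Each use case declares a weight; its fraction of the scope's managed
>     // memory is weight / sum(weights of use cases active in that scope).
>     static double fractionFor(String useCase, Map<String, Integer> weightsInScope) {
>         int total = weightsInScope.values().stream().mapToInt(Integer::intValue).sum();
>         return total == 0 ? 0.0 : (double) weightsInScope.getOrDefault(useCase, 0) / total;
>     }
> }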
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
>
>> Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
>>
>> Just one supplement for the RocksDB state backend part:
>>
>> It's true that currently we're using managed memory at the slot scope.
>> However, IMHO, we may support setting weights for different stateful
>> operators (for advanced usage) in the future. For example, users may choose
>> to set a higher weight for a join operator than for an aggregation operator,
>> to give more memory to the operators with bigger states. In this case, we may
>> also use managed memory at the operator scope for state backends. And if I
>> understand correctly, the current design could cover this case well.
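>>
>> To illustrate with made-up numbers: with weights join=2 and aggregation=1 in
>> the same slot, the join operator would receive 2/3 of the slot's managed
>> memory and the aggregation operator 1/3.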
>>
>> Best Regards,
>> Yu
>>
>>
>> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
>>
>>> Thanks all for the feedback and discussion.
>>>
>>> I have updated the FLIP, with the following changes.
>>>
>>> - Choose the main proposal over the alternative approach.
>>> - Combine the weights of RocksDB and batch operators.
>>> - Expose weights through configuration options, rather than via
>>> ExecutionConfig.
>>> - Add an implementation plan.
>>>
>>> Please help take another look.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>
>>>> Thanks for the input, Aljoscha & Till.
>>>>
>>>>
>>>> # Weight Configuration
>>>>
>>>>
>>>> I think exposing the knobs incrementally is a good idea. However, I'm not
>>>> sure about making the weights non-configurable as the first step.
>>>>
>>>>
>>>> Currently, users can tune memory for RocksDB
>>>> ('taskmanager.memory.managed.size') and Python
>>>> ('python.fn-execution.[framework|buffer].memory.size') separately, which
>>>> practically means any combination of RocksDB and Python memory sizes. If we
>>>> switch to non-configurable weights, that will be a regression compared to
>>>> 1.11.
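>>>>
>>>> For reference, a minimal sketch of today's 1.11-style setup (the option
>>>> keys are the existing ones; the values are just illustrative):
>>>>
>>>> import org.apache.flink.configuration.Configuration;
>>>>
>>>> public class Flink111MemorySetup {
>>>>     public static void main(String[] args) {
>>>>         Configuration conf = new Configuration();
>>>>         // All managed memory goes to RocksDB (streaming) or batch algorithms.
>>>>         conf.setString("taskmanager.memory.managed.size", "512m");
>>>>         // Python worker memory is configured independently of managed memory.
>>>>         conf.setString("python.fn-execution.framework.memory.size", "64m");
>>>>         conf.setString("python.fn-execution.buffer.memory.size", "15m");
>>>>     }
>>>> }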
>>>>
>>>>
>>>> Therefore, I think exposing via configuration options might be a good
>>>> first step. And we can discuss exposing via ExecutionConfig if we later see
>>>> that requirement.
>>>>
>>>>
>>>> # Naming of Weights
>>>>
>>>>
>>>> I'm neutral on "Flink/Internal memory".
>>>>
>>>>
>>>> I think the reason we can combine weights for batch algorithms and state
>>>> backends is that the two are never mixed together. My only concern with
>>>> "Flink/Internal memory", which might not be a problem at the moment, is
>>>> what happens if new memory use cases appear in the future that could also
>>>> be described as "Flink/Internal memory" but are not guaranteed to never be
>>>> mixed with batch algorithms or state backends.
>>>>
>>>>
>>>> Anyway, I think the naming should not block this FLIP, as long as we have
>>>> consensus on combining the two weights for RocksDB and batch algorithms. We
>>>> can keep the naming discussion open until the implementation phase.
>>>>
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>>
>>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for creating this FLIP, Xintong.
>>>>>
>>>>> I agree with the previous comments that the memory configuration should be
>>>>> as easy as possible. Every new knob has the potential to confuse users
>>>>> and/or let them shoot themselves in the foot. Consequently, I am +1 for
>>>>> the first proposal in the FLIP since it is simpler.
>>>>>
>>>>> Also +1 for Stephan's proposal to combine batch operators' and
>>>>> RocksDB's memory usage into one weight.
>>>>>
>>>>> Concerning the names for the two weights, I fear that we are facing one of
>>>>> the two hard things in computer science. To add another idea, we could name
>>>>> them "Flink memory"/"Internal memory" and "Python memory".
>>>>>
>>>>> For the sake of making the scope of the FLIP as small as possible and to
>>>>> develop the feature incrementally, I think that Aljoscha's proposal to make
>>>>> it non-configurable for the first step sounds like a good idea. As a next
>>>>> step (and also if we see the need), we can make the memory weights
>>>>> configurable via the configuration. And last, we could expose it via the
>>>>> ExecutionConfig if it is required.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> playing devil's advocate here: should we even make the memory weights
>>>>>> configurable? We could go with weights that should make sense for most
>>>>>> cases in the first version and only introduce configurable weights when
>>>>>> (if) users need them.
>>>>>>
>>>>>> Regarding where/how things are configured, I think that most things
>>>>>> should be a ConfigOption first (thanks for cc'ing me, Stephan!). This
>>>>>> makes them configurable via flink-conf.yaml and via command line
>>>>>> parameters, for example "bin/flink run -D memory.foo=bla ...". We can
>>>>>> think about offering a programmatic API for cases where it makes sense,
>>>>>> of course.
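>>>>>>
>>>>>> Just to illustrate the shape of that first step (the key name and default
>>>>>> below are made up, not a proposal for the actual option):
>>>>>>
>>>>>> import org.apache.flink.configuration.ConfigOption;
>>>>>> import org.apache.flink.configuration.ConfigOptions;
>>>>>>
>>>>>> public class MemoryWeightOptions {
>>>>>>     // Hypothetical weight option; once defined, it is settable via
>>>>>>     // flink-conf.yaml and via "bin/flink run -D ..." alike.
>>>>>>     public static final ConfigOption<Integer> PYTHON_MEMORY_WEIGHT =
>>>>>>         ConfigOptions.key("taskmanager.memory.managed.python-weight")
>>>>>>             .intType()
>>>>>>             .defaultValue(30)
>>>>>>             .withDescription("Weight of Python processes when sharing managed memory.");
>>>>>> }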
>>>>>>
>>>>>> Regarding naming one of the configurable weights
>>>>>> "StateBackend-BatchAlgorithm": I think it's not a good idea to be that
>>>>>> specific because the option will not age well, for example when we want
>>>>>> to change which group of memory consumers is configured together or when
>>>>>> we add something new.
>>>>>>
>>>>>> Best,
>>>>>> Aljoscha
>>>>>>
>>>>>> On 31.08.20 08:13, Xintong Song wrote:
>>>>>>> Thanks for the feedback, @Stephan
>>>>>>>
>>>>>>>
>>>>>>>> - There is a push to make as much as possible configurable via the main
>>>>>>>> configuration, and not only in code. Specifically values for operations
>>>>>>>> and tuning.
>>>>>>>> I think it would be more important to have such memory weights in the
>>>>>>>> config, compared to in the program API. /cc Aljoscha
>>>>>>>
>>>>>>>
>>>>>>> I can see the benefit that having memory weights in the main configuration
>>>>>>> makes tuning easier, which makes great sense to me. On the other hand, what
>>>>>>> we lose is the flexibility to have different weights for jobs running in
>>>>>>> the same Flink cluster. It seems to me the problem is that we don't have an
>>>>>>> easy way to overwrite job-specific configurations without touching the
>>>>>>> code.
>>>>>>>
>>>>>>>
>>>>>>> Given the current status, what if we make the memory weights configurable
>>>>>>> through both the main configuration and the programming API? The main
>>>>>>> configuration should take effect iff the weights are not explicitly
>>>>>>> specified through the programming API. In this way, job cluster users can
>>>>>>> easily tune the weights through the main configuration, while session
>>>>>>> cluster users, if they want to have different weights for jobs, can still
>>>>>>> overwrite the weights through execution configs.
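>>>>>>>
>>>>>>> In other words, roughly the following resolution order (a sketch only;
>>>>>>> the option key and the ExecutionConfig accessor are hypothetical):
>>>>>>>
>>>>>>> import java.util.Optional;
>>>>>>> import org.apache.flink.configuration.Configuration;
>>>>>>>
>>>>>>> public class WeightResolution {
>>>>>>>     // Hypothetical: a weight set via the programming API wins; otherwise
>>>>>>>     // fall back to the cluster-wide value from flink-conf.yaml.
>>>>>>>     static int resolvePythonWeight(Optional<Integer> fromExecutionConfig,
>>>>>>>                                    Configuration clusterConf) {
>>>>>>>         return fromExecutionConfig.orElseGet(() ->
>>>>>>>             clusterConf.getInteger("taskmanager.memory.managed.python-weight", 30));
>>>>>>>     }
>>>>>>> }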
>>>>>>>
>>>>>>>
>>>>>>>> - My recommendation would be to keep this as simple as possible. This
>>>>>>>> will make a lot of configuration code harder, and make it harder for
>>>>>>>> users to understand Flink's memory model.
>>>>>>>> Making things as easy for users to understand is very important in my
>>>>>>>> opinion. In that regard, the main proposal in the FLIP seems better than
>>>>>>>> the alternative proposal listed at the end of the FLIP page.
>>>>>>>
>>>>>>> +1 from my side.
>>>>>>>
>>>>>>>
>>>>>>>> - For the simplicity, we could go even further and simply have two
>>>>>>>> memory users at the moment: The operator algorithm/data-structure and the
>>>>>>>> external language process (Python for now).
>>>>>>>> We never have batch algos and RocksDB mixed, having this as separate
>>>>>>>> options is confusing as it suggests this can be combined arbitrarily. I
>>>>>>>> also think that a slim possibility that we may ever combine this in the
>>>>>>>> future is not enough reason to make it more complex/confusing.
>>>>>>>
>>>>>>>
>>>>>>> Good point. +1 for combining batch/RocksDB weights, for they're never
>>>>>>> mixed together. We can even just name it "StateBackend-BatchAlgorithm" to
>>>>>>> be explicit.
>>>>>>>
>>>>>>>
>>>>>>> For "external language process", I'm not entirely sure. Future external
>>>>>>> languages could be mixed with Python processes. To avoid later considering
>>>>>>> how to share external language memory across different languages, I would
>>>>>>> suggest presenting the concept as "Python memory" rather than "external
>>>>>>> language process memory".
>>>>>>>
>>>>>>>
>>>>>>> Thank you~
>>>>>>>
>>>>>>> Xintong Song
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
>>>>>>>
>>>>>>>> Thanks for driving this proposal. A few thoughts on the current design:
>>>>>>>>
>>>>>>>> - There is a push to make as much as possible configurable via the main
>>>>>>>> configuration, and not only in code. Specifically values for operations
>>>>>>>> and tuning.
>>>>>>>> I think it would be more important to have such memory weights in the
>>>>>>>> config, compared to in the program API. /cc Aljoscha
>>>>>>>>
>>>>>>>> - My recommendation would be to keep this as simple as possible. This
>>>>>>>> will make a lot of configuration code harder, and make it harder for
>>>>>>>> users to understand Flink's memory model.
>>>>>>>> Making things as easy for users to understand is very important in my
>>>>>>>> opinion. In that regard, the main proposal in the FLIP seems better than
>>>>>>>> the alternative proposal listed at the end of the FLIP page.
>>>>>>>>
>>>>>>>> - For the simplicity, we could go even further and simply have two
>>>>>>>> memory users at the moment: The operator algorithm/data-structure and the
>>>>>>>> external language process (Python for now).
>>>>>>>> We never have batch algos and RocksDB mixed, having this as separate
>>>>>>>> options is confusing as it suggests this can be combined arbitrarily. I
>>>>>>>> also think that a slim possibility that we may ever combine this in the
>>>>>>>> future is not enough reason to make it more complex/confusing.
>>>>>>>>
>>>>>>>> - I am also not aware of any plans to combine the network and operator
>>>>>>>> memory. Not that it would be infeasible to do this, but I think this
>>>>>>>> would also be orthogonal to this change, and I am not sure this would be
>>>>>>>> solved with static weights. So trying to get network memory into this
>>>>>>>> proposal seems premature to me.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A quick question: is network memory treated as managed memory now? Or
>>>>>>>>>> in the future?
>>>>>>>>>>
>>>>>>>>> No, network memory is independent of managed memory ATM. And I'm not
>>>>>>>>> aware of any plan to combine the two.
>>>>>>>>>
>>>>>>>>> Any insights there?
>>>>>>>>>
>>>>>>>>> Thank you~
>>>>>>>>>
>>>>>>>>> Xintong Song
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> A quick question: is network memory treated as managed memory now? Or
>>>>>>>>>> in the future?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kurt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi devs,
>>>>>>>>>>>
>>>>>>>>>>> I'd like to bring up the discussion of FLIP-141[1], which proposes how
>>>>>>>>>>> managed memory should be shared by various use cases within a slot. This
>>>>>>>>>>> is an extension to FLIP-53[2], where we assumed that the RocksDB state
>>>>>>>>>>> backend and batch operators are the only use cases of managed memory for
>>>>>>>>>>> streaming and batch jobs respectively, which is no longer true with the
>>>>>>>>>>> introduction of Python UDFs.
>>>>>>>>>>>
>>>>>>>>>>> Please note that we have not reached consensus between the two different
>>>>>>>>>>> designs. The major part of this FLIP describes one of the candidates,
>>>>>>>>>>> while the alternative is discussed in the section "Rejected Alternatives".
>>>>>>>>>>> We are hoping to draw on the community's insights to help us resolve the
>>>>>>>>>>> disagreement.
>>>>>>>>>>>
>>>>>>>>>>> Any feedback would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thank you~
>>>>>>>>>>>
>>>>>>>>>>> Xintong Song
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
>>>>>>>>>>>
>>>>>>>>>>> [2]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management