Thanks for driving this FLIP, Xintong! +1 to the updated version.
> On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
>
> Thanks for the input, Yu.
>
> I believe the current proposal should work with RocksDB, or any other state
> backend, using memory at either the slot or the operator scope. With the
> proposed approach, all we need is an indicator (e.g., a configuration option)
> telling us which scope we should calculate the fractions for.
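>
> To make that concrete, here is a minimal sketch of the fraction calculation I
> have in mind (names are illustrative only, not the final API):
>
> import java.util.Map;
>
> public class ManagedMemoryFractions {
>     // Each use case declares a weight; its fraction of the scope's managed
>     // memory is weight / sum(weights of use cases active in that scope).
>     static double fractionFor(String useCase, Map<String, Integer> weightsInScope) {
>         int total = weightsInScope.values().stream().mapToInt(Integer::intValue).sum();
>         return total == 0 ? 0.0 : (double) weightsInScope.getOrDefault(useCase, 0) / total;
>     }
> }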
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
>
>> Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
>>
>> Just one supplement for the RocksDB state backend part:
>>
>> It's true that currently we're using managed memory at the slot scope.
>> However, IMHO, we may support setting weights for different stateful
>> operators (for advanced usage) in the future. For example, users may choose
>> to set a higher weight for a join operator than for an aggregation operator,
>> to give more memory to the operators with bigger states. In this case, we may
>> also use managed memory at the operator scope for state backends. And if I
>> understand correctly, the current design could cover this case well.
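>>
>> To illustrate with made-up numbers: with weights join=2 and aggregation=1 in
>> the same slot, the join operator would receive 2/3 of the slot's managed
>> memory and the aggregation operator 1/3.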
>>
>> Best Regards,
>> Yu
>>
>>
>> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
>>
>>> Thanks all for the feedback and discussion.
>>>
>>> I have updated the FLIP, with the following changes.
>>>
>>> - Choose the main proposal over the alternative approach.
>>> - Combine the weights of RocksDB and batch operators.
>>> - Expose weights through configuration options, rather than via
>>> ExecutionConfig.
>>> - Add an implementation plan.
>>>
>>> Please help take another look.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>
>>>> Thanks for the input, Aljoscha & Till.
>>>>
>>>>
>>>> # Weight Configuration
>>>>
>>>>
>>>> I think exposing the knobs incrementally is a good idea. However, I'm not
>>>> sure about making the weights non-configurable as the first step.
>>>>
>>>>
>>>> Currently, users can tune memory for RocksDB
>>>> ('taskmanager.memory.managed.size') and Python
>>>> ('python.fn-execution.[framework|buffer].memory.size') separately, which
>>>> practically means any combination of RocksDB and Python memory sizes. If we
>>>> switch to non-configurable weights, that will be a regression compared to
>>>> 1.11.
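>>>>
>>>> For reference, a minimal sketch of today's 1.11-style setup (the option
>>>> keys are the existing ones; the values are just illustrative):
>>>>
>>>> import org.apache.flink.configuration.Configuration;
>>>>
>>>> public class Flink111MemorySetup {
>>>>     public static void main(String[] args) {
>>>>         Configuration conf = new Configuration();
>>>>         // All managed memory goes to RocksDB (streaming) or batch algorithms.
>>>>         conf.setString("taskmanager.memory.managed.size", "512m");
>>>>         // Python worker memory is configured independently of managed memory.
>>>>         conf.setString("python.fn-execution.framework.memory.size", "64m");
>>>>         conf.setString("python.fn-execution.buffer.memory.size", "15m");
>>>>     }
>>>> }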
>>>>
>>>>
>>>> Therefore, I think exposing via configuration options might be a good
>>>> first step. And we can discuss exposing via ExecutionConfig if we later see
>>>> that requirement.
>>>>
>>>>
>>>> # Naming of Weights
>>>>
>>>>
>>>> I'm neutral on "Flink/Internal memory".
>>>>
>>>>
>>>> I think the reason we can combine weights for batch algorithms and state
>>>> backends is that the two are never mixed together. My only concern with
>>>> "Flink/Internal memory", which might not be a problem at the moment, is
>>>> what happens if new memory use cases appear in the future that could also
>>>> be described as "Flink/Internal memory" but are not guaranteed to never be
>>>> mixed with batch algorithms or state backends.
>>>>
>>>>
>>>> Anyway, I think the naming should not block this FLIP, as long as we have
>>>> consensus on combining the two weights for RocksDB and batch algorithms. We
>>>> can keep the naming discussion open until the implementation phase.
>>>>
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>>
>>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for creating this FLIP, Xintong.
>>>>>
>>>>> I agree with the previous comments that the memory configuration should be
>>>>> as easy as possible. Every new knob has the potential to confuse users
>>>>> and/or let them shoot themselves in the foot. Consequently, I am +1 for
>>>>> the first proposal in the FLIP since it is simpler.
>>>>>
>>>>> Also +1 for Stephan's proposal to combine batch operators' and
>>>>> RocksDB's memory usage into one weight.
>>>>>
>>>>> Concerning the names for the two weights, I fear that we are facing one of
>>>>> the two hard things in computer science. To add another idea, we could name
>>>>> them "Flink memory"/"Internal memory" and "Python memory".
>>>>>
>>>>> For the sake of making the scope of the FLIP as small as possible and to
>>>>> develop the feature incrementally, I think that Aljoscha's proposal to make
>>>>> it non-configurable for the first step sounds like a good idea. As a next
>>>>> step (and also if we see the need), we can make the memory weights
>>>>> configurable via the configuration. And last, we could expose it via the
>>>>> ExecutionConfig if it is required.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> playing devil's advocate here: should we even make the memory weights
>>>>>> configurable? We could go with weights that should make sense for most
>>>>>> cases in the first version and only introduce configurable weights when
>>>>>> (if) users need them.
>>>>>>
>>>>>> Regarding where/how things are configured, I think that most things
>>>>>> should be a ConfigOption first (thanks for cc'ing me, Stephan!). This
>>>>>> makes them configurable via flink-conf.yaml and via command line
>>>>>> parameters, for example "bin/flink run -D memory.foo=bla ...". We can
>>>>>> think about offering a programmatic API for cases where it makes sense,
>>>>>> of course.
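>>>>>>
>>>>>> Just to illustrate the shape of that first step (the key name and default
>>>>>> below are made up, not a proposal for the actual option):
>>>>>>
>>>>>> import org.apache.flink.configuration.ConfigOption;
>>>>>> import org.apache.flink.configuration.ConfigOptions;
>>>>>>
>>>>>> public class MemoryWeightOptions {
>>>>>>     // Hypothetical weight option; once defined, it is settable via
>>>>>>     // flink-conf.yaml and via "bin/flink run -D ..." alike.
>>>>>>     public static final ConfigOption<Integer> PYTHON_MEMORY_WEIGHT =
>>>>>>         ConfigOptions.key("taskmanager.memory.managed.python-weight")
>>>>>>             .intType()
>>>>>>             .defaultValue(30)
>>>>>>             .withDescription("Weight of Python processes when sharing managed memory.");
>>>>>> }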
>>>>>>
>>>>>> Regarding naming one of the configurable weights
>>>>>> "StateBackend-BatchAlgorithm": I think it's not a good idea to be that
>>>>>> specific because the option will not age well, for example when we want
>>>>>> to change which group of memory consumers is configured together or when
>>>>>> we add something new.
>>>>>>
>>>>>> Best,
>>>>>> Aljoscha
>>>>>>
>>>>>> On 31.08.20 08:13, Xintong Song wrote:
>>>>>>> Thanks for the feedback, @Stephan
>>>>>>>
>>>>>>>
>>>>>>>> - There is a push to make as much as possible configurable via the main
>>>>>>>> configuration, and not only in code. Specifically values for operations
>>>>>>>> and tuning.
>>>>>>>> I think it would be more important to have such memory weights in the
>>>>>>>> config, compared to in the program API. /cc Aljoscha
>>>>>>>
>>>>>>>
>>>>>>> I can see the benefit that having memory weights in the main configuration
>>>>>>> makes tuning easier, which makes great sense to me. On the other hand, what
>>>>>>> we lose is the flexibility to have different weights for jobs running in
>>>>>>> the same Flink cluster. It seems to me the problem is that we don't have an
>>>>>>> easy way to overwrite job-specific configurations without touching the
>>>>>>> code.
>>>>>>>
>>>>>>>
>>>>>>> Given the current status, what if we make the memory weights configurable
>>>>>>> through both the main configuration and the programming API? The main
>>>>>>> configuration should take effect iff the weights are not explicitly
>>>>>>> specified through the programming API. In this way, job cluster users can
>>>>>>> easily tune the weights through the main configuration, while session
>>>>>>> cluster users, if they want to have different weights for jobs, can still
>>>>>>> overwrite the weights through execution configs.
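>>>>>>>
>>>>>>> In other words, roughly the following resolution order (a sketch only;
>>>>>>> the option key and the ExecutionConfig accessor are hypothetical):
>>>>>>>
>>>>>>> import java.util.Optional;
>>>>>>> import org.apache.flink.configuration.Configuration;
>>>>>>>
>>>>>>> public class WeightResolution {
>>>>>>>     // Hypothetical: a weight set via the programming API wins; otherwise
>>>>>>>     // fall back to the cluster-wide value from flink-conf.yaml.
>>>>>>>     static int resolvePythonWeight(Optional<Integer> fromExecutionConfig,
>>>>>>>                                    Configuration clusterConf) {
>>>>>>>         return fromExecutionConfig.orElseGet(() ->
>>>>>>>             clusterConf.getInteger("taskmanager.memory.managed.python-weight", 30));
>>>>>>>     }
>>>>>>> }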
>>>>>>>
>>>>>>>
>>>>>>>> - My recommendation would be to keep this as simple as possible. This
>>>>>>>> will make a lot of configuration code harder, and make it harder for
>>>>>>>> users to understand Flink's memory model.
>>>>>>>> Making things as easy for users to understand is very important in my
>>>>>>>> opinion. In that regard, the main proposal in the FLIP seems better than
>>>>>>>> the alternative proposal listed at the end of the FLIP page.
>>>>>>>
>>>>>>> +1 from my side.
>>>>>>>
>>>>>>>
>>>>>>>> - For the simplicity, we could go even further and simply have two
>>>>>>>> memory users at the moment: The operator algorithm/data-structure and the
>>>>>>>> external language process (Python for now).
>>>>>>>> We never have batch algos and RocksDB mixed, having this as separate
>>>>>>>> options is confusing as it suggests this can be combined arbitrarily. I
>>>>>>>> also think that a slim possibility that we may ever combine this in the
>>>>>>>> future is not enough reason to make it more complex/confusing.
>>>>>>>
>>>>>>>
>>>>>>> Good point. +1 for combining batch/RocksDB weights, for they're never
>>>>>>> mixed together. We can even just name it "StateBackend-BatchAlgorithm" to
>>>>>>> be explicit.
>>>>>>>
>>>>>>>
>>>>>>> For "external language process", I'm not entirely sure. Future external
>>>>>>> languages could be mixed with Python processes. To avoid later considering
>>>>>>> how to share external language memory across different languages, I would
>>>>>>> suggest presenting the concept as "Python memory" rather than "external
>>>>>>> language process memory".
>>>>>>>
>>>>>>>
>>>>>>> Thank you~
>>>>>>>
>>>>>>> Xintong Song
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
>>>>>>>
>>>>>>>> Thanks for driving this proposal. A few thoughts on the current design:
>>>>>>>>
>>>>>>>> - There is a push to make as much as possible configurable via the main
>>>>>>>> configuration, and not only in code. Specifically values for operations
>>>>>>>> and tuning.
>>>>>>>> I think it would be more important to have such memory weights in the
>>>>>>>> config, compared to in the program API. /cc Aljoscha
>>>>>>>>
>>>>>>>> - My recommendation would be to keep this as simple as possible. This
>>>>>>>> will make a lot of configuration code harder, and make it harder for
>>>>>>>> users to understand Flink's memory model.
>>>>>>>> Making things as easy for users to understand is very important in my
>>>>>>>> opinion. In that regard, the main proposal in the FLIP seems better than
>>>>>>>> the alternative proposal listed at the end of the FLIP page.
>>>>>>>>
>>>>>>>> - For the simplicity, we could go even further and simply have two
>>>>>>>> memory users at the moment: The operator algorithm/data-structure and the
>>>>>>>> external language process (Python for now).
>>>>>>>> We never have batch algos and RocksDB mixed, having this as separate
>>>>>>>> options is confusing as it suggests this can be combined arbitrarily. I
>>>>>>>> also think that a slim possibility that we may ever combine this in the
>>>>>>>> future is not enough reason to make it more complex/confusing.
>>>>>>>>
>>>>>>>> - I am also not aware of any plans to combine the network and operator
>>>>>>>> memory. Not that it would be infeasible to do this, but I think this
>>>>>>>> would also be orthogonal to this change, and I am not sure this would be
>>>>>>>> solved with static weights. So trying to get network memory into this
>>>>>>>> proposal seems premature to me.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A quick question: is network memory treated as managed memory now? Or
>>>>>>>>>> in the future?
>>>>>>>>>>
>>>>>>>>> No, network memory is independent of managed memory ATM. And I'm not
>>>>>>>>> aware of any plan to combine the two.
>>>>>>>>>
>>>>>>>>> Any insights there?
>>>>>>>>>
>>>>>>>>> Thank you~
>>>>>>>>>
>>>>>>>>> Xintong Song
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> A quick question: is network memory treated as managed memory now? Or
>>>>>>>>>> in the future?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kurt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi devs,
>>>>>>>>>>>
>>>>>>>>>>> I'd like to bring up the discussion of FLIP-141[1], which proposes how
>>>>>>>>>>> managed memory should be shared by various use cases within a slot. This
>>>>>>>>>>> is an extension to FLIP-53[2], where we assumed that the RocksDB state
>>>>>>>>>>> backend and batch operators are the only use cases of managed memory for
>>>>>>>>>>> streaming and batch jobs respectively, which is no longer true with the
>>>>>>>>>>> introduction of Python UDFs.
>>>>>>>>>>>
>>>>>>>>>>> Please note that we have not reached consensus between the two different
>>>>>>>>>>> designs. The major part of this FLIP describes one of the candidates,
>>>>>>>>>>> while the alternative is discussed in the section "Rejected Alternatives".
>>>>>>>>>>> We are hoping to draw on the community's insights to help us resolve the
>>>>>>>>>>> disagreement.
>>>>>>>>>>>
>>>>>>>>>>> Any feedback would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thank you~
>>>>>>>>>>>
>>>>>>>>>>> Xintong Song
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
>>>>>>>>>>>
>>>>>>>>>>> [2]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management