Thanks for proposing this improvement, @Xintong! The proposal looks good to me. Agreed that we should make it as simple as possible for users to understand.
Thanks,
Zhu

Dian Fu <dian0511...@gmail.com> wrote on Thu, Sep 3, 2020 at 2:11 PM:

> Thanks for driving this FLIP, Xintong! +1 to the updated version.
>
> > On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
> >
> > Thanks for the input, Yu.
> >
> > I believe the current proposal should work with RocksDB, or any other state backend, using memory at either the slot or the operator scope. With the proposed approach, all we need is an indicator (e.g., a configuration option) telling us which scope we should calculate the fractions for.
> >
> > Thank you~
> >
> > Xintong Song
> >
> > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
> >
> >> Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
> >>
> >> Just one supplement for the RocksDB state backend part:
> >>
> >> It's true that currently we're using managed memory at the slot scope. However, IMHO, we may support setting weights for different stateful operators (for advanced usage) in the future. For example, users may choose to set higher weights for a join operator over an aggregation operator, to give more memory to those with bigger states. In this case, we may also use managed memory at the operator scope for state backends. And if I understand correctly, the current design could cover this case well.
> >>
> >> Best Regards,
> >> Yu
> >>
> >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
> >>
> >>> Thanks all for the feedback and discussion.
> >>>
> >>> I have updated the FLIP, with the following changes:
> >>>
> >>> - Choose the main proposal over the alternative approach.
> >>> - Combine the weights of RocksDB and batch operators.
> >>> - Expose weights through configuration options, rather than via ExecutionConfig.
> >>> - Add an implementation plan.
> >>>
> >>> Please help take another look.
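The scope-based fraction calculation described above can be sketched as follows. This is an illustrative sketch only, not Flink's implementation: the function name, consumer names, and weight values are hypothetical, assuming fractions are obtained by normalizing weights over the consumers actually present in the chosen scope.

```python
# Illustrative sketch only (not Flink code): derive managed-memory fractions
# from consumer weights, normalized over the consumers that are actually
# present in the chosen scope (slot or operator).
# All names and values below are hypothetical.

def compute_fractions(weights, present_consumers):
    """Normalize the weights of the consumers present in the scope into
    fractions that sum to 1.0; return an empty dict if none are present."""
    relevant = {c: weights[c] for c in present_consumers if weights.get(c, 0) > 0}
    total = sum(relevant.values())
    return {c: w / total for c, w in relevant.items()} if total else {}

# Hypothetical weights: combined state-backend/batch weight 70, Python weight 30.
weights = {"OPERATOR": 70, "PYTHON": 30}

# A job using both RocksDB and Python UDFs splits the budget 0.7 / 0.3,
# while a job with only batch operators gets the whole budget (fraction 1.0).
assert compute_fractions(weights, {"OPERATOR", "PYTHON"}) == {"OPERATOR": 0.7, "PYTHON": 0.3}
assert compute_fractions(weights, {"OPERATOR"}) == {"OPERATOR": 1.0}
```

Normalizing over only the present consumers is what makes the weights composable: a use case that is absent from a slot does not strand any of the managed memory budget.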
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> >>>
> >>>> Thanks for the input, Aljoscha & Till.
> >>>>
> >>>> # Weight Configuration
> >>>>
> >>>> I think exposing the knobs incrementally is a good idea. However, I'm not sure about making them non-configurable as the first step.
> >>>>
> >>>> Currently, users can tune memory for RocksDB ('taskmanager.memory.managed.size') and Python ('python.fn-execution.[framework|buffer].memory.size') separately, which practically means any combination of RocksDB and Python memory sizes. If we switch to non-configurable weights, that will be a regression compared to 1.11.
> >>>>
> >>>> Therefore, I think exposing the weights via configuration options might be a good first step. And we can discuss exposing them via ExecutionConfig if we later see that requirement.
> >>>>
> >>>> # Naming of Weights
> >>>>
> >>>> I'm neutral on "Flink/Internal memory".
> >>>>
> >>>> I think the reason we can combine the weights for batch algorithms and state backends is that they are never mixed together. My only concern with "Flink/Internal memory", which might not be a problem at the moment, is: what if new memory use cases appear in the future that can also be described as "Flink/Internal memory" but are not guaranteed not to mix with batch algorithms or state backends?
> >>>>
> >>>> Anyway, I think the naming should not block this FLIP, as long as we have consensus on combining the two weights for RocksDB and batch algorithms. We can keep the naming discussion open until the implementation phase.
> >>>>
> >>>> Thank you~
> >>>>
> >>>> Xintong Song
> >>>>
> >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>
> >>>>> Thanks for creating this FLIP, Xintong.
> >>>>>
> >>>>> I agree with the previous comments that the memory configuration should be as easy as possible. Every new knob has the potential to confuse users and/or allow them to shoot themselves in the foot. Consequently, I am +1 for the first proposal in the FLIP since it is simpler.
> >>>>>
> >>>>> Also +1 for Stephan's proposal to combine the batch operators' and RocksDB's memory usage into one weight.
> >>>>>
> >>>>> Concerning the names for the two weights, I fear that we are facing one of the two hard things in computer science. To add another idea, we could name them "Flink memory"/"Internal memory" and "Python memory".
> >>>>>
> >>>>> For the sake of making the scope of the FLIP as small as possible, and to develop the feature incrementally, I think that Aljoscha's proposal to make the weights non-configurable as a first step sounds like a good idea. As a next step (and if we see the need), we can make the memory weights configurable via the configuration. And last, we could expose them via the ExecutionConfig if that is required.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Playing devil's advocate here: should we even make the memory weights configurable? We could go with weights that should make sense for most cases in the first version and only introduce configurable weights when (if) users need them.
> >>>>>>
> >>>>>> Regarding where/how things are configured, I think that most things should be a ConfigOption first (thanks for cc'ing me, Stephan!). This makes them configurable via flink-conf.yaml and via command line parameters, for example "bin/flink run -D memory.foo=bla ...". We can think about offering a programmatic API for cases where it makes sense, of course.
> >>>>>>
> >>>>>> Regarding naming one of the configurable weights "StateBackend-BatchAlgorithm": I think it's not a good idea to be that specific, because the option will not age well. For example, when we want to change which group of memory consumers is configured together, or when we add something new.
> >>>>>>
> >>>>>> Best,
> >>>>>> Aljoscha
> >>>>>>
> >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> >>>>>>> Thanks for the feedback, @Stephan
> >>>>>>>
> >>>>>>>> - There is a push to make as much as possible configurable via the main configuration, and not only in code. Specifically values for operations and tuning.
> >>>>>>>> I think it would be more important to have such memory weights in the config, compared to in the program API. /cc Aljoscha
> >>>>>>>
> >>>>>>> I can see the benefit that having memory weights in the main configuration makes tuning easier, which makes great sense to me. On the other hand, what we lose is the flexibility to have different weights for jobs running in the same Flink cluster. It seems to me the problem is that we don't have an easy way to overwrite job-specific configurations without touching the code.
> >>>>>>>
> >>>>>>> Given the current status, what if we make the memory weights configurable through both the main configuration and the programming API? The main configuration should take effect iff the weights are not explicitly specified through the programming API. In this way, job cluster users can easily tune the weights through the main configuration, while session cluster users, if they want different weights for different jobs, can still overwrite the weights through execution configs.
> >>>>>>>
> >>>>>>>> - My recommendation would be to keep this as simple as possible. This will make a lot of configuration code harder, and make it harder for users to understand Flink's memory model.
> >>>>>>>> Making things as easy for users to understand is very important in my opinion. In that regard, the main proposal in the FLIP seems better than the alternative proposal listed at the end of the FLIP page.
> >>>>>>>
> >>>>>>> +1 from my side.
> >>>>>>>
> >>>>>>>> - For simplicity, we could go even further and simply have two memory users at the moment: the operator algorithm/data-structure and the external language process (Python for now).
> >>>>>>>> We never have batch algos and RocksDB mixed; having these as separate options is confusing, as it suggests they can be combined arbitrarily. I also think that a slim possibility that we may ever combine them in the future is not enough reason to make this more complex/confusing.
> >>>>>>>
> >>>>>>> Good point. +1 for combining the batch/RocksDB weights, for they're never mixed together.
> >>>>>>> We can even just name it "StateBackend-BatchAlgorithm" to be explicit.
> >>>>>>>
> >>>>>>> For "external language process", I'm not entirely sure. Future external languages could possibly be mixed with Python processes. To avoid later having to consider how to share external language memory across different languages, I would suggest presenting the concept as "Python memory" rather than "external language process memory".
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Thanks for driving this proposal. A few thoughts on the current design:
> >>>>>>>>
> >>>>>>>> - There is a push to make as much as possible configurable via the main configuration, and not only in code. Specifically values for operations and tuning.
> >>>>>>>> I think it would be more important to have such memory weights in the config, compared to in the program API. /cc Aljoscha
> >>>>>>>>
> >>>>>>>> - My recommendation would be to keep this as simple as possible. This will make a lot of configuration code harder, and make it harder for users to understand Flink's memory model.
> >>>>>>>> Making things as easy for users to understand is very important in my opinion. In that regard, the main proposal in the FLIP seems better than the alternative proposal listed at the end of the FLIP page.
> >>>>>>>>
> >>>>>>>> - For simplicity, we could go even further and simply have two memory users at the moment: the operator algorithm/data-structure and the external language process (Python for now).
> >>>>>>>> We never have batch algos and RocksDB mixed; having these as separate options is confusing, as it suggests they can be combined arbitrarily. I also think that a slim possibility that we may ever combine them in the future is not enough reason to make this more complex/confusing.
> >>>>>>>>
> >>>>>>>> - I am also not aware of any plans to combine the network and operator memory. Not that it would be infeasible to do this, but I think this would also be orthogonal to this change, and I am not sure it would be solved with static weights. So trying to get network memory into this proposal seems premature to me.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Stephan
> >>>>>>>>
> >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>>> A quick question: is network memory treated as managed memory now? Or in the future?
> >>>>>>>>>
> >>>>>>>>> No, network memory is independent from managed memory ATM. And I'm not aware of any plan to combine the two.
> >>>>>>>>>
> >>>>>>>>> Any insights there?
> >>>>>>>>>
> >>>>>>>>> Thank you~
> >>>>>>>>>
> >>>>>>>>> Xintong Song
> >>>>>>>>>
> >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> A quick question: is network memory treated as managed memory now? Or in the future?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Kurt
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi devs,
> >>>>>>>>>>>
> >>>>>>>>>>> I'd like to bring up the discussion of FLIP-141 [1], which proposes how managed memory should be shared by various use cases within a slot. This is an extension to FLIP-53 [2], where we assumed that the RocksDB state backend and batch operators are the only use cases of managed memory for streaming and batch jobs respectively, which is no longer true with the introduction of Python UDFs.
> >>>>>>>>>>>
> >>>>>>>>>>> Please notice that we have not reached consensus between two different designs. The major part of this FLIP describes one of the candidates, while the alternative is discussed in the section "Rejected Alternatives". We are hoping to borrow intelligence from the community to help us resolve the disagreement.
> >>>>>>>>>>>
> >>>>>>>>>>> Any feedback would be appreciated.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you~
> >>>>>>>>>>>
> >>>>>>>>>>> Xintong Song
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
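The configuration precedence discussed in this thread (a weight set through the programmatic ExecutionConfig takes effect over the main configuration, which in turn takes effect over built-in defaults) could be resolved as in the following sketch. The API, option names, and default values are all hypothetical, not Flink's actual implementation.

```python
# Illustrative sketch only (hypothetical API, not Flink code): the precedence
# rule proposed in the thread. A weight set through the programmatic
# ExecutionConfig wins; otherwise the value from the main configuration
# (flink-conf.yaml) applies; otherwise a built-in default is used.

DEFAULT_WEIGHTS = {"OPERATOR": 70, "PYTHON": 30}  # assumed defaults

def resolve_weight(consumer, cluster_config, execution_config):
    """Return the effective weight for one managed-memory consumer."""
    if consumer in execution_config:   # job-level override via programming API
        return execution_config[consumer]
    if consumer in cluster_config:     # cluster-wide tuning via flink-conf.yaml
        return cluster_config[consumer]
    return DEFAULT_WEIGHTS[consumer]   # built-in default

cluster = {"PYTHON": 40}   # set in the main configuration
job = {"PYTHON": 60}       # set per job through the execution config

assert resolve_weight("PYTHON", cluster, job) == 60   # job override wins
assert resolve_weight("PYTHON", cluster, {}) == 40    # falls back to config
assert resolve_weight("OPERATOR", cluster, {}) == 70  # built-in default
```

This is the compromise Xintong sketches above: job cluster users tune weights in the main configuration, while session cluster users can still override them per job.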