Thanks all for the feedback. FYI, I've opened a voting thread[1] on this.
Thank you~

Xintong Song

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html

On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <reed...@gmail.com> wrote:

> Thanks for proposing this improvement! @Xintong
> The proposal looks good to me. Agreed that we should make it as simple
> as possible for users to understand.
>
> Thanks,
> Zhu
>
> On Thu, Sep 3, 2020 at 2:11 PM, Dian Fu <dian0511...@gmail.com> wrote:
>
> > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> >
> > > On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > Thanks for the input, Yu.
> > >
> > > I believe the current proposal should work with RocksDB, or any
> > > other state backend, using memory at either the slot or the
> > > operator scope. With the proposed approach, all we need is an
> > > indicator (e.g., a configuration option) telling us which scope we
> > > should calculate the fractions for.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
> > >
> > > > Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
> > > >
> > > > Just one supplement for the RocksDB state backend part:
> > > >
> > > > It's true that currently we're using managed memory at the slot
> > > > scope. However, IMHO, we may support setting weights for
> > > > different stateful operators (for advanced usage) in the future.
> > > > For example, users may choose to set higher weights for a join
> > > > operator over an aggregation operator, to give more memory to
> > > > those with bigger states. In this case, we may also use managed
> > > > memory at the operator scope for state backends. And if I
> > > > understand correctly, the current design could cover this case
> > > > well.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > Thanks all for the feedback and discussion.
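To make the weights-to-fractions idea discussed earlier in the thread concrete, here is a minimal sketch. All names (`STATE_BACKEND`, `PYTHON`, `BATCH_OP`, `managed_fractions`) are illustrative and not Flink's actual API: each managed-memory use case has a weight, and a use case's fraction is its weight divided by the total weight of the use cases actually present in the relevant scope.

```python
# Illustrative sketch of intra-slot managed memory sharing by weights.
# Use-case names and the function are hypothetical, not Flink's API.

def managed_fractions(weights, present_use_cases):
    """Fraction of managed memory for each use case present in the scope."""
    total = sum(weights[u] for u in present_use_cases)
    if total == 0:
        return {u: 0.0 for u in present_use_cases}
    return {u: weights[u] / total for u in present_use_cases}

# Streaming job with RocksDB and Python UDFs: the absent batch-operator
# use case does not dilute the shares of the use cases that are present.
weights = {"STATE_BACKEND": 70, "PYTHON": 30, "BATCH_OP": 70}
fractions = managed_fractions(weights, {"STATE_BACKEND", "PYTHON"})
print(fractions["STATE_BACKEND"], fractions["PYTHON"])  # 0.7 0.3
```

The same function works for either scope: the "indicator" mentioned above would only change which set of use cases is passed as `present_use_cases`.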
> > > > > I have updated the FLIP, with the following changes.
> > > > >
> > > > > - Choose the main proposal over the alternative approach
> > > > > - Combine weights of RocksDB and batch operators
> > > > > - Expose weights through configuration options, rather than via
> > > > >   ExecutionConfig
> > > > > - Add implementation plan
> > > > >
> > > > > Please help take another look.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > > On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the inputs, Aljoscha & Till.
> > > > > >
> > > > > > # Weight Configuration
> > > > > >
> > > > > > I think exposing the knobs incrementally is a good idea.
> > > > > > However, I'm not sure about non-configurable as the first step.
> > > > > >
> > > > > > Currently, users can tune memory for RocksDB
> > > > > > ('taskmanager.memory.managed.size') and Python
> > > > > > ('python.fn-execution.[framework|buffer].memory.size')
> > > > > > separately, which practically means any combination of RocksDB
> > > > > > and Python memory sizes. If we switch to non-configurable
> > > > > > weights, that will be a regression compared to 1.11.
> > > > > >
> > > > > > Therefore, I think exposing via configuration options might be
> > > > > > a good first step. And we can discuss exposing via
> > > > > > ExecutionConfig if later we see that requirement.
> > > > > >
> > > > > > # Naming of Weights
> > > > > >
> > > > > > I'm neutral on "Flink/Internal memory".
> > > > > >
> > > > > > I think the reason we can combine weights for batch algorithms
> > > > > > and state backends is that they are never mixed together.
> > > > > > My only concern for "Flink/Internal memory", which might not
> > > > > > be a problem at the moment, is this: what if new memory use
> > > > > > cases appear in the future that could also be described as
> > > > > > "Flink/Internal memory", but are not guaranteed never to be
> > > > > > mixed with batch algorithms or state backends?
> > > > > >
> > > > > > Anyway, I think the naming should not block this FLIP, as long
> > > > > > as we have consensus on combining the two weights for RocksDB
> > > > > > and batch algorithms. We can keep the naming discussion open
> > > > > > until the implementation phase.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > >
> > > > > > > Thanks for creating this FLIP, Xintong.
> > > > > > >
> > > > > > > I agree with the previous comments that the memory
> > > > > > > configuration should be as easy as possible. Every new knob
> > > > > > > has the potential to confuse users and/or allow them to shoot
> > > > > > > themselves in the foot. Consequently, I am +1 for the first
> > > > > > > proposal in the FLIP since it is simpler.
> > > > > > >
> > > > > > > Also +1 for Stephan's proposal to combine batch operators'
> > > > > > > and RocksDB's memory usage into one weight.
> > > > > > >
> > > > > > > Concerning the names for the two weights, I fear that we are
> > > > > > > facing one of the two hard things in computer science. To add
> > > > > > > another idea, we could name them "Flink memory"/"Internal
> > > > > > > memory" and "Python memory".
> > > > > > >
> > > > > > > For the sake of making the scope of the FLIP as small as
> > > > > > > possible and to develop the feature incrementally, I think
> > > > > > > that Aljoscha's proposal to make it non-configurable for the
> > > > > > > first step sounds like a good idea.
> > > > > > > As a next step (and also if we see the need), we can make
> > > > > > > the memory weights configurable via the configuration. And
> > > > > > > last, we could expose them via the ExecutionConfig if
> > > > > > > required.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > playing devil's advocate here: should we even make the
> > > > > > > > memory weights configurable? We could go with weights that
> > > > > > > > should make sense for most cases in the first version and
> > > > > > > > only introduce configurable weights when (if) users need
> > > > > > > > them.
> > > > > > > >
> > > > > > > > Regarding where/how things are configured, I think that
> > > > > > > > most things should be a ConfigOption first (thanks for
> > > > > > > > cc'ing me, Stephan!). This makes them configurable via
> > > > > > > > flink-conf.yaml and via command line parameters, for
> > > > > > > > example "bin/flink run -D memory.foo=bla ...". We can
> > > > > > > > think about offering a programmatic API for cases where it
> > > > > > > > makes sense, of course.
> > > > > > > >
> > > > > > > > Regarding naming one of the configurable weights
> > > > > > > > "StateBackend-BatchAlgorithm": I think it's not a good
> > > > > > > > idea to be that specific, because the option will not age
> > > > > > > > well, for example when we want to change which group of
> > > > > > > > memory consumers is configured together or when we add
> > > > > > > > something new.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Aljoscha
> > > > > > > >
> > > > > > > > On 31.08.20 08:13, Xintong Song wrote:
> > > > > > > > > Thanks for the feedback, @Stephan
> > > > > > > > >
> > > > > > > > > > - There is a push to make as much as possible
> > > > > > > > > > configurable via the main configuration, and not only
> > > > > > > > > > in code. Specifically values for operations and tuning.
> > > > > > > > > > I think it would be more important to have such
> > > > > > > > > > memory weights in the config, compared to in the
> > > > > > > > > > program API. /cc Aljoscha
> > > > > > > > >
> > > > > > > > > I can see the benefit that having memory weights in the
> > > > > > > > > main configuration makes tuning easier, which makes great
> > > > > > > > > sense to me. On the other hand, what we lose is the
> > > > > > > > > flexibility to have different weights for jobs running in
> > > > > > > > > the same Flink cluster. It seems to me the problem is
> > > > > > > > > that we don't have an easy way to overwrite job-specific
> > > > > > > > > configurations without touching the code.
> > > > > > > > >
> > > > > > > > > Given the current status, what if we make the memory
> > > > > > > > > weights configurable through both the main configuration
> > > > > > > > > and the programming API? The main configuration should
> > > > > > > > > take effect iff the weights are not explicitly specified
> > > > > > > > > through the programming API. In this way, job cluster
> > > > > > > > > users can easily tune the weights through the main
> > > > > > > > > configuration, while session cluster users, if they want
> > > > > > > > > to have different weights for jobs, can still overwrite
> > > > > > > > > the weights through execution configs.
> > > > > > > > >
> > > > > > > > > > - My recommendation would be to keep this as simple as
> > > > > > > > > > possible. This will make a lot of configuration code
> > > > > > > > > > harder, and make it harder for users to understand
> > > > > > > > > > Flink's memory model.
> > > > > > > > > > Making things as easy for users to understand is very
> > > > > > > > > > important in my opinion. In that regard, the main
> > > > > > > > > > proposal in the FLIP seems better than the alternative
> > > > > > > > > > proposal listed at the end of the FLIP page.
> > > > > > > > >
> > > > > > > > > +1 from my side.
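The precedence rule proposed above (the main configuration takes effect iff the weights are not explicitly set through the programming API) can be sketched as follows. Everything here, including the default weight values and all names, is a hypothetical illustration rather than Flink's real configuration surface:

```python
# Hypothetical sketch of weight precedence: per-job API settings win over
# the cluster-wide configuration, which wins over built-in defaults.

DEFAULT_WEIGHTS = {"DATAPROC": 70, "PYTHON": 30}  # illustrative defaults only

def effective_weights(cluster_config, job_api_weights=None):
    """Merge cluster-wide configured weights with per-job API overrides."""
    weights = dict(DEFAULT_WEIGHTS)
    weights.update(cluster_config)   # values from flink-conf.yaml
    if job_api_weights:              # explicit per-job API settings win
        weights.update(job_api_weights)
    return weights

# Session-cluster scenario: one job keeps the cluster-configured weights,
# another overrides the Python weight through the (hypothetical) job API.
cluster = {"PYTHON": 40}
print(effective_weights(cluster))                  # {'DATAPROC': 70, 'PYTHON': 40}
print(effective_weights(cluster, {"PYTHON": 60}))  # {'DATAPROC': 70, 'PYTHON': 60}
```

This mirrors how other Flink options already layer defaults, flink-conf.yaml, and programmatic overrides; the point of the sketch is only the override order, not the concrete keys.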
> > > > > > > > > > - For simplicity, we could go even further and simply
> > > > > > > > > > have two memory users at the moment: the operator
> > > > > > > > > > algorithm/data-structure and the external language
> > > > > > > > > > process (Python for now).
> > > > > > > > > > We never have batch algos and RocksDB mixed; having
> > > > > > > > > > these as separate options is confusing, as it suggests
> > > > > > > > > > they can be combined arbitrarily. I also think that a
> > > > > > > > > > slim possibility that we may ever combine them in the
> > > > > > > > > > future is not enough reason to make this more
> > > > > > > > > > complex/confusing.
> > > > > > > > >
> > > > > > > > > Good point. +1 for combining batch/RocksDB weights, for
> > > > > > > > > they're never mixed together. We can even just name it
> > > > > > > > > "StateBackend-BatchAlgorithm" to be explicit.
> > > > > > > > >
> > > > > > > > > For "external language process", I'm not entirely sure.
> > > > > > > > > Future external languages could be mixed with Python
> > > > > > > > > processes. To avoid later considering how to share
> > > > > > > > > external language memory across different languages, I
> > > > > > > > > would suggest presenting the concept as "Python memory"
> > > > > > > > > rather than "external language process memory".
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for driving this proposal. A few thoughts on the
> > > > > > > > > > current design:
> > > > > > > > > >
> > > > > > > > > > - There is a push to make as much as possible
> > > > > > > > > > configurable via the main configuration, and not only
> > > > > > > > > > in code. Specifically values for operations and tuning.
> > > > > > > > > > I think it would be more important to have such memory
> > > > > > > > > > weights in the config, compared to in the program API.
> > > > > > > > > > /cc Aljoscha
> > > > > > > > > >
> > > > > > > > > > - My recommendation would be to keep this as simple as
> > > > > > > > > > possible. This will make a lot of configuration code
> > > > > > > > > > harder, and make it harder for users to understand
> > > > > > > > > > Flink's memory model.
> > > > > > > > > > Making things as easy for users to understand is very
> > > > > > > > > > important in my opinion. In that regard, the main
> > > > > > > > > > proposal in the FLIP seems better than the alternative
> > > > > > > > > > proposal listed at the end of the FLIP page.
> > > > > > > > > >
> > > > > > > > > > - For simplicity, we could go even further and simply
> > > > > > > > > > have two memory users at the moment: the operator
> > > > > > > > > > algorithm/data-structure and the external language
> > > > > > > > > > process (Python for now).
> > > > > > > > > > We never have batch algos and RocksDB mixed; having
> > > > > > > > > > these as separate options is confusing, as it suggests
> > > > > > > > > > they can be combined arbitrarily. I also think that a
> > > > > > > > > > slim possibility that we may ever combine them in the
> > > > > > > > > > future is not enough reason to make this more
> > > > > > > > > > complex/confusing.
> > > > > > > > > >
> > > > > > > > > > - I am also not aware of any plans to combine the
> > > > > > > > > > network and operator memory. Not that it would be
> > > > > > > > > > infeasible to do this, but I think this would also be
> > > > > > > > > > orthogonal to this change, and I am not sure it would
> > > > > > > > > > be solved with static weights. So trying to get network
> > > > > > > > > > memory into this proposal seems premature to me.
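Stephan's two-consumer simplification above implies that at runtime a slot's managed memory budget splits deterministically between the combined operator/state-backend consumer and the Python process. A rough sketch of that split (names, the 512 MiB budget, and the truncation policy are all illustrative assumptions, not the FLIP's specification):

```python
# Illustrative only: turning per-use-case fractions into bytes. "DATAPROC"
# here stands for the combined batch-algorithm/RocksDB consumer; "PYTHON"
# for the external language process.

def split_slot_memory(slot_managed_bytes, fractions):
    """Bytes of managed memory per use case, truncating fractional bytes."""
    return {use_case: int(slot_managed_bytes * f)
            for use_case, f in fractions.items()}

slot_bytes = 512 * 1024 * 1024  # e.g. a 512 MiB slot budget, for illustration
shares = split_slot_memory(slot_bytes, {"DATAPROC": 0.75, "PYTHON": 0.25})
print(shares)  # {'DATAPROC': 402653184, 'PYTHON': 134217728}
```

Note that network memory stays outside this split entirely, matching the point above that it is independent of managed memory.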
> > > > > > > > > > Best,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > A quick question: is network memory treated as
> > > > > > > > > > > > managed memory now? Or in the future?
> > > > > > > > > > >
> > > > > > > > > > > No, network memory is independent of managed memory
> > > > > > > > > > > ATM. And I'm not aware of any plan to combine these
> > > > > > > > > > > two.
> > > > > > > > > > >
> > > > > > > > > > > Any insights there?
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > A quick question: is network memory treated as
> > > > > > > > > > > > managed memory now? Or in the future?
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Kurt
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi devs,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'd like to bring up the discussion over
> > > > > > > > > > > > > FLIP-141[1], which proposes how managed memory
> > > > > > > > > > > > > should be shared by various use cases within a
> > > > > > > > > > > > > slot. This is an extension to FLIP-53[2], where we
> > > > > > > > > > > > > assumed that the RocksDB state backend and batch
> > > > > > > > > > > > > operators are the only use cases of managed
> > > > > > > > > > > > > memory for streaming and batch jobs respectively,
> > > > > > > > > > > > > which is no longer true with the introduction of
> > > > > > > > > > > > > Python UDFs.
> > > > > > > > > > > > > Please note that we have not reached consensus
> > > > > > > > > > > > > between two different designs. The major part of
> > > > > > > > > > > > > this FLIP describes one of the candidates, while
> > > > > > > > > > > > > the alternative is discussed in the section
> > > > > > > > > > > > > "Rejected Alternatives". We are hoping to draw on
> > > > > > > > > > > > > the community's insights to help us resolve the
> > > > > > > > > > > > > disagreement.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any feedback would be appreciated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > > > > > > > > > > > >
> > > > > > > > > > > > > [2]
> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management