Thanks Till, `taskmanager.memory.managed.consumer-weights` sounds good to me.
Thank you~ Xintong Song On Thu, Sep 3, 2020 at 8:44 PM Till Rohrmann <trohrm...@apache.org> wrote: > Thanks for updating the FLIP Xintong. It looks good to me. One minor > comment is that we could name the configuration parameter > also taskmanager.memory.managed.consumer-weights which might be a bit more > expressive what this option does. > > Cheers, > Till > > On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <tonysong...@gmail.com> > wrote: > > > Thanks all for the feedback. > > > > FYI, I've opened a voting thread[1] on this. > > > > Thank you~ > > > > Xintong Song > > > > > > [1] > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html > > > > > > On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <reed...@gmail.com> wrote: > > > > > Thanks for proposing this improvement! @Xintong > > > The proposal looks good to me. Agreed that we should make it as simple > as > > > possible for users to understand. > > > > > > Thanks, > > > Zhu > > > > > > Dian Fu <dian0511...@gmail.com> 于2020年9月3日周四 下午2:11写道: > > > > > > > Thanks for driving this FLIP, Xintong! +1 to the updated version. > > > > > > > > > 在 2020年9月2日,下午6:09,Xintong Song <tonysong...@gmail.com> 写道: > > > > > > > > > > Thanks for the input, Yu. > > > > > > > > > > I believe the current proposal should work with RocksDB, or any > other > > > > state > > > > > backend, using memory at either the slot or the scope. With the > > > proposed > > > > > approach, all we need is an indicator (e.g., a configuration > option) > > > > > telling us which scope should we calculate the fractions for. > > > > > > > > > > Thank you~ > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote: > > > > > > > > > >> Thanks for compiling the FLIP Xintong, and +1 for the updated doc. > > > > >> > > > > >> Just one supplement for the RocksDB state backend part: > > > > >> > > > > >> It's true that currently we're using managed memory at the slot > > scope. > > > > >> However, IMHO, we may support setting weights for different > stateful > > > > >> operators (for advanced usage) in future. For example, users may > > > choose > > > > to > > > > >> set higher weights for join operator over aggregation operator, to > > > give > > > > >> more memory to those with bigger states. In this case, we may also > > use > > > > >> managed memory at the operator scope for state backends. And if I > > > > >> understand correctly, the current design could cover this case > well. > > > > >> > > > > >> Best Regards, > > > > >> Yu > > > > >> > > > > >> > > > > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> > > > > wrote: > > > > >> > > > > >>> Thanks all for the feedback and discussion. > > > > >>> > > > > >>> I have updated the FLIP, with the following changes. > > > > >>> > > > > >>> - Choose the main proposal over the alternative approach > > > > >>> - Combine weights of RocksDB and batch operators > > > > >>> - Expose weights through configuration options, rather than via > > > > >>> ExecutionConfig. > > > > >>> - Add implementation plan. > > > > >>> > > > > >>> Please help take another look. > > > > >>> > > > > >>> Thank you~ > > > > >>> > > > > >>> Xintong Song > > > > >>> > > > > >>> > > > > >>> > > > > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song < > tonysong...@gmail.com > > > > > > > >> wrote: > > > > >>> > > > > >>>> Thanks for the inputs, Aljoscha & Till. > > > > >>>> > > > > >>>> > > > > >>>> # Weight Configuration > > > > >>>> > > > > >>>> > > > > >>>> I think exposing the knobs incrementally is a good idea. > However, > > > I'm > > > > >> not > > > > >>>> sure about non-configurable as the first step. > > > > >>>> > > > > >>>> > > > > >>>> Currently, users can tune memory for rocksdb > > > > >>>> ('taskmanager.memory.managed.size') and python > > > > >>>> ('python.fn-execution.[framework|buffer].memory.size') > separately, > > > > >> which > > > > >>>> practically means any combination of rocksdb and python memory > > > sizes. > > > > >> If > > > > >>> we > > > > >>>> switch to non-configurable weights, that will be a regression > > > compared > > > > >> to > > > > >>>> 1.11. > > > > >>>> > > > > >>>> > > > > >>>> Therefore, I think exposing via configuration options might be a > > > good > > > > >>>> first step. And we can discuss exposing via ExecutionConfig if > > later > > > > we > > > > >>> see > > > > >>>> that requirement. > > > > >>>> > > > > >>>> > > > > >>>> # Naming of Weights > > > > >>>> > > > > >>>> > > > > >>>> I'm neutral for "Flink/Internal memory". > > > > >>>> > > > > >>>> > > > > >>>> I think the reason we can combine weights for batch algorithms > and > > > > >> state > > > > >>>> backends is that they are never mixed together. My only concern > > > > >>>> for "Flink/Internal memory", which might not be a problem at the > > > > >> moment, > > > > >>> is > > > > >>>> that what if new memory use cases appear in the future, which > can > > > also > > > > >> be > > > > >>>> described by "Flink/Internal memory" but is not guaranteed not > > mixed > > > > >> with > > > > >>>> batch algorithms or state backends? > > > > >>>> > > > > >>>> > > > > >>>> Anyway, I think the naming should not block this FLIP, as long > as > > we > > > > >> have > > > > >>>> consensus on combining the two weights for rocksdb and batch > > > > >> algorithms. > > > > >>> We > > > > >>>> can keep the naming discussion open until the implementation > > phase. > > > > >>>> > > > > >>>> > > > > >>>> Thank you~ > > > > >>>> > > > > >>>> Xintong Song > > > > >>>> > > > > >>>> > > > > >>>> > > > > >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann < > > trohrm...@apache.org > > > > > > > > >>>> wrote: > > > > >>>> > > > > >>>>> Thanks for creating this FLIP Xintong. > > > > >>>>> > > > > >>>>> I agree with the previous comments that the memory > configuration > > > > >> should > > > > >>> be > > > > >>>>> as easy as possible. Every new knob has the potential to > confuse > > > > users > > > > >>>>> and/or allows him to shoot himself in the foot. Consequently, I > > am > > > +1 > > > > >>> for > > > > >>>>> the first proposal in the FLIP since it is simpler. > > > > >>>>> > > > > >>>>> Also +1 for Stephan's proposal to combine batch operator's and > > > > >>>>> RocksDB's memory usage into one weight. > > > > >>>>> > > > > >>>>> Concerning the names for the two weights, I fear that we are > > facing > > > > >> one > > > > >>> of > > > > >>>>> the two hard things in computer science. To add another idea, > we > > > > could > > > > >>>>> name > > > > >>>>> them "Flink memory"/"Internal memory" and "Python memory". > > > > >>>>> > > > > >>>>> For the sake of making the scope of the FLIP as small as > possible > > > and > > > > >> to > > > > >>>>> develop the feature incrementally, I think that Aljoscha's > > proposal > > > > to > > > > >>>>> make > > > > >>>>> it non-configurable for the first step sounds like a good idea. > > As > > > a > > > > >>> next > > > > >>>>> step (and also if we see need), we can make the memory weights > > > > >>>>> configurable > > > > >>>>> via the configuration. And last, we could expose it via the > > > > >>>>> ExecutionConfig > > > > >>>>> if it is required. > > > > >>>>> > > > > >>>>> Cheers, > > > > >>>>> Till > > > > >>>>> > > > > >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek < > > > aljos...@apache.org > > > > > > > > > >>>>> wrote: > > > > >>>>> > > > > >>>>>> Hi, > > > > >>>>>> > > > > >>>>>> playing devils advocate here: should we even make the memory > > > weights > > > > >>>>>> configurable? We could go with weights that should make sense > > for > > > > >> most > > > > >>>>>> cases in the first version and only introduce configurable > > weights > > > > >>> when > > > > >>>>>> (if) users need them. > > > > >>>>>> > > > > >>>>>> Regarding where/how things are configured, I think that most > > > things > > > > >>>>>> should be a ConfigOption first (Thanks cc'in me, Stephan!). > This > > > > >> makes > > > > >>>>>> them configurable via flink-conf.yaml and via command line > > > > >> parameters, > > > > >>>>>> for example "bin/flink run -D memory.foo=bla ...". We can > think > > > > >> about > > > > >>>>>> offering programmatic API for cases where it makes sense, of > > > course. > > > > >>>>>> > > > > >>>>>> Regarding naming one of the configurable weights > > > > >>>>>> "StateBackend-BatchAlgorithm". I think it's not a good idea to > > be > > > > >> that > > > > >>>>>> specific because the option will not age well. For example > when > > we > > > > >>> want > > > > >>>>>> to change which group of memory consumers are configured > > together > > > or > > > > >>>>>> when we add something new. > > > > >>>>>> > > > > >>>>>> Best, > > > > >>>>>> Aljoscha > > > > >>>>>> > > > > >>>>>> On 31.08.20 08:13, Xintong Song wrote: > > > > >>>>>>> Thanks for the feedbacks, @Stephan > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> - There is a push to make as much as possible configurable > > via > > > > >>> the > > > > >>>>>> main > > > > >>>>>>>> configuration, and not only in code. Specifically values for > > > > >>>>> operations > > > > >>>>>> and > > > > >>>>>>>> tuning. > > > > >>>>>>>> I think it would be more important to have such memory > > > > >> weights > > > > >>>>> in > > > > >>>>>> the > > > > >>>>>>>> config, compared to in the program API. /cc Aljoscha > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> I can see the benefit that having memory weights in the main > > > > >>>>>> configuration > > > > >>>>>>> makes tuning easier, which makes great sense to me. On the > > other > > > > >>> hand, > > > > >>>>>> what > > > > >>>>>>> we lose is the flexibility to have different weights for jobs > > > > >>> running > > > > >>>>> in > > > > >>>>>>> the same Flink cluster. It seems to me the problem is that we > > > > >> don't > > > > >>>>> have > > > > >>>>>> an > > > > >>>>>>> easy way to overwrite job-specific configurations without > > > touching > > > > >>> the > > > > >>>>>>> codes. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> Given the current status, what if we make the memory weights > > > > >>>>> configurable > > > > >>>>>>> through both the main configuration and the programming API? > > The > > > > >>> main > > > > >>>>>>> configuration should take effect iff the weights are not > > > > >> explicitly > > > > >>>>>>> specified through the programming API. In this way, job > cluster > > > > >>> users > > > > >>>>> can > > > > >>>>>>> easily tune the weight through the main configuration, while > > > > >> session > > > > >>>>>>> cluster users, if they want to have different weights for > jobs, > > > > >> can > > > > >>>>> still > > > > >>>>>>> overwrite the weight through execution configs. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> - My recommendation would be to keep this as simple as > > > > >> possible. > > > > >>>>> This > > > > >>>>>>>> will make a lot of configuration code harder, and make it > > harder > > > > >>> for > > > > >>>>>> users > > > > >>>>>>>> to understand Flink's memory model. > > > > >>>>>>>> Making things as easy for users to understand is very > > > > >>> important > > > > >>>>> in > > > > >>>>>> my > > > > >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems > > > > >> better > > > > >>>>> than > > > > >>>>>>>> the alternative proposal listed at the end of the FLIP page. > > > > >>>>>>> > > > > >>>>>>> +1 from my side. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> - For the simplicity, we could go even further and simply > > have > > > > >>> two > > > > >>>>>> memory > > > > >>>>>>>> users at the moment: The operator algorithm/data-structure > and > > > > >> the > > > > >>>>>> external > > > > >>>>>>>> language process (Python for now). > > > > >>>>>>>> We never have batch algos and RocksDB mixed, having this > > as > > > > >>>>>> separate > > > > >>>>>>>> options is confusing as it suggests this can be combined > > > > >>>>> arbitrarily. I > > > > >>>>>>>> also think that a slim possibility that we may ever combine > > this > > > > >> in > > > > >>>>> the > > > > >>>>>>>> future is not enough reason to make it more > complex/confusing. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> Good point. +1 for combining batch/rocksdb weights, for > they're > > > > >>> never > > > > >>>>>> mixed > > > > >>>>>>> together. We can even just name it > > "StateBackend-BatchAlgorithm" > > > > >> for > > > > >>>>>>> explicitly. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> For "external language process", I'm not entirely sure. > Future > > > > >>>>> external > > > > >>>>>>> languages are possibly mixed with python processes. To avoid > > > later > > > > >>>>>>> considering how to share external language memory across > > > different > > > > >>>>>>> languages, I would suggest to present the concept as "python > > > > >> memory" > > > > >>>>>> rather > > > > >>>>>>> than "external language process memory". > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> Thank you~ > > > > >>>>>>> > > > > >>>>>>> Xintong Song > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen < > > se...@apache.org> > > > > >>>>> wrote: > > > > >>>>>>> > > > > >>>>>>>> Thanks for driving this proposal. A few thoughts on the > > current > > > > >>>>> design: > > > > >>>>>>>> > > > > >>>>>>>> - There is a push to make as much as possible configurable > > via > > > > >>> the > > > > >>>>>> main > > > > >>>>>>>> configuration, and not only in code. Specifically values for > > > > >>>>> operations > > > > >>>>>> and > > > > >>>>>>>> tuning. > > > > >>>>>>>> I think it would be more important to have such memory > > > > >> weights > > > > >>>>> in > > > > >>>>>> the > > > > >>>>>>>> config, compared to in the program API. /cc Aljoscha > > > > >>>>>>>> > > > > >>>>>>>> - My recommendation would be to keep this as simple as > > > > >> possible. > > > > >>>>> This > > > > >>>>>>>> will make a lot of configuration code harder, and make it > > harder > > > > >>> for > > > > >>>>>> users > > > > >>>>>>>> to understand Flink's memory model. > > > > >>>>>>>> Making things as easy for users to understand is very > > > > >>> important > > > > >>>>> in > > > > >>>>>> my > > > > >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems > > > > >> better > > > > >>>>> than > > > > >>>>>>>> the alternative proposal listed at the end of the FLIP page. > > > > >>>>>>>> > > > > >>>>>>>> - For the simplicity, we could go even further and simply > > have > > > > >>> two > > > > >>>>>> memory > > > > >>>>>>>> users at the moment: The operator algorithm/data-structure > and > > > > >> the > > > > >>>>>> external > > > > >>>>>>>> language process (Python for now). > > > > >>>>>>>> We never have batch algos and RocksDB mixed, having this > > as > > > > >>>>>> separate > > > > >>>>>>>> options is confusing as it suggests this can be combined > > > > >>>>> arbitrarily. I > > > > >>>>>>>> also think that a slim possibility that we may ever combine > > this > > > > >> in > > > > >>>>> the > > > > >>>>>>>> future is not enough reason to make it more > complex/confusing. > > > > >>>>>>>> > > > > >>>>>>>> - I am also not aware of any plans to combine the network > > and > > > > >>>>>> operator > > > > >>>>>>>> memory. Not that it would be infeasible to do this, but I > > think > > > > >>> this > > > > >>>>>> would > > > > >>>>>>>> also be orthogonal to this change, and I am not sure this > > would > > > > >> be > > > > >>>>>> solved > > > > >>>>>>>> with static weights. So trying to get network memory into > this > > > > >>>>> proposal > > > > >>>>>>>> seems pre-mature to me. > > > > >>>>>>>> > > > > >>>>>>>> Best, > > > > >>>>>>>> Stephan > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song < > > > > >>> tonysong...@gmail.com > > > > >>>>>> > > > > >>>>>>>> wrote: > > > > >>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>>> A quick question, does network memory treated as managed > > > memory > > > > >>>>> now? > > > > >>>>>> Or > > > > >>>>>>>>> in > > > > >>>>>>>>>> the future? > > > > >>>>>>>>>> > > > > >>>>>>>>> No, network memory is independent from managed memory ATM. > > And > > > > >> I'm > > > > >>>>> not > > > > >>>>>>>>> aware of any plan to combine these two. > > > > >>>>>>>>> > > > > >>>>>>>>> Any insights there? > > > > >>>>>>>>> > > > > >>>>>>>>> Thank you~ > > > > >>>>>>>>> > > > > >>>>>>>>> Xintong Song > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young < > ykt...@gmail.com > > > > > > > >>>>> wrote: > > > > >>>>>>>>> > > > > >>>>>>>>>> A quick question, does network memory treated as managed > > > memory > > > > >>>>> now? > > > > >>>>>> Or > > > > >>>>>>>>> in > > > > >>>>>>>>>> the future? > > > > >>>>>>>>>> > > > > >>>>>>>>>> Best, > > > > >>>>>>>>>> Kurt > > > > >>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song < > > > > >>>>> tonysong...@gmail.com> > > > > >>>>>>>>>> wrote: > > > > >>>>>>>>>> > > > > >>>>>>>>>>> Hi devs, > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> I'd like to bring the discussion over FLIP-141[1], which > > > > >>> proposes > > > > >>>>> how > > > > >>>>>>>>>>> managed memory should be shared by various use cases > > within a > > > > >>>>> slot. > > > > >>>>>>>>> This > > > > >>>>>>>>>> is > > > > >>>>>>>>>>> an extension to FLIP-53[2], where we assumed that RocksDB > > > > >> state > > > > >>>>>>>> backend > > > > >>>>>>>>>> and > > > > >>>>>>>>>>> batch operators are the only use cases of managed memory > > for > > > > >>>>>>>> streaming > > > > >>>>>>>>>> and > > > > >>>>>>>>>>> batch jobs respectively, which is no longer true with the > > > > >>>>>>>> introduction > > > > >>>>>>>>> of > > > > >>>>>>>>>>> Python UDFs. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Please notice that we have not reached consensus between > > two > > > > >>>>>>>> different > > > > >>>>>>>>>>> designs. The major part of this FLIP describes one of the > > > > >>>>> candidates, > > > > >>>>>>>>>> while > > > > >>>>>>>>>>> the alternative is discussed in the section "Rejected > > > > >>>>> Alternatives". > > > > >>>>>>>> We > > > > >>>>>>>>>> are > > > > >>>>>>>>>>> hoping to borrow intelligence from the community to help > us > > > > >>>>> resolve > > > > >>>>>>>> the > > > > >>>>>>>>>>> disagreement. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Any feedback would be appreciated. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Thank you~ > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Xintong Song > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> [1] > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> [2] > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management > > > > >>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>> > > > > >>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>>> > > > > >>> > > > > >> > > > > > > > > > > > > > >