Thanks for updating the FLIP, Xintong. It looks good to me. One minor comment: we could name the configuration parameter taskmanager.memory.managed.consumer-weights, which might be a bit more expressive of what this option does.
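For illustration, such an option could look like this in flink-conf.yaml. Note this is only a sketch of the idea: the exact option key, the consumer names, and the value syntax are assumptions, not a settled API.

```yaml
# Hypothetical example -- option name, consumer keys, and syntax are not final.
# Weights are relative: below, operator (state backend / batch algorithm)
# memory and Python UDF memory would share managed memory 70/30.
taskmanager.memory.managed.consumer-weights: OPERATOR:70,PYTHON:30
```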
Cheers,
Till

On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks all for the feedback.
>
> FYI, I've opened a voting thread[1] on this.
>
> Thank you~
>
> Xintong Song
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html
>
> On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <reed...@gmail.com> wrote:
>
> > Thanks for proposing this improvement! @Xintong
> > The proposal looks good to me. Agreed that we should make it as simple as
> > possible for users to understand.
> >
> > Thanks,
> > Zhu
> >
> > On Thu, Sep 3, 2020 at 2:11 PM, Dian Fu <dian0511...@gmail.com> wrote:
> >
> > > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> > >
> > > > On Sep 2, 2020, at 6:09 PM, Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > Thanks for the input, Yu.
> > > >
> > > > I believe the current proposal should work with RocksDB, or any other
> > > > state backend, using memory at either the slot or the operator scope.
> > > > With the proposed approach, all we need is an indicator (e.g., a
> > > > configuration option) telling us which scope we should calculate the
> > > > fractions for.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <car...@gmail.com> wrote:
> > > >
> > > >> Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.
> > > >>
> > > >> Just one supplement for the RocksDB state backend part:
> > > >>
> > > >> It's true that currently we're using managed memory at the slot scope.
> > > >> However, IMHO, we may support setting weights for different stateful
> > > >> operators (for advanced usage) in the future. For example, users may
> > > >> choose to set higher weights for the join operator over the aggregation
> > > >> operator, to give more memory to those with bigger states. In this
> > > >> case, we may also use managed memory at the operator scope for state
> > > >> backends.
> > > >> And if I understand correctly, the current design could cover this
> > > >> case well.
> > > >>
> > > >> Best Regards,
> > > >> Yu
> > > >>
> > > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:
> > > >>
> > > >>> Thanks all for the feedback and discussion.
> > > >>>
> > > >>> I have updated the FLIP, with the following changes.
> > > >>>
> > > >>> - Choose the main proposal over the alternative approach
> > > >>> - Combine weights of RocksDB and batch operators
> > > >>> - Expose weights through configuration options, rather than via
> > > >>>   ExecutionConfig
> > > >>> - Add implementation plan
> > > >>>
> > > >>> Please help take another look.
> > > >>>
> > > >>> Thank you~
> > > >>>
> > > >>> Xintong Song
> > > >>>
> > > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>
> > > >>>> Thanks for the input, Aljoscha & Till.
> > > >>>>
> > > >>>> # Weight Configuration
> > > >>>>
> > > >>>> I think exposing the knobs incrementally is a good idea. However,
> > > >>>> I'm not sure about non-configurable weights as the first step.
> > > >>>>
> > > >>>> Currently, users can tune memory for RocksDB
> > > >>>> ('taskmanager.memory.managed.size') and Python
> > > >>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
> > > >>>> which practically means any combination of RocksDB and Python memory
> > > >>>> sizes. If we switch to non-configurable weights, that would be a
> > > >>>> regression compared to 1.11.
> > > >>>>
> > > >>>> Therefore, I think exposing the weights via configuration options
> > > >>>> might be a good first step. We can discuss exposing them via
> > > >>>> ExecutionConfig if we later see that requirement.
> > > >>>>
> > > >>>> # Naming of Weights
> > > >>>>
> > > >>>> I'm neutral on "Flink/Internal memory".
> > > >>>>
> > > >>>> I think the reason we can combine the weights for batch algorithms
> > > >>>> and state backends is that they are never mixed together. My only
> > > >>>> concern with "Flink/Internal memory", which might not be a problem at
> > > >>>> the moment, is: what if new memory use cases appear in the future that
> > > >>>> can also be described as "Flink/Internal memory", but are not
> > > >>>> guaranteed never to be mixed with batch algorithms or state backends?
> > > >>>>
> > > >>>> Anyway, I think the naming should not block this FLIP, as long as we
> > > >>>> have consensus on combining the two weights for RocksDB and batch
> > > >>>> algorithms. We can keep the naming discussion open until the
> > > >>>> implementation phase.
> > > >>>>
> > > >>>> Thank you~
> > > >>>>
> > > >>>> Xintong Song
> > > >>>>
> > > >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > >>>>
> > > >>>>> Thanks for creating this FLIP, Xintong.
> > > >>>>>
> > > >>>>> I agree with the previous comments that the memory configuration
> > > >>>>> should be as easy as possible. Every new knob has the potential to
> > > >>>>> confuse users and/or allow them to shoot themselves in the foot.
> > > >>>>> Consequently, I am +1 for the first proposal in the FLIP since it is
> > > >>>>> simpler.
> > > >>>>>
> > > >>>>> Also +1 for Stephan's proposal to combine the batch operators' and
> > > >>>>> RocksDB's memory usage into one weight.
> > > >>>>>
> > > >>>>> Concerning the names for the two weights, I fear that we are facing
> > > >>>>> one of the two hard things in computer science.
> > > >>>>> To add another idea, we could name them "Flink memory"/"Internal
> > > >>>>> memory" and "Python memory".
> > > >>>>>
> > > >>>>> For the sake of keeping the scope of the FLIP as small as possible
> > > >>>>> and developing the feature incrementally, I think Aljoscha's proposal
> > > >>>>> to make the weights non-configurable as a first step sounds like a
> > > >>>>> good idea. As a next step (and if we see the need), we can make the
> > > >>>>> memory weights configurable via the configuration. And last, we could
> > > >>>>> expose them via the ExecutionConfig if that is required.
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>> Till
> > > >>>>>
> > > >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> Playing devil's advocate here: should we even make the memory
> > > >>>>>> weights configurable? We could go with weights that should make
> > > >>>>>> sense for most cases in the first version and only introduce
> > > >>>>>> configurable weights when (if) users need them.
> > > >>>>>>
> > > >>>>>> Regarding where/how things are configured, I think most things
> > > >>>>>> should be a ConfigOption first (thanks for cc'ing me, Stephan!).
> > > >>>>>> This makes them configurable via flink-conf.yaml and via command
> > > >>>>>> line parameters, for example "bin/flink run -D memory.foo=bla ...".
> > > >>>>>> We can think about offering a programmatic API for cases where it
> > > >>>>>> makes sense, of course.
> > > >>>>>>
> > > >>>>>> Regarding naming one of the configurable weights
> > > >>>>>> "StateBackend-BatchAlgorithm": I think it's not a good idea to be
> > > >>>>>> that specific, because the option will not age well.
> > > >>>>>> For example, when we want to change which group of memory
> > > >>>>>> consumers is configured together, or when we add something new.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Aljoscha
> > > >>>>>>
> > > >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> > > >>>>>>> Thanks for the feedback, @Stephan
> > > >>>>>>>
> > > >>>>>>>> - There is a push to make as much as possible configurable via
> > > >>>>>>>> the main configuration, and not only in code. Specifically values
> > > >>>>>>>> for operations and tuning.
> > > >>>>>>>> I think it would be more important to have such memory weights
> > > >>>>>>>> in the config, compared to in the program API. /cc Aljoscha
> > > >>>>>>>
> > > >>>>>>> I can see the benefit that having memory weights in the main
> > > >>>>>>> configuration makes tuning easier, which makes great sense to me.
> > > >>>>>>> On the other hand, what we lose is the flexibility to have
> > > >>>>>>> different weights for jobs running in the same Flink cluster. It
> > > >>>>>>> seems to me the problem is that we don't have an easy way to
> > > >>>>>>> overwrite job-specific configurations without touching the code.
> > > >>>>>>>
> > > >>>>>>> Given the current status, what if we make the memory weights
> > > >>>>>>> configurable through both the main configuration and the
> > > >>>>>>> programming API? The main configuration should take effect iff the
> > > >>>>>>> weights are not explicitly specified through the programming API.
> > > >>>>>>> In this way, job cluster users can easily tune the weights
> > > >>>>>>> through the main configuration, while session cluster users, if
> > > >>>>>>> they want to have different weights for jobs, can still overwrite
> > > >>>>>>> the weights through execution configs.
> > > >>>>>>>
> > > >>>>>>>> - My recommendation would be to keep this as simple as possible.
> > > >>>>>>>> This will make a lot of configuration code harder, and make it
> > > >>>>>>>> harder for users to understand Flink's memory model.
> > > >>>>>>>> Making things as easy for users to understand is very important
> > > >>>>>>>> in my opinion. In that regard, the main proposal in the FLIP seems
> > > >>>>>>>> better than the alternative proposal listed at the end of the FLIP
> > > >>>>>>>> page.
> > > >>>>>>>
> > > >>>>>>> +1 from my side.
> > > >>>>>>>
> > > >>>>>>>> - For simplicity, we could go even further and simply have two
> > > >>>>>>>> memory users at the moment: the operator algorithm/data-structure
> > > >>>>>>>> and the external language process (Python for now).
> > > >>>>>>>> We never have batch algos and RocksDB mixed; having these as
> > > >>>>>>>> separate options is confusing, as it suggests they can be combined
> > > >>>>>>>> arbitrarily. I also think that a slim possibility that we may ever
> > > >>>>>>>> combine them in the future is not enough reason to make it more
> > > >>>>>>>> complex/confusing.
> > > >>>>>>>
> > > >>>>>>> Good point. +1 for combining the batch/RocksDB weights, for
> > > >>>>>>> they're never mixed together. We can even just name it
> > > >>>>>>> "StateBackend-BatchAlgorithm" to be explicit.
> > > >>>>>>>
> > > >>>>>>> For "external language process", I'm not entirely sure. Future
> > > >>>>>>> external languages might be mixed with Python processes. To avoid
> > > >>>>>>> later having to figure out how to share external language memory
> > > >>>>>>> across different languages, I would suggest presenting the concept
> > > >>>>>>> as "Python memory" rather than "external language process memory".
> > > >>>>>>>
> > > >>>>>>> Thank you~
> > > >>>>>>>
> > > >>>>>>> Xintong Song
> > > >>>>>>>
> > > >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks for driving this proposal. A few thoughts on the current
> > > >>>>>>>> design:
> > > >>>>>>>>
> > > >>>>>>>> - There is a push to make as much as possible configurable via
> > > >>>>>>>> the main configuration, and not only in code. Specifically values
> > > >>>>>>>> for operations and tuning.
> > > >>>>>>>> I think it would be more important to have such memory weights in
> > > >>>>>>>> the config, compared to in the program API. /cc Aljoscha
> > > >>>>>>>>
> > > >>>>>>>> - My recommendation would be to keep this as simple as possible.
> > > >>>>>>>> This will make a lot of configuration code harder, and make it
> > > >>>>>>>> harder for users to understand Flink's memory model.
> > > >>>>>>>> Making things as easy for users to understand is very important
> > > >>>>>>>> in my opinion. In that regard, the main proposal in the FLIP seems
> > > >>>>>>>> better than the alternative proposal listed at the end of the FLIP
> > > >>>>>>>> page.
> > > >>>>>>>>
> > > >>>>>>>> - For simplicity, we could go even further and simply have two
> > > >>>>>>>> memory users at the moment: the operator algorithm/data-structure
> > > >>>>>>>> and the external language process (Python for now).
> > > >>>>>>>> We never have batch algos and RocksDB mixed; having these as
> > > >>>>>>>> separate options is confusing, as it suggests they can be combined
> > > >>>>>>>> arbitrarily. I also think that a slim possibility that we may ever
> > > >>>>>>>> combine them in the future is not enough reason to make it more
> > > >>>>>>>> complex/confusing.
> > > >>>>>>>>
> > > >>>>>>>> - I am also not aware of any plans to combine the network and
> > > >>>>>>>> operator memory. Not that it would be infeasible to do, but I
> > > >>>>>>>> think this would be orthogonal to this change, and I am not sure
> > > >>>>>>>> it would be solved with static weights. So trying to get network
> > > >>>>>>>> memory into this proposal seems premature to me.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Stephan
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> > A quick question: is network memory treated as managed memory
> > > >>>>>>>>> > now? Or in the future?
> > > >>>>>>>>>
> > > >>>>>>>>> No, network memory is independent of managed memory ATM. And I'm
> > > >>>>>>>>> not aware of any plan to combine these two.
> > > >>>>>>>>>
> > > >>>>>>>>> Any insights there?
> > > >>>>>>>>>
> > > >>>>>>>>> Thank you~
> > > >>>>>>>>>
> > > >>>>>>>>> Xintong Song
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> A quick question: is network memory treated as managed memory
> > > >>>>>>>>>> now? Or in the future?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Kurt
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi devs,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I'd like to bring up the discussion of FLIP-141[1], which
> > > >>>>>>>>>>> proposes how managed memory should be shared by various use
> > > >>>>>>>>>>> cases within a slot. This is an extension to FLIP-53[2], where
> > > >>>>>>>>>>> we assumed that the RocksDB state backend and batch operators
> > > >>>>>>>>>>> are the only use cases of managed memory for streaming and
> > > >>>>>>>>>>> batch jobs respectively, which is no longer true with the
> > > >>>>>>>>>>> introduction of Python UDFs.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Please note that we have not reached consensus between two
> > > >>>>>>>>>>> different designs. The major part of this FLIP describes one of
> > > >>>>>>>>>>> the candidates, while the alternative is discussed in the
> > > >>>>>>>>>>> section "Rejected Alternatives". We are hoping to draw on the
> > > >>>>>>>>>>> community's intelligence to help us resolve the disagreement.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Any feedback would be appreciated.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thank you~
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Xintong Song
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [1]
> > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [2]
> > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
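For illustration, the weight-to-fraction logic discussed in this thread can be sketched as follows. This is a minimal sketch of the FLIP-141 idea, not Flink code; the consumer names OPERATOR and PYTHON, and the function itself, are assumptions made for the example. The key point from the discussion is that a consumer absent from the job gets no share, and the remaining weights are re-normalized over the consumers actually present in the slot (or operator) scope.

```python
def managed_fractions(weights, present):
    """Derive each consumer's fraction of a slot's managed memory.

    weights: dict of consumer name -> non-negative relative weight
    present: set of consumers actually used by the job in this scope
    """
    # Only consumers that are present and have a positive weight count.
    active = {c: w for c, w in weights.items() if c in present and w > 0}
    total = sum(active.values())
    if total == 0:
        # No active consumer: nobody gets a share of managed memory.
        return {c: 0.0 for c in weights}
    # Re-normalize the active weights so the fractions sum to 1.
    return {c: active.get(c, 0) / total for c in weights}


# Hypothetical weights: operators (state backend / batch algorithms) vs. Python UDFs.
weights = {"OPERATOR": 70, "PYTHON": 30}

# A streaming job with RocksDB but no Python UDFs: the operator consumer
# receives all managed memory, rather than leaving 30% unused.
print(managed_fractions(weights, {"OPERATOR"}))  # {'OPERATOR': 1.0, 'PYTHON': 0.0}

# A job using both consumers: managed memory is split 70/30.
print(managed_fractions(weights, {"OPERATOR", "PYTHON"}))
```

This also shows why a single combined weight for state backends and batch algorithms is workable: since the two are never active in the same slot, they can share one entry in `weights` without ever competing.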