Hi Dian,

After [1] and [2], in the batch SQL world, we will:
- [2] On the client/compile side: use a memory weight to request memory for the Transformation.
- [1] On the runtime side: use a memory fraction to compute the memory size and allocate it in the StreamOperator.

For your information.
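
A minimal sketch of the two steps above, assuming each operator's fraction is simply its weight divided by the total weight in the slot. The function names, operator names, and numbers are hypothetical illustrations, not Flink APIs:

```python
from fractions import Fraction


def memory_fractions(weights):
    """Compile side (assumed model): turn integer memory weights into fractions."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one operator must request a positive weight")
    return {name: Fraction(weight, total) for name, weight in weights.items()}


def memory_sizes(fractions, managed_memory_bytes):
    """Runtime side (assumed model): resolve fractions against the slot's managed memory."""
    return {name: int(frac * managed_memory_bytes) for name, frac in fractions.items()}


# Hypothetical operators sharing one slot with 512 MiB of managed memory.
weights = {"python_calc": 64, "sort": 128, "hash_join": 64}
fractions = memory_fractions(weights)
sizes = memory_sizes(fractions, 512 * 1024 * 1024)

for name in weights:
    print(name, fractions[name], sizes[name])
```

The point of the split is that the compile side only needs relative weights, while the absolute size is resolved at runtime once the slot's managed memory is known.
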
[1] https://jira.apache.org/jira/browse/FLINK-14063
[2] https://jira.apache.org/jira/browse/FLINK-15035

Best,
Jingsong Lee

On Tue, Dec 3, 2019 at 6:07 PM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Jingsong,
>
> Thanks for your valuable feedback. I have updated the "Example" section
> describing how to use these options in a Python Table API program.
>
> Thanks,
> Dian
>
> > On Dec 2, 2019, at 6:12 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
> >
> > Hi Dian:
> >
> > Thanks for your explanation.
> > If you can update the document to explain the changes to the
> > table layer, it might be better. (It's just a suggestion; it's up to you.)
> > About forwardedInputQueue in AbstractPythonScalarFunctionOperator:
> > Will this queue take up a lot of memory?
> > Can it also occupy as much memory as buffer.memory?
> > If so, are we dealing with silent use of heap memory?
> > It feels a little strange to me, because the memory on the Python side is
> > reserved, but the memory on the JVM side is used silently.
> >
> > After carefully reading your comments on the Google doc:
> >> The memory used by the Java operator is currently accounted as the task
> >> on-heap memory. We can revisit this if we find it's a problem in the
> >> future.
> > I agree that we can ignore it for now, but we can add some content to the
> > document to remind the user. What do you think?
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <dian0511...@gmail.com> wrote:
> >
> >> Hi Jingsong,
> >>
> >> Thanks a lot for your comments. Please see my replies inline below.
> >>
> >>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
> >>>
> >>> Hi Dian:
> >>>
> >>> Thanks for driving this. I have some questions:
> >>>
> >>> - Where should these configurations belong? You have mentioned
> >>> Table API/SQL, so should they be in TableConfig?
> >>
> >> All Python related configurations are defined in PythonOptions.
> >> Users can configure these options via TableConfig.getConfiguration.setXXX
> >> for Python Table API programs.
> >>
> >>> - If they are only for Table/SQL, should they be called table.python.****?
> >>> In the table module, all config options are called table.***.
> >>
> >> These configurations are not table specific. They will be used for both
> >> Python Table API programs and Python DataStream API programs (which is
> >> planned to be supported in the future). So python.xxx seems more
> >> appropriate, what do you think?
> >>
> >>> - What should the table module do? In CommonPythonCalc, should we read the
> >>> options from the table config and set resources on OneInputTransformation?
> >>
> >> As described in the design doc, in the compilation phase, for batch jobs,
> >> the required memory of the Python worker will be calculated according to
> >> the configuration and set as the managed memory for the operator. For
> >> stream jobs, the resource spec will be unknown (the reason is that
> >> currently the resources for all operators in stream jobs are unknown, and
> >> configuring both known and unknown resources in a single job is not
> >> supported).
> >>
> >>> - Is all of buffer.memory off-heap memory? I took a look at
> >>> AbstractPythonScalarFunctionOperator; there is a forwardedInputQueue. Is
> >>> this one a heap queue? So do we need heap memory too?
> >>
> >> Yes, it is all off-heap memory, which is supposed to be used by the
> >> Python process. The forwardedInputQueue is a buffer used in the Java
> >> operator and its memory is accounted as on-heap memory.
> >>
> >> Regards,
> >> Dian
> >>
> >>>
> >>> Hope to get your reply.
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <dian0511...@gmail.com> wrote:
> >>>
> >>>> Thanks for your votes and feedback. I have discussed with @Zhu Zhu
> >>>> offline and also on the design doc.
> >>>>
> >>>> It seems that we have reached consensus on the design. I will bring up
> >>>> the VOTE if there is no other feedback.
> >>>>
> >>>> Thanks,
> >>>> Dian
> >>>>
> >>>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
> >>>>>
> >>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> >>>>> It is great to make sure that the resources used by the Python process
> >>>>> are managed properly by Flink’s resource management framework.
> >>>>>
> >>>>> Also, thanks to the people working on the unified memory management
> >>>>> framework.
> >>>>>
> >>>>> Best, Hequn
> >>>>>
> >>>>>
> >>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <karma...@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks for driving this discussion, Dian!
> >>>>>>
> >>>>>> +1 for this proposal. It will help to reduce container failures due to
> >>>>>> memory overuse.
> >>>>>> I left some comments in the design doc.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yangze Guo
> >>>>>>
> >>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Sorry for the late reply.
> >>>>>>>
> >>>>>>> +1 for the general proposal.
> >>>>>>>
> >>>>>>> And one reminder: to use the UNKNOWN resource requirement, we need to
> >>>>>>> make sure the optimizer knows which operators use off-heap managed
> >>>>>>> memory, and computes and sets a fraction for those operators. See
> >>>>>>> FLIP-53[1] for more details, and I would suggest you double check
> >>>>>>> with @Zhu Zhu, who works on this part.
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>>>>>
> >>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <dian0511...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Jincheng,
> >>>>>>>>
> >>>>>>>> Thanks for the reply. I am also looking forward to feedback from
> >>>>>>>> the community.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Dian
> >>>>>>>>
> >>>>>>>>> On Nov 11, 2019, at 2:34 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> +1, thanks for bringing up this discussion, Dian!
> >>>>>>>>>
> >>>>>>>>> Resource management is very important for PyFlink UDFs. So it would
> >>>>>>>>> be great if anyone could add more comments or input in the design
> >>>>>>>>> doc or give feedback on the ML. :)
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jincheng
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 5, 2019 at 11:32 AM Dian Fu <dian0511...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> FLIP-58[1] will add support for Python user-defined stateless
> >>>>>>>>>> functions in the Python Table API. It will launch a separate Python
> >>>>>>>>>> process for Python user-defined function execution. The resources
> >>>>>>>>>> used by the Python process should be managed properly by Flink’s
> >>>>>>>>>> resource management framework. FLIP-49[2] has proposed a unified
> >>>>>>>>>> memory management framework, and PyFlink user-defined function
> >>>>>>>>>> resource management should be based on it. Jincheng, Hequn,
> >>>>>>>>>> Xintong, GuoWei and I discussed this offline. I have drafted a
> >>>>>>>>>> design doc[3] and want to start a discussion about PyFlink
> >>>>>>>>>> user-defined function resource management.
> >>>>>>>>>>
> >>>>>>>>>> Any comments on the design doc, or feedback directly on the ML,
> >>>>>>>>>> are welcome.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Dian
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >>>>>>>>>> [2]
> >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> >>>>>>>>>> [3]
> >>>>>>>>>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >
> > --
> > Best, Jingsong Lee

--
Best, Jingsong Lee