Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Jingsong Li Thu, 05 Dec 2019 19:13:29 -0800

Hi Dian,

After [1] and [2], in the batch SQL world, we will:
- [2] On the client/compile side: use the memory weight to request memory
for the Transformation.
- [1] On the runtime side: use the memory fraction to compute the memory
size and allocate it in the StreamOperator.
For your information.
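
The two steps above can be sketched roughly as follows. This is only an
illustration, not Flink's actual code: the function names, the operator
names, and the rule "fraction = weight / sum of weights sharing the slot"
are all assumptions made for the example.

```python
from typing import Dict


def compute_fractions(weights: Dict[str, int]) -> Dict[str, float]:
    """Compile side (hypothetical): turn per-operator memory weights into
    fractions of the managed memory shared by those operators."""
    total = sum(weights.values())
    if total <= 0:
        # No operator requested memory, so every fraction is zero.
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


def compute_memory_size(fraction: float, managed_memory_bytes: int) -> int:
    """Runtime side (hypothetical): resolve a fraction into a byte size
    that the operator can then allocate."""
    return int(fraction * managed_memory_bytes)


# Hypothetical operators and weights, for illustration only.
weights = {"sort": 2, "hash_join": 1, "python_udf": 1}
fractions = compute_fractions(weights)

managed_memory = 400 * 1024 * 1024  # assumed 400 MiB of managed memory
sizes = {
    name: compute_memory_size(fraction, managed_memory)
    for name, fraction in fractions.items()
}
```

The point of the split is that the compile side only needs relative
weights, while the absolute size is resolved at runtime once the available
managed memory is known.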


[1] https://jira.apache.org/jira/browse/FLINK-14063
[2] https://jira.apache.org/jira/browse/FLINK-15035

Best,
Jingsong Lee

On Tue, Dec 3, 2019 at 6:07 PM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Jingsong,
>
> Thanks for your valuable feedback. I have updated the "Example" section
> describing how to use these options in a Python Table API program.
>
> Thanks,
> Dian
>
> > On Dec 2, 2019, at 6:12 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
> >
> > Hi Dian:
> >
> > Thanks for your explanation.
> > It might be better if you could update the document to explain the
> > changes to the table layer. (It's just a suggestion; it's up to you.)
> > About forwardedInputQueue in AbstractPythonScalarFunctionOperator:
> > Will this queue take up a lot of memory?
> > Can it also occupy as much memory as buffer.memory?
> > If so, are we dealing with silent use of heap memory here?
> > It feels a little strange to me, because the memory on the Python side
> > is reserved, but the memory on the JVM side is used silently.
> >
> > After carefully reading your comment on the Google doc:
> >> The memory used by the Java operator is currently accounted as the task
> >> on-heap memory. We can revisit this if we find it's a problem in the
> >> future.
> > I agree that we can ignore it for now, but we could add a note to the
> > document to remind users. What do you think?
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <dian0511...@gmail.com> wrote:
> >
> >> Hi Jingsong,
> >>
> >> Thanks a lot for your comments. Please see my reply inlined below.
> >>
> >>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
> >>>
> >>> Hi Dian:
> >>>
> >>>
> >>> Thanks for driving this. I have some questions:
> >>>
> >>>
> >>> - Where should these configurations belong? You have mentioned
> >>> Table API/SQL, so should they be in TableConfig?
> >>
> >> All Python-related configurations are defined in PythonOptions. Users
> >> can set them via TableConfig.getConfiguration.setXXX for Python Table
> >> API programs.
> >>
> >>>
> >>> - If they are only for Table/SQL, should they be called table.python.****?
> >>> In the table module, all config options are called table.***.
> >>
> >> These configurations are not table specific. They will be used for both
> >> Python Table API programs and Python DataStream API programs (which is
> >> planned to be supported in the future). So python.xxx seems more
> >> appropriate, what do you think?
> >>
> >>> - What should the table module do? In CommonPythonCalc, should we read
> >>> the options from the table config and set resources on OneInputTransformation?
> >>
> >> As described in the design doc, in the compilation phase, for batch jobs,
> >> the required memory of the Python worker will be calculated according to
> >> the configuration and set as the managed memory for the operator. For
> >> stream jobs, the resource spec will be unknown. (The reason is that
> >> currently the resources of all the operators in stream jobs are unknown,
> >> and configuring both known and unknown resources in a single job is not
> >> supported.)
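
The batch/stream distinction described above could look roughly like the
sketch below. It is not the design doc's implementation: the function name,
the framework-memory parameter, and the rule "required memory = framework
memory + buffer memory" are assumptions made only for illustration.

```python
from typing import Optional

# Stand-in for an unknown resource spec (assumption for this sketch).
UNKNOWN = None


def python_worker_managed_memory_mb(
    is_batch: bool,
    framework_memory_mb: int,
    buffer_memory_mb: int,
) -> Optional[int]:
    """Decide at compile time what managed memory to set on the operator
    that runs the Python worker (hypothetical helper)."""
    if not is_batch:
        # Stream jobs: all operators have unknown resources, and known and
        # unknown resources cannot be mixed in one job, so stay unknown.
        return UNKNOWN
    # Batch jobs: derive the requirement from the configured sizes
    # (the simple sum is an assumed rule).
    return framework_memory_mb + buffer_memory_mb


batch_memory = python_worker_managed_memory_mb(True, 64, 15)
stream_memory = python_worker_managed_memory_mb(False, 64, 15)
```

Here the batch call yields a concrete size in MB, while the stream call
yields the unknown marker regardless of the configured values.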
> >>
> >>> - Is all of buffer.memory off-heap memory? I took a look at
> >>> AbstractPythonScalarFunctionOperator; there is a forwardedInputQueue.
> >>> Is this one a heap queue? So do we need heap memory too?
> >>
> >> Yes, it is all off-heap memory, which is supposed to be used by the
> >> Python process. The forwardedInputQueue is a buffer used in the Java
> >> operator, and its memory is accounted as on-heap memory.
> >>
> >> Regards,
> >> Dian
> >>
> >>>
> >>> Hope to get your reply.
> >>>
> >>>
> >>> Best,
> >>>
> >>> Jingsong Lee
> >>>
> >>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <dian0511...@gmail.com> wrote:
> >>>
> >>>> Thanks for your votes and feedback. I have discussed this with @Zhu Zhu
> >>>> offline and also on the design doc.
> >>>>
> >>>> It seems that we have reached consensus on the design. I will bring up
> >>>> the VOTE if there is no other feedback.
> >>>>
> >>>> Thanks,
> >>>> Dian
> >>>>
> >>>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
> >>>>>
> >>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> >>>>> It is great to make sure that the resources used by the Python process
> >>>>> are managed properly by Flink’s resource management framework.
> >>>>>
> >>>>> Also, thanks to everyone working on the unified memory management
> >>>>> framework.
> >>>>>
> >>>>> Best, Hequn
> >>>>>
> >>>>>
> >>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <karma...@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks for driving this discussion, Dian!
> >>>>>>
> >>>>>> +1 for this proposal. It will help to reduce container failures due
> >>>>>> to memory overuse.
> >>>>>> I have left some comments in the design doc.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yangze Guo
> >>>>>>
> >>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Sorry for the late reply.
> >>>>>>>
> >>>>>>> +1 for the general proposal.
> >>>>>>>
> >>>>>>> One reminder: to use the UNKNOWN resource requirement, we need to
> >>>>>>> make sure the optimizer knows which operators use off-heap managed
> >>>>>>> memory, and computes and sets a fraction for those operators. See
> >>>>>>> FLIP-53[1] for more details. I would suggest double-checking with
> >>>>>>> @Zhu Zhu, who works on this part.
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>>>>>
> >>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <dian0511...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Jincheng,
> >>>>>>>>
> >>>>>>>> Thanks for the reply. I'm also looking forward to feedback from the
> >>>>>>>> community.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Dian
> >>>>>>>>
> >>>>>>>>> On Nov 11, 2019, at 2:34 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> +1, thanks for bringing up this discussion, Dian!
> >>>>>>>>>
> >>>>>>>>> Resource management is very important for PyFlink UDFs, so it
> >>>>>>>>> would be great if anyone could add more comments or input in the
> >>>>>>>>> design doc or give feedback on the ML. :)
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jincheng
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 5, 2019 at 11:32 AM Dian Fu <dian0511...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> FLIP-58[1] will add support for Python user-defined stateless
> >>>>>>>>>> functions to the Python Table API. It will launch a separate Python
> >>>>>>>>>> process for Python user-defined function execution. The resources
> >>>>>>>>>> used by the Python process should be managed properly by Flink’s
> >>>>>>>>>> resource management framework. FLIP-49[2] has proposed a unified
> >>>>>>>>>> memory management framework, and PyFlink user-defined function
> >>>>>>>>>> resource management should be based on it. Jincheng, Hequn, Xintong,
> >>>>>>>>>> GuoWei and I discussed this offline. I have drafted a design doc[3]
> >>>>>>>>>> and want to start a discussion about PyFlink user-defined function
> >>>>>>>>>> resource management.
> >>>>>>>>>>
> >>>>>>>>>> Any comments on the design doc or feedback directly on the ML are
> >>>>>>>>>> welcome.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Dian
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >>>>>>>>>> [2]
> >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> >>>>>>>>>> [3]
> >>>>>>>>>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>
> >>
> >
> > --
> > Best, Jingsong Lee
>
>

-- 
Best, Jingsong Lee
