Hi Dian,

Thanks for driving this. Big +1 for supporting from/to pandas in PyFlink!

Best,
Wei

> On Apr 3, 2020, at 13:46, jincheng sun <sunjincheng...@gmail.com> wrote:
>
> +1, thanks for bringing up this discussion, @Dian Fu <dian0511...@gmail.com>
>
> Best,
> Jincheng
>
>
> On Wed, Apr 1, 2020 at 1:27 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Thanks for the reply, Dian, that makes sense to me.
>>
>> On Wed, Apr 1, 2020 at 11:53 AM Dian Fu <dian0511...@gmail.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> Thanks for your feedback.
>>>
>>> ArrowTableSink is a Flink sink which is responsible for collecting the
>>> data of the table. It serializes the table data to Arrow format so that
>>> it can be deserialized into a pandas dataframe efficiently.
>>> You are right that the pandas dataframe is constructed on the client side,
>>> so there needs to be a way to transfer the table data from the
>>> ArrowTableSink to the client. It shares the same design as Table.collect
>>> for transferring the data to the client. This is still under lively
>>> discussion in FLINK-14807 [1]. I think we can discuss that aspect there,
>>> so it's not covered in this design (as already mentioned in the design
>>> doc). Then we can focus on table/dataframe conversion in this design.
>>> Does that make sense to you?
>>>
>>> Thanks,
>>> Dian
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-14807
>>>
>>>> On Apr 1, 2020, at 11:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> Thanks Dian for driving this, definitely +1
>>>>
>>>> Here's my 2 cents:
>>>>
>>>> 1. I would pay more attention to to_pandas than from_pandas, because
>>>> I believe to_pandas will be used more frequently.
>>>> 2. I think ArrowTableSink may not be enough for to_pandas, because the
>>>> pandas dataframe is on the client side; it is not a table sink. We still
>>>> need to convert the ArrowTableSink output to a pandas dataframe, if I
>>>> understand correctly.
>>>>
>>>>
>>>> On Wed, Apr 1, 2020 at 10:49 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'd like to start a discussion about supporting conversion between
>>>>> PyFlink Table and Pandas DataFrame.
>>>>>
>>>>> The Pandas dataframe is the de facto standard for working with tabular
>>>>> data in the Python community. The PyFlink table is Flink's
>>>>> representation of tabular data in Python. It would be nice to provide
>>>>> functionality to convert between the PyFlink table and the Pandas
>>>>> dataframe in the PyFlink Table API. This would let users switch between
>>>>> PyFlink and Pandas seamlessly when processing data in Python, without
>>>>> an extra intermediate connector.
>>>>>
>>>>> Jincheng Sun and I have discussed this offline and have drafted
>>>>> FLIP-120 [1]. Looking forward to your feedback!
>>>>>
>>>>> Regards,
>>>>> Dian
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>