+1. Thanks for bringing up this discussion, @Dian Fu <dian0511...@gmail.com>
Best,
Jincheng

Jeff Zhang <zjf...@gmail.com> wrote on Wed, Apr 1, 2020 at 1:27 PM:

> Thanks for the reply, Dian, that makes sense to me.
>
> Dian Fu <dian0511...@gmail.com> wrote on Wed, Apr 1, 2020 at 11:53 AM:
>
> > Hi Jeff,
> >
> > Thanks for your feedback.
> >
> > ArrowTableSink is a Flink sink that is responsible for collecting the
> > data of the table. It serializes the table data to the Arrow format so
> > that it can be deserialized into a pandas dataframe efficiently.
> > You are right that the pandas dataframe is constructed on the client
> > side, so there needs to be a way to transfer the table data from the
> > ArrowTableSink to the client. It shares the same design as Table.collect
> > for transferring the data to the client. This is still under lively
> > discussion in FLINK-14807 [1]. I think we can discuss that aspect there,
> > so it is not touched in this design (this is already mentioned in the
> > design doc). Then we can focus on table/dataframe conversion in this
> > design. Does that make sense to you?
> >
> > Thanks,
> > Dian
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-14807
> >
> > > On Apr 1, 2020, at 11:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> > >
> > > Thanks Dian for driving this, definitely +1
> > >
> > > Here's my 2 cents:
> > >
> > > 1. I would pay more attention to to_pandas than from_pandas, because
> > > I believe to_pandas will be used more frequently.
> > > 2. I think ArrowTableSink may not be enough for to_pandas, because the
> > > pandas dataframe is on the client side; it is not a table sink. We
> > > still need to convert the ArrowTableSink to a pandas dataframe, if I
> > > understand correctly.
> > >
> > > Dian Fu <dian0511...@gmail.com> wrote on Wed, Apr 1, 2020 at 10:49 AM:
> > >
> > >> Hi everyone,
> > >>
> > >> I'd like to start a discussion about supporting conversion between
> > >> PyFlink Table and Pandas DataFrame.
> > >>
> > >> The Pandas dataframe is the de facto standard for working with tabular
> > >> data in the Python community. The PyFlink table is Flink's
> > >> representation of tabular data in Python. It would be nice to provide
> > >> functionality in the PyFlink Table API to convert between the PyFlink
> > >> table and the Pandas dataframe. This would give users the ability to
> > >> switch between PyFlink and Pandas seamlessly when processing data in
> > >> Python, without an extra intermediate connector.
> > >>
> > >> Jincheng Sun and I have discussed this offline and have drafted
> > >> FLIP-120 [1]. Looking forward to your feedback!
> > >>
> > >> Regards,
> > >> Dian
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang