Thanks for the reply, Dian, that makes sense to me.

Dian Fu <dian0511...@gmail.com> wrote on Wed, Apr 1, 2020 at 11:53 AM:
> Hi Jeff,
>
> Thanks for your feedback.
>
> ArrowTableSink is a Flink sink which is responsible for collecting the
> data of the table. It serializes the data of the table to Arrow format
> so that it can be deserialized to a pandas dataframe efficiently.
> You are right that the pandas dataframe is constructed on the client side,
> so there needs to be a way to transfer the table data from the
> ArrowTableSink to the client. It shares the same design as Table.collect
> for how to transfer the data to the client. This is still under lively
> discussion in FLINK-14807 [1]. I think we can discuss that aspect there,
> so it's not touched in this design (already mentioned in the design doc).
> Then we can focus on table/dataframe conversion in this design. Does that
> make sense to you?
>
> Thanks,
> Dian
>
> [1] https://issues.apache.org/jira/browse/FLINK-14807
>
> On Apr 1, 2020, at 11:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> >
> > Thanks Dian for driving this, definitely +1
> >
> > Here's my 2 cents:
> >
> > 1. I would pay more attention to to_pandas than from_pandas, because
> > to_pandas will be used more frequently, I believe.
> > 2. I think ArrowTableSink may not be enough for to_pandas, because the
> > pandas dataframe is on the client side; it is not a table sink. We still
> > need to convert ArrowTableSink to a pandas dataframe, if I understand
> > correctly.
> >
> > Dian Fu <dian0511...@gmail.com> wrote on Wed, Apr 1, 2020 at 10:49 AM:
> >
> >> Hi everyone,
> >>
> >> I'd like to start a discussion about supporting conversion between
> >> PyFlink Table and Pandas DataFrame.
> >>
> >> Pandas dataframe is the de facto standard for working with tabular data
> >> in the Python community. PyFlink table is Flink's representation of
> >> tabular data in the Python language. It would be nice to provide the
> >> functionality to convert between the PyFlink table and Pandas dataframe
> >> in the PyFlink Table API. It gives users the ability to switch between
> >> PyFlink and Pandas seamlessly when processing data in Python, without
> >> extra intermediate connectors.
> >>
> >> Jincheng Sun and I have discussed this offline and have drafted
> >> FLIP-120 [1]. Looking forward to your feedback!
> >>
> >> Regards,
> >> Dian
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> >
> > --
> > Best Regards
> >
> > Jeff Zhang

--
Best Regards

Jeff Zhang