Hi Jeff,

Thanks for your feedback.

ArrowTableSink is a Flink sink that is responsible for collecting the data of
the table. It serializes the table data to Arrow format so that it can be
deserialized into a pandas DataFrame efficiently. You are right that the pandas
DataFrame is constructed on the client side, so there needs to be a way to
transfer the table data from the ArrowTableSink to the client. It shares the
same design as Table.collect for transferring the data to the client, which is
still under active discussion in FLINK-14807 [1]. I think we can discuss that
aspect there, so it is not covered in this design (as already mentioned in the
design doc). Then we can focus on table/DataFrame conversion in this design.
Does that make sense to you?

Thanks,
Dian

[1] https://issues.apache.org/jira/browse/FLINK-14807 
> On April 1, 2020, at 11:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> 
> Thanks Dian for driving this, definitely +1
> 
> Here's my 2 cents:
> 
> 1. I would pay more attention to to_pandas than from_pandas, because I
> believe to_pandas will be used more frequently.
> 2. I think ArrowTableSink may not be enough for to_pandas, because the pandas
> dataframe is on the client side and is not a table sink. We still need to
> convert the ArrowTableSink output to a pandas dataframe, if I understand
> correctly.
> 
> 
> 
> 
> On Wed, Apr 1, 2020 at 10:49 AM, Dian Fu <dian0511...@gmail.com> wrote:
> 
>> Hi everyone,
>> 
>> I'd like to start a discussion about supporting conversion between PyFlink
>> Table and Pandas DataFrame.
>> 
>> The Pandas DataFrame is the de facto standard for working with tabular data
>> in the Python community. The PyFlink Table is Flink's representation of
>> tabular data in Python. It would be nice to provide functionality to convert
>> between the PyFlink Table and the Pandas DataFrame in the PyFlink Table API.
>> This would give users the ability to switch between PyFlink and Pandas
>> seamlessly when processing data in Python, without an extra intermediate
>> connector.
>> 
>> Jincheng Sun and I discussed this offline and drafted FLIP-120 [1].
>> Looking forward to your feedback!
>> 
>> Regards,
>> Dian
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang
