Hi Dian,

Thanks for driving this. Big +1 for supporting from/to pandas in PyFlink!

Best,
Wei

> 在 2020年4月3日,13:46,jincheng sun <sunjincheng...@gmail.com> 写道:
> 
> +1, Thanks for bring up this discussion @Dian Fu <dian0511...@gmail.com>
> 
> Best,
> Jincheng
> 
> 
> Jeff Zhang <zjf...@gmail.com> 于2020年4月1日周三 下午1:27写道:
> 
>> Thanks for the reply, Dian, that make sense to me.
>> 
>> Dian Fu <dian0511...@gmail.com> 于2020年4月1日周三 上午11:53写道:
>> 
>>> Hi Jeff,
>>> 
>>> Thanks for your feedback.
>>> 
>>> ArrowTableSink is a Flink sink which is responsible for collecting the
>>> data of the table. It will serialize the data of the table to Arrow
>> format
>>> to make sure that it could be deserialized to pandas dataframe
>> efficiently.
>>> You are right that pandas dataframe is constructed at the client side and
>>> so there needs a way to transfer the table data from the ArrowTableSink
>> to
>>> the client. It shares the same design as Table.collect on how to transfer
>>> the data to the client. This is still under lively discussion in
>>> FLINK-14807. I think we can discuss it there on this aspect and so it's
>> not
>>> touched in this design(already mentioned in the design doc). Then we can
>>> focus on table/dataframe conversion in this design. Does that make sense
>> to
>>> you?
>>> 
>>> Thanks,
>>> Dian
>>> 
>>> [1] https://issues.apache.org/jira/browse/FLINK-14807 <
>>> https://issues.apache.org/jira/browse/FLINK-14807>
>>>> 在 2020年4月1日,上午11:14,Jeff Zhang <zjf...@gmail.com> 写道:
>>>> 
>>>> Thanks Dian for driving this, definitely +1
>>>> 
>>>> Here's my 2 cents:
>>>> 
>>>> 1. I would pay more attention on to_pandas than from_pandas.  Because
>>>> to_pandas will be used more frequently I believe
>>>> 2. I think ArrowTableSink may not be enough for to_pandas, because
>> pandas
>>>> dataframe is on client side, it is not a table sink. We still need to
>>>> convert ArrowTableSink to pandas dataframe if I understand correctly.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Dian Fu <dian0511...@gmail.com> 于2020年4月1日周三 上午10:49写道:
>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I'd like to start a discussion about supporting conversion between
>>> PyFlink
>>>>> Table and Pandas DataFrame.
>>>>> 
>>>>> Pandas dataframe is the de-facto standard to work with tabular data in
>>>>> Python community. PyFlink table is Flink’s representation of the
>> tabular
>>>>> data in Python language. It would be nice to provide the functionality
>>> to
>>>>> convert between the PyFlink table and Pandas dataframe in PyFlink
>> Table
>>>>> API. It provides users the ability to switch between PyFlink and
>> Pandas
>>>>> seamlessly when processing data in Python language without an extra
>>>>> intermediate connectors.
>>>>> 
>>>>> Jincheng Sun and I have discussed offline and have drafted the
>>>>> FLIP-120[1]. Looking forward to your feedback!
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>> [1]
>>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best Regards
>>>> 
>>>> Jeff Zhang
>>> 
>>> 
>> 
>> --
>> Best Regards
>> 
>> Jeff Zhang
>> 

Reply via email to