Thanks for the reply, Dian, that make sense to me.

Dian Fu <dian0511...@gmail.com> 于2020年4月1日周三 上午11:53写道:

> Hi Jeff,
>
> Thanks for your feedback.
>
> ArrowTableSink is a Flink sink which is responsible for collecting the
> data of the table. It will serialize the data of the table to Arrow format
> to make sure that it could be deserialized to pandas dataframe efficiently.
> You are right that pandas dataframe is constructed at the client side and
> so there needs a way to transfer the table data from the ArrowTableSink to
> the client. It shares the same design as Table.collect on how to transfer
> the data to the client. This is still under lively discussion in
> FLINK-14807. I think we can discuss it there on this aspect and so it's not
> touched in this design(already mentioned in the design doc). Then we can
> focus on table/dataframe conversion in this design. Does that make sense to
> you?
>
> Thanks,
> Dian
>
> [1] https://issues.apache.org/jira/browse/FLINK-14807 <
> https://issues.apache.org/jira/browse/FLINK-14807>
> > 在 2020年4月1日,上午11:14,Jeff Zhang <zjf...@gmail.com> 写道:
> >
> > Thanks Dian for driving this, definitely +1
> >
> > Here's my 2 cents:
> >
> > 1. I would pay more attention on to_pandas than from_pandas.  Because
> > to_pandas will be used more frequently I believe
> > 2. I think ArrowTableSink may not be enough for to_pandas, because pandas
> > dataframe is on client side, it is not a table sink. We still need to
> > convert ArrowTableSink to pandas dataframe if I understand correctly.
> >
> >
> >
> >
> > Dian Fu <dian0511...@gmail.com> 于2020年4月1日周三 上午10:49写道:
> >
> >> Hi everyone,
> >>
> >> I'd like to start a discussion about supporting conversion between
> PyFlink
> >> Table and Pandas DataFrame.
> >>
> >> Pandas dataframe is the de-facto standard to work with tabular data in
> >> Python community. PyFlink table is Flink’s representation of the tabular
> >> data in Python language. It would be nice to provide the functionality
> to
> >> convert between the PyFlink table and Pandas dataframe in PyFlink Table
> >> API. It provides users the ability to switch between PyFlink and Pandas
> >> seamlessly when processing data in Python language without an extra
> >> intermediate connectors.
> >>
> >> Jincheng Sun and I have discussed offline and have drafted the
> >> FLIP-120[1]. Looking forward to your feedback!
> >>
> >> Regards,
> >> Dian
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
>
>

-- 
Best Regards

Jeff Zhang

Reply via email to