+1. Thanks for bringing up this discussion, @Dian Fu <dian0511...@gmail.com>

Best,
Jincheng


On Wed, Apr 1, 2020 at 1:27 PM Jeff Zhang <zjf...@gmail.com> wrote:

> Thanks for the reply, Dian. That makes sense to me.
>
> On Wed, Apr 1, 2020 at 11:53 AM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Hi Jeff,
> >
> > Thanks for your feedback.
> >
> > ArrowTableSink is a Flink sink which is responsible for collecting the
> > data of the table. It serializes the table data to the Arrow format so
> > that it can be deserialized into a pandas DataFrame efficiently.
> > You are right that the pandas DataFrame is constructed on the client
> > side, so there needs to be a way to transfer the table data from the
> > ArrowTableSink to the client. It shares the same design as Table.collect
> > for transferring the data to the client. This is still under active
> > discussion in FLINK-14807 [1]. I think we can discuss this aspect there,
> > so it is not covered in this design (as already mentioned in the design
> > doc). Then we can focus on table/DataFrame conversion in this design.
> > Does that make sense to you?
> >
> > Thanks,
> > Dian
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-14807
> > > On Apr 1, 2020, at 11:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> > >
> > > Thanks Dian for driving this, definitely +1
> > >
> > > Here's my 2 cents:
> > >
> > > 1. I would pay more attention to to_pandas than to from_pandas, because
> > > I believe to_pandas will be used more frequently.
> > > 2. I think ArrowTableSink may not be enough for to_pandas: the pandas
> > > DataFrame lives on the client side and is not a table sink. We still
> > > need to convert the output of ArrowTableSink to a pandas DataFrame, if
> > > I understand correctly.
> > >
> > >
> > >
> > >
> > > On Wed, Apr 1, 2020 at 10:49 AM Dian Fu <dian0511...@gmail.com> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I'd like to start a discussion about supporting conversion between
> > >> PyFlink Table and Pandas DataFrame.
> > >>
> > >> The Pandas DataFrame is the de facto standard for working with tabular
> > >> data in the Python community. The PyFlink Table is Flink's
> > >> representation of tabular data in Python. It would be nice to provide
> > >> functionality in the PyFlink Table API to convert between the PyFlink
> > >> Table and the Pandas DataFrame. This would let users switch between
> > >> PyFlink and Pandas seamlessly when processing data in Python, without
> > >> an extra intermediate connector.
> > >>
> > >> Jincheng Sun and I have discussed this offline and drafted
> > >> FLIP-120 [1]. Looking forward to your feedback!
> > >>
> > >> Regards,
> > >> Dian
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> >
> >
>
> --
> Best Regards
>
> Jeff Zhang
>
