On Thu, 9 May 2019 at 21:38, Uwe L. Korn <uw...@xhochy.com> wrote:

> +1 to the idea of adding a protocol to let other objects define their way
> to Arrow structures. For pandas.Series I would expect that they return an
> Arrow Column.
>
> For the Arrow->pandas conversion I have a bit mixed feelings. In the
> normal Fletcher case I would expect that we don't convert anything as we
> represent anything from Arrow with it.

Yes, you don't want to convert anything (apart from wrapping the arrow array into a FletcherArray). But how does Table.to_pandas know that?

Maybe it doesn't need to know that. In that case you could write a function in fletcher that converts a pyarrow Table to a pandas DataFrame with fletcher-backed columns. But if you want this roundtrip to happen automatically, without every project that defines an ExtensionArray and wants to interact with Arrow (e.g. GeoPandas as well) needing its own "arrow-table-to-pandas-dataframe" converter, then pyarrow needs some notion of how to convert back to a pandas ExtensionArray.

> For the case where we want to restore the exact pandas DataFrame we had
> before this will become a bit more complicated as we either would need to
> have all third-party libraries to support Arrow via a hook as proposed or
> we also define some kind of other protocol on the pandas side to
> reconstruct ExtensionArrays from Arrow data.

That last one is basically what I proposed in https://github.com/pandas-dev/pandas/issues/20612/#issuecomment-489649556

Thanks Antoine and Uwe for the discussion!

Joris
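
P.S. To make the automatic roundtrip idea a bit more concrete, here is a rough sketch of how a converter could look up a "convert back" hook instead of every project shipping its own table-to-DataFrame function. It deliberately uses plain Python stand-ins rather than pyarrow or pandas: ArrowColumn, FletcherLikeArray, to_arrow, from_arrow, register_from_arrow and column_to_pandas_like are all hypothetical names invented for illustration, not existing API in either library.

```python
# Hypothetical sketch: none of these names are pyarrow or pandas API.

class ArrowColumn:
    """Stand-in for an Arrow column: values plus a tag naming its origin."""

    def __init__(self, values, extension_name=None):
        self.values = list(values)
        # Records which extension array type produced the column, if any.
        self.extension_name = extension_name


class FletcherLikeArray:
    """Stand-in for an ExtensionArray that simply wraps Arrow data."""

    name = "fletcher-like"

    def __init__(self, column):
        self.column = column

    def to_arrow(self):
        # Assumed "to Arrow" hook: hand back the data, tagged with our name.
        return ArrowColumn(self.column.values, extension_name=self.name)

    @classmethod
    def from_arrow(cls, column):
        # Assumed "from Arrow" hook: wrap the column as-is, no conversion.
        return cls(column)


# Registry that the generic converter consults (an assumption of this sketch).
_FROM_ARROW = {}


def register_from_arrow(name, factory):
    """Let a third-party project declare how to rebuild its array type."""
    _FROM_ARROW[name] = factory


def column_to_pandas_like(column):
    """Use a registered hook if there is one, else the default conversion."""
    factory = _FROM_ARROW.get(column.extension_name)
    if factory is None:
        return list(column.values)  # default: plain Python list of values
    return factory(column)


register_from_arrow(FletcherLikeArray.name, FletcherLikeArray.from_arrow)

# Roundtrip: extension array -> Arrow column -> extension array.
original = FletcherLikeArray(ArrowColumn([1, 2, 3]))
column = original.to_arrow()
restored = column_to_pandas_like(column)
plain = column_to_pandas_like(ArrowColumn([4, 5]))
```

With something along these lines, the generic converter never needs to know about fletcher or GeoPandas specifically; it only needs the tag on the column and whatever the owning project registered for it. Whether that lookup should live on the pyarrow side or the pandas side is exactly the open question in this thread.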