Hi Joris,

Do you have a signature for the __arrow_array__ method in mind?

For example, let's say you want to roundtrip ExtensionArrays or other
third-party data through Arrow. How do you preserve the required metadata?

Regards

Antoine.


On 09/05/2019 at 13:29, Joris Van den Bossche wrote:
> Hi all,
>
> I want to propose an interface to allow custom array objects in Python to
> define how they should be converted to Arrow arrays (e.g. in
> pyarrow.array(..)). I opened
> https://issues.apache.org/jira/browse/ARROW-5271 for this.
> This would be similar to the numpy __array__ protocol (so we could e.g. call
> it __arrow_array__).
> Feedback / discussion very welcome!
>
> I am coming to this discussion specifically from the point of view of
> pandas ExtensionArrays (github issue for this:
> https://github.com/pandas-dev/pandas/issues/20612/#issuecomment-489649556).
> Such a protocol would, for example, make it possible for pandas users to
> save DataFrames with ExtensionArrays (e.g. the nullable integers) to parquet,
> without the need for pyarrow to know about all those possible different
> extension arrays. This would also be useful for projects extending pandas
> such as GeoPandas <https://github.com/geopandas/geopandas> and Fletcher
> <https://github.com/xhochy/fletcher>.
> But I suppose it could also be of interest more generally to other
> array-like / pandas-like projects that want to interface with arrow.
>
> Sidenote: for the pandas case, I want to look at the full roundtrip, so also
> the conversion back from an arrow Table to DataFrame. For that aspect there
> is https://issues.apache.org/jira/browse/ARROW-2428, but this is much more
> specific to pandas and its ExtensionArrays.
>
> Regards,
> Joris
>
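
To make the signature question concrete, here is one possible shape for such a protocol, purely as a sketch. Everything in it is an assumption rather than an agreed design: the optional `type` argument (by analogy with the dtype argument of numpy's __array__), the `NullableIntArray` class, and the `convert` function, which only stands in for the dispatch that pyarrow.array(..) might perform. A plain dict is used in place of a real Arrow array so the example runs without pyarrow.

```python
from typing import Any, Optional


class NullableIntArray:
    """Hypothetical third-party array: integer values plus a validity mask."""

    def __init__(self, values, mask):
        self.values = list(values)
        self.mask = list(mask)  # True marks a missing value

    def __arrow_array__(self, type: Optional[str] = None) -> dict:
        # Assumed signature: one optional requested type.
        # A real implementation would return a pyarrow Array;
        # a plain dict stands in for it here.
        data = [None if missing else value
                for value, missing in zip(self.values, self.mask)]
        return {"type": type or "int64", "data": data}


def convert(obj: Any, type: Optional[str] = None) -> dict:
    """Stand-in for the dispatch a function like pyarrow.array(..) might do."""
    hook = getattr(obj, "__arrow_array__", None)
    if hook is not None:
        # The object knows how to convert itself, so defer to it.
        return hook(type=type)
    # Fallback: treat the object as a plain sequence of values.
    return {"type": type or "unknown", "data": list(obj)}


arr = NullableIntArray([1, 2, 3], [False, True, False])
result = convert(arr)
print(result)
```

Note that this sketch covers only the conversion to Arrow. It says nothing about how the metadata needed for the way back (for example, which extension type produced the data) would be preserved, which is the open question above.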