Hi Joris,

Do you have a signature for the __arrow_array__ method in mind?

For example, let's say you want to roundtrip ExtensionArrays or other
third-party data through Arrow.  How do you preserve the required metadata?
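
To make the question concrete, here is a rough sketch of one shape such a
protocol could take. It is only an illustration, not a proposal for the
actual signature. It is plain Python and does not use pyarrow at all:
SketchArrowArray, to_arrow, NullableIntArray, the optional `type` argument
and the metadata dict are all hypothetical names and assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class SketchArrowArray:
    """Hypothetical stand-in for an Arrow array (not a pyarrow class)."""
    values: list
    type: str
    metadata: dict = field(default_factory=dict)


def to_arrow(obj, type=None):
    """Hypothetical converter: ask the object first, then fall back."""
    hook = getattr(obj, "__arrow_array__", None)
    if hook is not None:
        # Assumed signature: __arrow_array__(self, type=None)
        result = hook(type=type)
        if not isinstance(result, SketchArrowArray):
            raise TypeError("__arrow_array__ must return an Arrow array")
        return result
    # Fallback for plain sequences that do not implement the protocol.
    return SketchArrowArray(list(obj), type or "inferred")


class NullableIntArray:
    """Toy third-party array: integer data plus a mask (True = missing)."""

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)

    def __arrow_array__(self, type=None):
        values = [None if m else v for v, m in zip(self.data, self.mask)]
        # Assumption: metadata needed for the roundtrip travels with the result.
        return SketchArrowArray(
            values, type or "int64", {"source_class": "NullableIntArray"}
        )


arr = to_arrow(NullableIntArray([1, 2, 3], [False, True, False]))
```

In this sketch the object itself decides what metadata to attach, which is
exactly the part I am unsure how a real signature would handle.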

Regards

Antoine.


On 09/05/2019 at 13:29, Joris Van den Bossche wrote:
> Hi all,
> 
> I want to propose an interface to allow custom array objects in Python to
> define how they should be converted to Arrow arrays (e.g. in
> pyarrow.array(..)). I opened
> https://issues.apache.org/jira/browse/ARROW-5271 for this.
> This would be similar to the numpy __array__ protocol (so we could e.g. call
> it __arrow_array__).
> Feedback / discussion very welcome!
> 
> I am coming to this discussion specifically from the point of view of
> pandas ExtensionArrays (github issue for this:
> https://github.com/pandas-dev/pandas/issues/20612/#issuecomment-489649556).
> Such a protocol would, for example, allow pandas users to
> save DataFrames with ExtensionArrays (e.g. the nullable integers) to parquet,
> without the need for pyarrow to know about all those possible different
> extension arrays. This would also be useful for projects extending pandas
> such as GeoPandas <https://github.com/geopandas/geopandas> and Fletcher
> <https://github.com/xhochy/fletcher>.
> But I suppose it could also be of more general interest to other
> array-like / pandas-like projects that want to interface with Arrow.
> 
> Sidenote: for the pandas case, I want to look at the full roundtrip, so also
> the conversion back from an arrow Table to DataFrame. For that aspect there
> is https://issues.apache.org/jira/browse/ARROW-2428, but this is much more
> specific to pandas and its ExtensionArrays.
> 
> Regards,
> Joris
> 
