Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-08-19 Thread Wes McKinney
No concerns from me either. On Mon, Aug 19, 2019 at 5:10 AM Antoine Pitrou wrote: > > > No concern from me. It should probably be documented somewhere though :-) > > Regards > > Antoine. > > > On 16/08/2019 at 17:23, Joris Van den Bossche wrote: > > Coming back to this older thread, I have ope

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-08-19 Thread Antoine Pitrou
No concern from me. It should probably be documented somewhere though :-) Regards Antoine. On 16/08/2019 at 17:23, Joris Van den Bossche wrote: > Coming back to this older thread, I have opened a PR with a proof of > concept of the proposed protocol to convert third-party array objects to

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-08-16 Thread Joris Van den Bossche
Coming back to this older thread, I have opened a PR with a proof of concept of the proposed protocol to convert third-party array objects to arrow: https://github.com/apache/arrow/pull/5106 In the tests, I added the protocol to pandas' nullable integer array (which is currently not supported in th

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-20 Thread Joris Van den Bossche
Hi Wes, That indeed seems like a good fit for the pandas ExtensionArray <-> Arrow conversion. I will look into it starting this week. Joris On Fri, May 17, 2019 at 00:28, Wes McKinney wrote: > hi Joris, > > Somewhat related to this, I want to also point out that we have C++ > extension types [1].

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-16 Thread Wes McKinney
hi Joris, Somewhat related to this, I want to also point out that we have C++ extension types [1]. As part of this, it would also be good to define and document a public API for users to create ExtensionArray subclasses that can be serialized and deserialized using this machinery. As a motivating

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-10 Thread Joris Van den Bossche
On Thu, May 9, 2019 at 21:38, Uwe L. Korn wrote: > +1 to the idea of adding a protocol to let other objects define their way > to Arrow structures. For pandas.Series I would expect that they return an > Arrow Column. > > For the Arrow->pandas conversion I have a bit mixed feelings. In the > normal

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-10 Thread Joris Van den Bossche
My initial idea was to not let this protocol pass metadata around (which indeed is not possible for arrays). Currently, metadata are only saved at the level of a Table when converting from a pandas DataFrame (in Table.from_pandas()). That could continue to be the case, where Table.from_pandas both

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-09 Thread Uwe L. Korn
+1 to the idea of adding a protocol to let other objects define their way to Arrow structures. For pandas.Series I would expect that they return an Arrow Column. For the Arrow->pandas conversion I have a bit mixed feelings. In the normal Fletcher case I would expect that we don't convert anyth

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-09 Thread Antoine Pitrou
Arrow arrays don't have metadata, so if you want to pass metadata around you should at least add a hook for columns as well. Regards Antoine. On 09/05/2019 at 18:10, Joris Van den Bossche wrote: > An additional question might be at which "level" to provide such a hook to > third-party packa

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-09 Thread Joris Van den Bossche
An additional question might be at which "level" to provide such a hook to third-party packages: I proposed it for Array, but what about chunked arrays, columns or tables? Maybe at least returning a chunked array should also be allowed. On Thu, May 9, 2019 at 18:06, Joris Van den Bossche < jorisvan

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-09 Thread Joris Van den Bossche
The signature I had in mind is something like:

    def __arrow_array__(self, type: pyarrow.DataType = None) -> pyarrow.Array:

where the function returns a pyarrow.Array, and takes an optional data type (in case there are multiple ways to convert to a pyarrow Array, and what can be passed by the user i
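
To make the shape of the proposed hook concrete, here is a minimal sketch of how a converter might dispatch on `__arrow_array__`. It deliberately avoids importing pyarrow: `StandInArray`, `MaskedIntegers` and `to_array` are hypothetical names invented for this illustration, with `StandInArray` standing in for `pyarrow.Array`, a plain string standing in for `pyarrow.DataType`, and `to_array` playing the role a function such as `pyarrow.array()` could play. It is not the code from the PR.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class StandInArray:
    """Hypothetical stand-in for pyarrow.Array: values plus a type name."""
    values: List[Any]
    type: str


class MaskedIntegers:
    """Hypothetical third-party container: integers plus a missing-value mask."""

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)  # True marks a missing value

    def __arrow_array__(self, type: Optional[str] = None) -> StandInArray:
        # The object itself decides how to convert; `type` is the optional
        # target type the caller asked for.
        values = [None if missing else value
                  for value, missing in zip(self.data, self.mask)]
        return StandInArray(values, type or "int64")


def to_array(obj, type: Optional[str] = None) -> StandInArray:
    """Hypothetical converter: use the hook if present, else a fallback."""
    hook = getattr(obj, "__arrow_array__", None)
    if hook is not None:
        result = hook(type=type)
        if not isinstance(result, StandInArray):
            raise TypeError("__arrow_array__ must return an array")
        return result
    # Fallback for plain sequences that do not implement the protocol.
    return StandInArray(list(obj), type or "unknown")


masked = MaskedIntegers([1, 2, 3], [False, True, False])
converted = to_array(masked)
```

In this sketch the converter never needs to know about `MaskedIntegers`; it only checks for the method and validates the return type, which is the point of a protocol-style hook.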

Re: [Discuss] [Python] protocol for conversion to pyarrow Array

2019-05-09 Thread Antoine Pitrou
Hi Joris, Do you have a signature for the __arrow_array__ method in mind? For example, let's say you want to roundtrip ExtensionArrays or other third-party data through Arrow. How do you preserve the required metadata? Regards Antoine. On 09/05/2019 at 13:29, Joris Van den Bossche wrote: > H