Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Fan Liya
Hi roee, It seems that we have both raw value and encoded value types in the Java implementation, so there is no information loss? In particular, we have org.apache.arrow.vector.types.pojo.FieldType#type for the raw type and org.apache.arrow.vector.types.pojo.FieldType#dictionary#indexType for th

Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Hongze Zhang
On Wed, 2021-08-25 at 21:02 +0300, roee shlomo wrote: > This means that an API to import an ArrowSchema (in C) into a Field/Schema (in Java) is not suitable for dictionary encoded arrays because there is an information loss. Specifically, there is nothing in Field/Schema to indicate the

Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Antoine Pitrou
On 25/08/2021 at 20:02, roee shlomo wrote: In Java, the dictionary vector is completely separate from the encoded vector. Typically, a DictionaryProvider is available alongside a dictionary encoded vector (to provide dictionaries for the vector and its children). On the other hand, the C Data

[Java] C Data Interface and dictionaries

2021-08-25 Thread roee shlomo
We are currently implementing the C Data Interface in Java and have some questions regarding dictionary-encoded arrays. We would appreciate some help and guidance, especially from an API perspective. In Java, the dictionary vector is completely separate from the encoded vector. Typically, a Dictio

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Antoine Pitrou
On 25/08/2021 at 17:27, Joris Van den Bossche wrote: https://github.com/rapidsai/cudf/blob/be25a30ca20f3135f341c51b36cb075b376d5def/python/cudf/cudf/_lib/cpp/io/types.pxd#L9 Here they are doing `from pyarrow.includes.libarrow cimport CRandomAccessFile` (CRandomAccessFile is the cython equiva

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Joris Van den Bossche
On Wed, 25 Aug 2021 at 17:21, Antoine Pitrou wrote: > On 25/08/2021 at 17:12, Joris Van den Bossche wrote: > > One example of consumer of our Cython API is cudf (https://github.com/rapidsai/cudf). I am not very familiar with the package itself, but browsing its code, I see that t

Re: [DISCUSS] Proposal to move biweekly Arrow sync call to Zoom

2021-08-25 Thread Neal Richardson
SGTM On Wed, Aug 25, 2021 at 10:04 AM Antoine Pitrou wrote: > +1 > On 25/08/2021 at 16:03, Ian Cook wrote: > > In last week's Arrow sync call, I suggested that we move future sync calls from Google Meet to Zoom. The primary benefit of this is that Zoom meetings can be configured to

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Antoine Pitrou
On 25/08/2021 at 17:12, Joris Van den Bossche wrote: One example of consumer of our Cython API is cudf (https://github.com/rapidsai/cudf). I am not very familiar with the package itself, but browsing its code, I see that they do for example cimport RecordBatchReader (https://github.com/rapi

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Antoine Pitrou
On 25/08/2021 at 17:17, Keith Kraus wrote: If I remember correctly the reason cuDF interacts with the Cython code for IPC stuff is that in the past the existing IPC machinery in Arrow didn't work correctly with GPU memory. If that is fixed I think there's a case to remove this code entirely f

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Keith Kraus
If I remember correctly the reason cuDF interacts with the Cython code for IPC stuff is that in the past the existing IPC machinery in Arrow didn't work correctly with GPU memory. If that is fixed I think there's a case to remove this code entirely from cuDF and instruct users to use the higher lev

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Joris Van den Bossche
One example of consumer of our Cython API is cudf (https://github.com/rapidsai/cudf). I am not very familiar with the package itself, but browsing its code, I see that they do for example cimport RecordBatchReader (https://github.com/rapidsai/cudf/blob/f6d31fa95d9b8d8658301438d0f9ba22a1c131aa/pyt

Re: [DISCUSS] Proposal to move biweekly Arrow sync call to Zoom

2021-08-25 Thread Antoine Pitrou
+1 On 25/08/2021 at 16:03, Ian Cook wrote: In last week's Arrow sync call, I suggested that we move future sync calls from Google Meet to Zoom. The primary benefit of this is that Zoom meetings can be configured to allow participants to join even if the host is not present, thus eliminating t

[DISCUSS] Proposal to move biweekly Arrow sync call to Zoom

2021-08-25 Thread Ian Cook
In last week's Arrow sync call, I suggested that we move future sync calls from Google Meet to Zoom. The primary benefit of this is that Zoom meetings can be configured to allow participants to join even if the host is not present, thus eliminating the need for any one particular person or person w

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Antoine Pitrou
On 20/08/2021 at 12:24, Alessandro Molina wrote: We could argue that only what was documented explicitly should be considered "public" and everything else can be changed, but our documentation seems to be unclear on this point. It lists some functions that should be considered our explicit a

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Alessandro Molina
Given we didn't get many opinions on this one, I will propose we move forward with merging the open PR that moves ipc cython implementation and discover if we receive any open issue because projects out there were relying on it. It seems that ipc is a low risk module from that point of view and wil