Re: [Java] C Data Interface and dictionaries

2021-08-26 Thread Roee Shlomo
> It seems that we have both raw value and encoded value types in the Java implementation, so there is no information loss? I think that in the Java memory format they are both the index type, see https://github.com/apache/arrow/blob/5003278ded77f1ab385425143aafd085fda1f701/java/vector/src/main/ja

[Java] C Data Interface and dictionaries

2021-08-25 Thread Roee Shlomo
We are currently implementing the C Data Interface in Java and have some questions regarding dictionary-encoded arrays. We would appreciate some help and guidance, especially from an API perspective. In Java, the dictionary vector is completely separate from the encoded vector. Typically, a Dictio

Re: Adding Parquet encryption support to PyArrow

2020-09-09 Thread Roee Shlomo
Hi Itamar, Thanks for starting the document. I've added an initial draft version of the API (parts of it, at least). I have also added problem statement and goals sections to list what I understand we want to achieve. On 2020/09/08 17:44:07, "Itamar Turner-Trauring" wrote: > Still learni

Re: Adding Parquet encryption support to PyArrow

2020-09-04 Thread Roee Shlomo
ry. > > It should address your technical requirements; if it doesn't, we can > > discuss the gaps. > > All questions are welcome. > > > > Cheers, Gidon > > > > > > On Thu, Sep 3, 2020 at 10:11 PM Roee Shlomo wrote: > > > >> H

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Roee Shlomo
Hi Itamar, I implemented some Python wrappers for the low-level API and would be happy to collaborate on that. The reason I haven't pushed this forward yet is what Gidon mentioned: the API to expose to Python users needs to be finalized first, and it must include the key tools API for interop with