Re: [Java] C Data Interface and dictionaries

2021-09-01 Thread Micah Kornfield
It is quite possible that the dictionary-related code in Java could use some rethinking. I recall that working with dictionaries has been a little awkward, and I think we had some open JIRAs related to this. On Thu, Aug 26, 2021 at 12:52 AM roee shlomo wrote: > > It seems that we have both raw value and encod

Re: [Java] C Data Interface and dictionaries

2021-08-26 Thread roee shlomo
> It seems that we have both raw value and encoded value types in the Java implementation, so there is no information loss? I think that in the Java memory format they are both the index type, see https://github.com/apache/arrow/blob/5003278ded77f1ab385425143aafd085fda1f701/java/vector/src/main/ja

Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Fan Liya
Hi roee, It seems that we have both raw value and encoded value types in the Java implementation, so there is no information loss? In particular, we have org.apache.arrow.vector.types.pojo.FieldType#type for the raw type and org.apache.arrow.vector.types.pojo.FieldType#dictionary#indexType for th

Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Hongze Zhang
On Wed, 2021-08-25 at 21:02 +0300, roee shlomo wrote: > This means that an API to import an ArrowSchema (in C) into a Field/Schema (in Java) is not suitable for dictionary encoded arrays because there is an information loss. Specifically, there is nothing in Field/Schema to indicate the

Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Antoine Pitrou
On 25/08/2021 at 20:02, roee shlomo wrote: In Java, the dictionary vector is completely separate from the encoded vector. Typically, a DictionaryProvider is available alongside a dictionary encoded vector (to provide dictionaries for the vector and its children). On the other hand, the C Data

[Java] C Data Interface and dictionaries

2021-08-25 Thread roee shlomo
We are currently implementing the C Data Interface in Java and have some questions regarding dictionary-encoded arrays. We would appreciate some help and guidance, especially from an API perspective. In Java, the dictionary vector is completely separate from the encoded vector. Typically, a Dictio