It is quite possible that the dictionary-related code in Java could use some
rethinking. I recall that working with it has been a little awkward, and I
think we had some open JIRAs related to this.
On Thu, Aug 26, 2021 at 12:52 AM roee shlomo wrote:
> It seems that we have both raw value and encoded value types in the Java
> implementation, so there is no information loss?
I think that in the Java memory format they are both the index type, see
https://github.com/apache/arrow/blob/5003278ded77f1ab385425143aafd085fda1f701/java/vector/src/main/ja
Hi roee,
It seems that we have both raw value and encoded value types in the Java
implementation, so there is no information loss?
In particular, we have org.apache.arrow.vector.types.pojo.FieldType#type
for the raw type
and org.apache.arrow.vector.types.pojo.FieldType#dictionary#indexType for
th
On Wed, 2021-08-25 at 21:02 +0300, roee shlomo wrote:
> This means that an API to import an ArrowSchema (in C) into a Field/Schema
> (in Java) is not suitable for dictionary encoded arrays because there is an
> information loss. Specifically, there is nothing in Field/Schema to
> indicate the
On 25/08/2021 at 20:02, roee shlomo wrote:
In Java, the dictionary vector is completely separate from the encoded
vector. Typically, a DictionaryProvider is available alongside a dictionary
encoded vector (to provide dictionaries for the vector and its children).
On the other hand, the C Data
We are currently implementing the C Data Interface in Java and have some
questions regarding dictionary-encoded arrays. We would appreciate some
help and guidance, especially from an API perspective.
In Java, the dictionary vector is completely separate from the encoded
vector. Typically, a Dictio
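
For readers following the thread, here is a minimal sketch of the separation being discussed: the field metadata carries the raw value type and the index type, while the dictionary values live elsewhere and are looked up by id. Every class and method name below (DictionarySketch, Encoding, FieldInfo, Provider, decode) is a hypothetical stand-in written with the JDK only. It is not the Arrow Java API, and the type names are plain strings chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Arrow classes.
final class DictionarySketch {

    /** Encoding metadata kept on the field: dictionary id and index type. */
    static final class Encoding {
        final long id;
        final String indexType;

        Encoding(long id, String indexType) {
            this.id = id;
            this.indexType = indexType;
        }
    }

    /** Field metadata: the raw value type plus an optional encoding (null if not encoded). */
    static final class FieldInfo {
        final String name;
        final String rawType;
        final Encoding encoding;

        FieldInfo(String name, String rawType, Encoding encoding) {
            this.name = name;
            this.rawType = rawType;
            this.encoding = encoding;
        }
    }

    /** Dictionaries are stored apart from the encoded data and found by id. */
    static final class Provider {
        private final Map<Long, List<String>> byId = new HashMap<>();

        void put(long id, List<String> values) {
            byId.put(id, values);
        }

        List<String> lookup(long id) {
            return byId.get(id);
        }
    }

    /** Resolves indices to raw values using the dictionary the field's id points at. */
    static List<String> decode(FieldInfo field, int[] indices, Provider provider) {
        if (field.encoding == null) {
            throw new IllegalArgumentException("field is not dictionary-encoded: " + field.name);
        }
        List<String> dictionary = provider.lookup(field.encoding.id);
        if (dictionary == null) {
            throw new IllegalStateException("no dictionary for id " + field.encoding.id);
        }
        List<String> out = new ArrayList<>(indices.length);
        for (int index : indices) {
            out.add(dictionary.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        provider.put(1L, Arrays.asList("a", "b", "c"));
        FieldInfo field = new FieldInfo("col", "Utf8", new Encoding(1L, "Int32"));
        System.out.println(decode(field, new int[] {2, 0, 0, 1}, provider));
    }
}
```

The point of the sketch is only that the encoded data cannot be decoded without the separate provider, which is the kind of pairing an import API for dictionary-encoded arrays would have to account for.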