It is quite possible that the dictionary-related code in Java could use some
rethinking.  I recall that working with it has been a little awkward, and I
think we had some open JIRAs related to this.

On Thu, Aug 26, 2021 at 12:52 AM roee shlomo <roe...@gmail.com> wrote:

> > It seems that we have both raw value and encoded value types in the Java
> > implementation, so there is no information loss?
>
> I think that in the Java memory format they are both the index type, see
>
> https://github.com/apache/arrow/blob/5003278ded77f1ab385425143aafd085fda1f701/java/vector/src/main/java/org/apache/arrow/vector/util/DictionaryUtility.java#L44-L45
>
> Users would expect the Java memory format (e.g., to create a Vector or
> VectorSchemaRoot from it directly). I don't think moving to the IPC format
> would be a good idea either: the C data interface is quite different, e.g.,
> it should support import/export of individual vectors. However, the IPC code
> is a good reference for learning how to handle dictionaries, so I'll go over
> it more carefully.
>
