> It seems that we have both raw value and encoded value types in the Java
> implementation, so there is no information loss?

I think that in the Java memory format they are both the index type, see
https://github.com/apache/arrow/blob/5003278ded77f1ab385425143aafd085fda1f701/java/vector/src/main/java/org/apache/arrow/vector/util/DictionaryUtility.java#L44-L45
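To illustrate why a field described only by its index type no longer
carries the value type, here is a minimal plain-Java sketch. It is not
Arrow code: the class DictionarySketch and its methods are hypothetical
names made up for this example, and it only assumes the general idea of
dictionary encoding (distinct values stored once, the column stored as
integer indices).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Arrow.
public class DictionarySketch {

    // Collect the distinct raw values in first-seen order.
    // The value type (String here) is only visible in this list.
    public static List<String> buildDictionary(List<String> raw) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        for (String value : raw) {
            seen.putIfAbsent(value, seen.size());
        }
        return new ArrayList<>(seen.keySet());
    }

    // Encode the column as int indices into the dictionary.
    // The result carries no trace of the String value type.
    public static int[] encode(List<String> raw, List<String> dictionary) {
        int[] indices = new int[raw.size()];
        for (int i = 0; i < raw.size(); i++) {
            // indexOf returns -1 for a value missing from the dictionary.
            indices[i] = dictionary.indexOf(raw.get(i));
        }
        return indices;
    }

    // Recover the raw values; this needs the dictionary as well as the indices.
    public static List<String> decode(int[] indices, List<String> dictionary) {
        List<String> raw = new ArrayList<>(indices.length);
        for (int index : indices) {
            raw.add(dictionary.get(index));
        }
        return raw;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("a", "b", "a", "c");
        List<String> dictionary = buildDictionary(raw);
        int[] indices = encode(raw, dictionary);
        System.out.println(dictionary);
        System.out.println(Arrays.toString(indices));
        System.out.println(decode(indices, dictionary));
    }
}
```

In this sketch the int[] alone cannot tell you that the values were
strings; that is only recoverable together with the dictionary.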

Users would expect the Java memory format (e.g., so they can create a
Vector or VectorSchemaRoot from it directly). I don't think moving to the
IPC format would be a good idea either: the C data interface is quite
different, e.g., it should support import/export of individual vectors.
However, the IPC code is a good reference for learning how to handle
dictionaries, so I'll go over it more carefully.
