> It seems that we have both raw value and encoded value types in the Java implementation, so there is no information loss?
I think that in the Java memory format both are the index type; see https://github.com/apache/arrow/blob/5003278ded77f1ab385425143aafd085fda1f701/java/vector/src/main/java/org/apache/arrow/vector/util/DictionaryUtility.java#L44-L45 Users would expect the Java memory format (e.g., so they can create a Vector or VectorSchemaRoot from it directly). I don't think moving to the IPC format would be a good idea either: the C data interface is quite different, e.g., it should support import/export of individual vectors. However, the IPC code is a good reference for how to handle dictionaries, so I'll go over it more carefully.
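
To make the "index type" point concrete, here is a rough plain-Java sketch of dictionary encoding. It is only a toy model and does not use the Arrow API: `DictionarySketch`, `Encoded`, `encode`, and `decode` are hypothetical names made up for illustration. The idea it tries to show is that the encoded column holds only integer indices, while the original values (and hence the value type) live in the separate dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Arrow classes.
public class DictionarySketch {

    // Result of encoding: the distinct values plus one index per input slot.
    public static final class Encoded {
        public final List<String> dictionary; // the value type lives here
        public final int[] indices;           // the encoded column only has an index type

        Encoded(List<String> dictionary, int[] indices) {
            this.dictionary = dictionary;
            this.indices = indices;
        }
    }

    // Replace each value with the position of its first occurrence in the dictionary.
    // Null values are not handled in this sketch.
    public static Encoded encode(List<String> values) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            Integer position = positions.get(value);
            if (position == null) {
                position = positions.size();
                positions.put(value, position);
            }
            indices[i] = position;
        }
        // LinkedHashMap keeps insertion order, so key order matches the assigned positions.
        return new Encoded(new ArrayList<>(positions.keySet()), indices);
    }

    // Look every index up in the dictionary to recover the original values.
    public static List<String> decode(Encoded encoded) {
        List<String> out = new ArrayList<>(encoded.indices.length);
        for (int index : encoded.indices) {
            out.add(encoded.dictionary.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        Encoded encoded = encode(Arrays.asList("a", "b", "a", "c"));
        System.out.println(encoded.dictionary);
        System.out.println(Arrays.toString(encoded.indices));
        System.out.println(decode(encoded));
    }
}
```

In this toy model nothing is lost as long as the dictionary travels with the indices, which is roughly why the dictionary handling matters for import/export.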