[ https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860292#comment-15860292 ]
Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------

Another blocker I'm hitting is that I don't see any way to determine the type of a dictionary block during read. DictionaryEncoding has an indexType, but that seems to refer to the ints used to reference the dictionary values:

https://github.com/apache/arrow/blob/b99d049c3d1894908b7e52774eb657675dc1f439/format/Message.fbs#L165

A dictionary-encoded vector currently has its type defined as the dictionary index type, but the type of the dictionary values is not defined anywhere. This works when the data is in memory with the dictionary alongside it, but not when encoding to the file format...

Possibly the dictionary-encoded vector should specify the dictionary type? It seems like either that, or the message format needs another field for the dictionary type.

> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
>                 Key: ARROW-542
>                 URL: https://issues.apache.org/jira/browse/ARROW-542
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Emilio Lahr-Vivaz
>            Assignee: Emilio Lahr-Vivaz
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
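
For illustration, the gap described in the comment can be sketched in plain Java. This is a minimal model under stated assumptions, not Arrow code: `LogicalType`, `IndexOnlyField`, `TypedDictionaryField`, and `dictionaryValueType` are all hypothetical names invented for the example. The first class mirrors the situation described (only the index type is recorded), and the second shows one of the two suggested fixes (the field also records the dictionary's value type).

```java
import java.util.Optional;

// Hypothetical model for illustration only; none of these are Arrow classes.
class DictionaryTypeSketch {

    // Stand-in for a logical type (assumption: a small closed set is enough here).
    enum LogicalType { INT8, INT16, INT32, UTF8 }

    // The situation described in the comment: only the index type is recorded.
    static final class IndexOnlyField {
        final LogicalType indexType;

        IndexOnlyField(LogicalType indexType) {
            this.indexType = indexType;
        }
    }

    // One suggested fix: the field also records the type of the dictionary values.
    static final class TypedDictionaryField {
        final LogicalType indexType;
        final LogicalType valueType;

        TypedDictionaryField(LogicalType indexType, LogicalType valueType) {
            this.indexType = indexType;
            this.valueType = valueType;
        }
    }

    // A reader holding only the index type has nothing to derive the value type from.
    static Optional<LogicalType> dictionaryValueType(IndexOnlyField field) {
        return Optional.empty();
    }

    // With the extra field, the reader can resolve the value type without the in-memory dictionary.
    static Optional<LogicalType> dictionaryValueType(TypedDictionaryField field) {
        return Optional.of(field.valueType);
    }

    public static void main(String[] args) {
        IndexOnlyField current = new IndexOnlyField(LogicalType.INT32);
        TypedDictionaryField proposed = new TypedDictionaryField(LogicalType.INT32, LogicalType.UTF8);

        System.out.println("index-only field resolves value type: "
                + dictionaryValueType(current).isPresent());
        System.out.println("typed field resolves value type: "
                + dictionaryValueType(proposed).isPresent());
    }
}
```

The alternative raised in the comment, adding a field for the dictionary type to the message format rather than to the vector's type, would give the reader the same information from a different place; the sketch does not take a position on which is preferable.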