[ 
https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860292#comment-15860292
 ] 

Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------

Another blocker I'm hitting is that I don't see any way that the type of a 
dictionary block can be determined during read. DictionaryEncoding has an 
indexType, but that seems to refer to the ints used to reference the dictionary 
values: 
https://github.com/apache/arrow/blob/b99d049c3d1894908b7e52774eb657675dc1f439/format/Message.fbs#L165
A dictionary-encoded vector currently has its type defined as the dictionary 
index type, but the type of the dictionary values is not defined. It works when 
the data is in memory with the dictionary alongside it, but not when encoding 
to the file format. Possibly the dictionary-encoded vector should specify the 
dictionary type? It seems that either that or the message format needs another 
field for the dictionary type.
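
To illustrate the gap, here is a minimal sketch in plain Java. The class and 
field names (DictionaryTypeSketch, EncodedFieldMeta, valueType) are 
hypothetical and are not Arrow's Java API; the type names are just strings. It 
only models the idea that a reader given the index type alone cannot tell what 
the dictionary holds, while an extra value-type field would resolve it.

```java
import java.util.Optional;

public final class DictionaryTypeSketch {

    /** Hypothetical per-field metadata for a dictionary-encoded vector. */
    static final class EncodedFieldMeta {
        final String name;
        final long dictionaryId;
        final String indexType;  // type of the ints that reference dictionary entries
        final String valueType;  // assumed extra field; null models metadata without it

        EncodedFieldMeta(String name, long dictionaryId, String indexType, String valueType) {
            this.name = name;
            this.dictionaryId = dictionaryId;
            this.indexType = indexType;
            this.valueType = valueType;
        }

        /** Empty when the metadata carries only the index type. */
        Optional<String> dictionaryValueType() {
            return Optional.ofNullable(valueType);
        }
    }

    /** What a reader could say about the dictionary block from the metadata alone. */
    static String describeDictionary(EncodedFieldMeta meta) {
        return meta.dictionaryValueType()
                .map(t -> "dictionary " + meta.dictionaryId + " holds " + t)
                .orElse("dictionary " + meta.dictionaryId + " has unknown value type");
    }

    public static void main(String[] args) {
        // Index type only: the reader cannot decide how to decode the dictionary.
        System.out.println(describeDictionary(new EncodedFieldMeta("city", 1L, "Int32", null)));
        // With the assumed extra field, the value type is recoverable on read.
        System.out.println(describeDictionary(new EncodedFieldMeta("city", 1L, "Int32", "Utf8")));
    }
}
```

Whether the value type belongs on the vector's field or on the dictionary 
message itself is the open question; the sketch does not take a side.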

> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
>                 Key: ARROW-542
>                 URL: https://issues.apache.org/jira/browse/ARROW-542
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Emilio Lahr-Vivaz
>            Assignee: Emilio Lahr-Vivaz
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
