[ https://issues.apache.org/jira/browse/ARROW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Le Dem reassigned ARROW-255:
-----------------------------------

    Assignee: Julien Le Dem

> Finalize Dictionary representation
> ----------------------------------
>
>                 Key: ARROW-255
>                 URL: https://issues.apache.org/jira/browse/ARROW-255
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>
> format/Message.fbs mentions DictionaryBatches with an id but does not
> specify where they are referenced.
> We should add a {{dictionary: long}} in Field that references the dictionary
> id:
> Field:
> https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86
> Dictionary id:
> https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
> We need a spec in format/Layout.md that describes the dictionary layout.
> When dictionary encoded, the value vector is an array of signed int32 (for
> consistency with variable length collection offsets).
> The dictionary vector is a vector of the value type, with entries indexed by
> their id in the dictionary.
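
For illustration only, here is a minimal Python sketch of the layout the issue describes: a value vector of signed integer indices plus a dictionary vector holding each distinct value at the position given by its id. The function names (dictionary_encode, dictionary_decode) are hypothetical and not part of any Arrow API, and the sketch ignores nulls and the on-the-wire DictionaryBatch format.

```python
from array import array


def dictionary_encode(values):
    """Split a list of hashable values into (dictionary, indices).

    Hypothetical helper for illustration; not an Arrow function.
    """
    dictionary = []       # dictionary vector: distinct values, position == id
    ids = {}              # value -> id lookup, used only while encoding
    # 'i' is a signed C int (32 bits on common platforms), standing in
    # for the signed int32 index vector proposed in the issue.
    indices = array("i")
    for value in values:
        if value not in ids:
            ids[value] = len(dictionary)
            dictionary.append(value)
        indices.append(ids[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original values by looking up each index in the dictionary."""
    return [dictionary[i] for i in indices]


dictionary, indices = dictionary_encode(["a", "b", "a", "c", "b"])
print(dictionary)                              # ['a', 'b', 'c']
print(list(indices))                           # [0, 1, 0, 2, 1]
print(dictionary_decode(dictionary, indices))  # ['a', 'b', 'a', 'c', 'b']
```

In the proposal, a Field would carry the dictionary id so that a reader knows which dictionary vector to use when decoding that field's indices.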