Wes McKinney created ARROW-262:
----------------------------------

             Summary: [Format] Add a new format document for metadata and logical types for messaging and IPC / on-wire/file representations
                 Key: ARROW-262
                 URL: https://issues.apache.org/jira/browse/ARROW-262
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Format
            Reporter: Wes McKinney
            Assignee: Wes McKinney

The existing document

https://github.com/apache/arrow/blob/master/format/Layout.md

only describes the physical layout of fixed-size, variable-size, and other nested types (struct, union).

Meanwhile, we have begun drafting Flatbuffers IDL for Arrow metadata:

https://github.com/apache/arrow/blob/master/format/Message.fbs

I will add a document that will, to begin with:

* Explain the mapping between logical types in the metadata and their physical layouts. For example, definitions of important data types: integers, floating point, boolean, string (UTF-8), and binary
* Where relevant, describe how each logical type's physical memory is converted to metadata for messaging purposes (e.g. the {{RecordBatch}} concept in the IDL)

We have already begun prototype implementations in the C++ codebase (https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc), so this document will serve as implementation-agnostic documentation.

Subsequently, I will make a follow-up patch for discussion, to hopefully address the metadata shortfalls between the canonical Arrow metadata and the similar metadata used by the bespoke Feather format (https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)