Wes McKinney created ARROW-262:
----------------------------------

             Summary: [Format] Add a new format document for metadata and 
logical types for messaging and IPC / on-wire/file representations
                 Key: ARROW-262
                 URL: https://issues.apache.org/jira/browse/ARROW-262
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Format
            Reporter: Wes McKinney
            Assignee: Wes McKinney


The existing document

https://github.com/apache/arrow/blob/master/format/Layout.md

only describes the physical layout of fixed-size, variable-size, and 
nested types (struct, union).

Meanwhile, we have begun drafting Flatbuffers IDL for Arrow metadata:

https://github.com/apache/arrow/blob/master/format/Message.fbs

I will add a document that will, to begin with:

* Explain how the logical types in the metadata map to the physical layouts. 
For example, definitions of important data types: integers, floating point, 
boolean, string (UTF-8) and binary

* Where relevant, describe how each logical type's physical memory is 
converted to metadata for messaging purposes (e.g. the {{RecordBatch}} concept 
in the IDL)
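
To make the two points above concrete, here is a rough sketch of the idea, not 
the actual schema in Message.fbs. The names {{Field}}, {{BufferMeta}}, 
{{int32_buffers}}, {{utf8_buffers}} and {{describe_batch}} are hypothetical and 
exist only for illustration. It assumes little-endian data and 32-bit offsets, 
and it ignores validity bitmaps and buffer padding/alignment, which a real 
format document would have to specify.

```python
import struct
from dataclasses import dataclass


@dataclass
class Field:
    """Logical description of a column (hypothetical, for illustration)."""
    name: str
    logical_type: str  # e.g. "int32" or "utf8"


@dataclass
class BufferMeta:
    """Where one physical buffer lives inside the message body."""
    offset: int
    length: int


def int32_buffers(values):
    # Fixed-size layout: a single contiguous buffer of 4-byte integers.
    return [struct.pack("<%di" % len(values), *values)]


def utf8_buffers(values):
    # Variable-size layout: an offsets buffer plus a concatenated data buffer.
    encoded = [v.encode("utf-8") for v in values]
    offsets = [0]
    for item in encoded:
        offsets.append(offsets[-1] + len(item))
    return [struct.pack("<%di" % len(offsets), *offsets), b"".join(encoded)]


def describe_batch(columns):
    """Concatenate all buffers into one body and record their positions.

    columns is a list of (Field, [bytes, ...]) pairs. The result is a list of
    (field name, BufferMeta) entries and the body they point into.
    """
    body = bytearray()
    metas = []
    for field, buffers in columns:
        for buf in buffers:
            metas.append((field.name, BufferMeta(len(body), len(buf))))
            body += buf
    return metas, bytes(body)


metas, body = describe_batch([
    (Field("x", "int32"), int32_buffers([1, 2, 3])),
    (Field("s", "utf8"), utf8_buffers(["a", "bc"])),
])
```

The point of the sketch is that the metadata only carries the logical type 
and the position of each buffer, so a reader can reconstruct the columns from 
the body without copying or parsing the data itself.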

We have already begun prototype implementations in the C++ codebase 
(https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc) so this will 
serve as implementation-agnostic documentation.

Subsequently, I will make a follow-up patch for discussion that will hopefully 
address the metadata shortfalls between the canonical Arrow metadata and the 
similar metadata used by the bespoke Feather format 
(https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
