Hi Kai,

Based on a previous thread on the mailing list (http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c72a2bcfd-54d7-4376-8199-04a5535d8...@gmail.com%3E), I believe the conclusion was that there should be an optional reference implementation for IPC, so that consumers of the memory format aren't required to tie themselves to a particular technology or component that they don't want to consume (e.g. some users of Arrow might just want the C++ objects and our not-yet-existent algorithms component). I think creating new document(s) to detail IPC concerns makes more sense than updating the existing document.
As you noted, Wes added the beginnings of an implementation for C++ that uses memory-mapped files and https://github.com/apache/arrow/blob/master/format/Message.fbs to describe the schema. I think the decision might have been made to hold off on writing a concrete spec until we could verify that a simple use case worked between Java and C++. One of the committers might have a better view on this. It probably pays to start writing up a document based on the current implementation anyway, so people have broader visibility into future plans (and can provide feedback without reading the C++ code).

Another mode of transport that deserves a reference specification/implementation is transferring tables via a socket (there is already a JIRA open to create one via Unix domain sockets, but this should likely be generalized to sockets in general).

I think we should open JIRAs to track writing reference specs for both shared-memory and socket-based transport.

Thanks,
-Micah

On Sat, Apr 9, 2016 at 7:55 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi,
>
> Looking at the layout spec, I have some more questions to complement to the
> previous ones discussed.
>
> About struct and union types:
>
> 1) The order of the fields (how they are declared in order) seems to be
> important, as it will affect how the data are laid out. For example, in union
> type, how to organize and interpret field types, offsets and data arrays.
> Similar to struct type.
>
> 2) There is no saying about how their schema is represented and
> how/where the schema is attached. Should the layout also contain the schema
> info? In cpp codes, Table is implemented of columns and with self-contained
> schema info.
>
> About schema:
> Nothing is mentioned about schema in the spec, not sure if it should be the
> nature part of it. Without self-contained schema info, it won't be able to
> interpret and process the layout data (like List, Struct, Table, Union and
> etc.) across machines and languages.
>
> About non-goals:
> The following are listed as non-goals for the document but in fact they're going
> to be implemented. Should we remove them?
>
> 1. To specify standardized metadata or a data layout for RPC or
> transient file storage.
>
> 2. Any "table" structure composed of named arrays each having their own
> type or any other structure that composes arrays.
>
> Regards,
> Kai
>