Re: More questions about layout spec

Micah Kornfield Sat, 09 Apr 2016 21:23:15 -0700

Hi Kai,
Based on a previous thread on the mailing list
(http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c72a2bcfd-54d7-4376-8199-04a5535d8...@gmail.com%3E)
I believe the conclusion was that there should be an optional reference
implementation for IPC, so that consumers of the memory format aren't
required to tie themselves to a particular technology or
component that they don't want to consume (e.g. some users of Arrow
might just want the C++ objects and our not-yet-existent algorithms
component).  I think creating new document(s) to detail IPC concerns
makes sense instead of updating the existing document.

As you noted, Wes added the beginnings of an implementation for C++
that uses memory mapped files and
https://github.com/apache/arrow/blob/master/format/Message.fbs to
describe the schema.  I think the decision might have been made to
hold off on writing a concrete spec until we could verify that a simple
use case worked between Java and C++.  One of the committers might
have a better view on this.  It probably pays to start writing up a
document based on the current implementation anyway so people have
broader visibility into future plans (and can provide feedback without
reading the C++ code).
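
As a rough illustration of the memory-mapped approach, here is a minimal
Python sketch. It is not the C++ implementation and does not use
Message.fbs; the 8-byte header, the int32 payload, and the function names
are all assumptions made up for this example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical header: element count as a little-endian int64.
HEADER = struct.Struct("<q")


def write_int32_column(path, values):
    """Write the header followed by the raw little-endian int32 values."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(values)))
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_int32_column(path):
    """Memory-map the file and decode the values straight from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (count,) = HEADER.unpack_from(mm, 0)
            return list(struct.unpack_from(f"<{count}i", mm, HEADER.size))


def roundtrip_demo():
    """Write a small column to a temporary file and read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_int32_column(path, [1, 2, 3])
        return read_int32_column(path)
    finally:
        os.remove(path)
```

A reader in another language would only need to agree on the header and the
byte order, which is the kind of detail a written spec would pin down.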

Another mode of transport that deserves a reference
specification/implementation is transferring tables via
a socket (there is already a JIRA open to create one for Unix domain
sockets, but this should likely be generalized to sockets in general).
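
To make the socket case concrete, here is a minimal Python sketch of
length-prefixed framing over a connected socket pair. The 4-byte length
prefix and the helper names are assumptions for illustration only; nothing
here reflects an agreed wire format.

```python
import socket
import struct

# Hypothetical frame header: payload length as a little-endian uint32.
LENGTH_PREFIX = struct.Struct("<I")


def send_frame(sock, payload):
    """Send the payload preceded by its length."""
    sock.sendall(LENGTH_PREFIX.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before the full frame arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Read the length prefix, then the payload it announces."""
    (length,) = LENGTH_PREFIX.unpack(recv_exact(sock, LENGTH_PREFIX.size))
    return recv_exact(sock, length)


def socket_demo():
    """Send one small frame across a local socket pair and read it back."""
    left, right = socket.socketpair()
    try:
        send_frame(left, b"column bytes")
        return recv_frame(right)
    finally:
        left.close()
        right.close()
```

The framing code does not care whether the socket is a Unix domain socket or
a TCP socket, which is the reason for generalizing the JIRA.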

I think we should open JIRAs to track writing reference specs for both
shared-memory and socket-based transport.

Thanks,
-Micah

On Sat, Apr 9, 2016 at 7:55 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> Hi,
>
> Looking at the layout spec, I have some more questions to complement the
> ones discussed previously.
>
> About struct and union types:
>
> 1)      The order of the fields (the order in which they are declared) seems
> to be important, as it affects how the data are laid out. For example, in a
> union type, it determines how to organize and interpret the field types,
> offsets and data arrays. The same applies to the struct type.
>
> 2)      Nothing is said about how their schema is represented and
> how/where the schema is attached. Should the layout also contain the schema
> info? In the C++ code, Table is implemented as a set of columns with
> self-contained schema info.
>
> About schema:
> Nothing is mentioned about schema in the spec; not sure if it should be a
> natural part of it. Without self-contained schema info, it won't be possible
> to interpret and process the layout data (like List, Struct, Table, Union,
> etc.) across machines and languages.
>
> About non-goals:
> The following are listed as non-goals for the document, but in fact they're
> going to be implemented. Should we remove them?
>
> 1.       To specify standardized metadata or a data layout for RPC or 
> transient file storage.
>
> 2.       Any "table" structure composed of named arrays each having their own 
> type or any other structure that composes arrays.
>
> Regards,
> Kai
>
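
On the first question quoted above (field order in unions), a small Python
sketch may help show why declaration order matters. It assumes a
dense-union-style layout with one type id and one offset per slot; plain
lists stand in for buffers, and all names are hypothetical.

```python
def read_dense_union(type_ids, offsets, children):
    """Resolve each slot: type_ids[i] selects a child, offsets[i] indexes it."""
    return [children[t][o] for t, o in zip(type_ids, offsets)]


def union_demo():
    """Read the same type ids and offsets with two different child orders."""
    ints = [5, 7]
    strs = ["a", "b"]
    type_ids = [0, 1, 0]
    offsets = [0, 0, 1]
    declared = read_dense_union(type_ids, offsets, [ints, strs])
    swapped = read_dense_union(type_ids, offsets, [strs, ints])
    return declared, swapped
```

The type ids and offsets are identical in both calls; only the order of the
children differs, and the decoded values differ with it. So the schema, or
whatever metadata carries the field order, has to travel with the data.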
