Re: More questions about layout spec

Wes McKinney Fri, 22 Apr 2016 12:30:01 -0700

On Sun, Apr 10, 2016 at 9:47 AM, Zheng, Kai <kai.zh...@intel.com> wrote:
> Thanks Micah for the answers. It looks like a good plan: document the IPC
> details separately, and add the schema to the spec, I guess.
>
> Regarding use cases between Java and C++, I think there may be an
> important one: how, in a framework, the Java layer accesses data or objects
> in the native (C++) layer. This was discussed quite some time ago, in the
> context of how JNI may be better than pure Java. It sounds like native C++
> would leave much more room for SIMD, so the Java layer would just instruct
> the native C++ layer to load/access data in the format somewhere, perform the
> desired computation, and then retrieve the computed results to respond to end
> users. I'm particularly interested in this case and wonder if there are any
> plans or thoughts about it. Java to C++ may be very similar to Python to C++,
> though I haven't looked into the Python part yet.
>
> I took a quick look at the Java part. It looks like there is little attempt
> to sync or unify Java and C++ at the API level, though they share the same
> binary representation. This would complicate implementing the use case
> I mentioned above, and confuse developers when they switch from one
> (like C++) to the other (say Java). At least the high-level constructs
> should have the same names, preferably conforming to the spec. The Java
> parts look like they inherit their style from Apache Drill, I guess.
>


In general I don't think it's worth spending significant energy trying
to conform the user APIs between the Java / C++ (or any other future)
implementations. When possible, it's nice to do. Hopefully we'll look
back in a few years and view the C++ "clean room" implementation from
the spec as a useful exercise.

cheers
Wes

> Just some quick thoughts by the way, might be better to discuss separately.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Micah Kornfield [mailto:emkornfi...@gmail.com]
> Sent: Sunday, April 10, 2016 12:23 PM
> To: dev@arrow.apache.org
> Subject: Re: More questions about layout spec
>
> Hi Kai,
> Based on a previous thread on the mailing list
> (http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c72a2bcfd-54d7-4376-8199-04a5535d8...@gmail.com%3E)
> I believe the conclusion was there should be an optional reference
> implementation for IPC so consumers of the memory format aren't necessarily
> required to tie themselves to a particular technology or component that they
> don't want to consume (e.g. some users of Arrow might just want the C++
> objects and our not-yet-existent algorithms component).  I think creating new
> document(s) to detail IPC concerns makes sense instead of updating the
> existing document.
>
> As you noted, Wes added the beginnings of an implementation for C++ that uses 
> memory mapped files and 
> https://github.com/apache/arrow/blob/master/format/Message.fbs to
> describe the schema.  I think the decision might have been made to
> hold off writing a concrete spec until we could verify a simple use-case
> worked between Java and C++.  One of the committers might have a better view
> on this.  It probably pays to start writing up a document based on the 
> current implementation anyways so people have broader visibility into future 
> plans (and can provide feedback without reading the C++ code).
>
> Another mode of transport that deserves a reference
> specification/implementation is transferring tables via a socket (there is
> already a JIRA open to create one via Unix domain sockets, but this should
> likely be generalized to sockets in general).
>
> I think we should open JIRAs to track writing reference specs for both shared 
> memory and socket based transport.
>
> Thanks,
> -Micah
>
> On Sat, Apr 9, 2016 at 7:55 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>> Hi,
>>
>> Looking at the layout spec, I have some more questions to complement the
>> previous ones discussed.
>>
>> About struct and union types:
>>
>> 1)      The order of the fields (the order in which they are declared)
>> seems to be important, as it affects how the data is laid out. For example,
>> in a union type, it determines how to organize and interpret the field
>> types, offsets and data arrays. The same applies to the struct type.
>>
>> 2)      Nothing is said about how their schema is represented and
>> how/where the schema is attached. Should the layout also contain the schema
>> info? In the C++ code, Table is implemented as a set of columns with
>> self-contained schema info.
>>
>> About schema:
>> Nothing is mentioned about the schema in the spec; not sure if it should be
>> a natural part of it. Without self-contained schema info, it won't be
>> possible to interpret and process the layout data (like List, Struct, Table,
>> Union, etc.) across machines and languages.
>>
>> About non-goals:
>> The following are listed as non-goals for the document, but in fact they're
>> going to be implemented. Should we remove them?
>>
>> 1.       To specify standardized metadata or a data layout for RPC or 
>> transient file storage.
>>
>> 2.       Any "table" structure composed of named arrays each having their 
>> own type or any other structure that composes arrays.
>>
>> Regards,
>> Kai
>>
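
To illustrate the question above about field order in union types, here is a
minimal sketch in plain Python. It is not taken from the Arrow code base or the
spec text: the helper names (build_dense_union, read_dense_union) are
hypothetical, and it assumes a dense-union-style layout of type ids, offsets
and one child array per declared field, where a field's position in the
declaration is its type id.

```python
# Illustrative sketch only; names and layout details are assumptions.

def build_dense_union(values, field_types):
    """Lay out `values` as (type_ids, offsets, children).

    `field_types` is the ordered list of declared field types; the position
    of a type in this list is used as its type id.
    """
    type_ids = []
    offsets = []
    children = [[] for _ in field_types]
    for v in values:
        # Find the declared field whose type matches this value exactly.
        tid = next(i for i, t in enumerate(field_types) if type(v) is t)
        type_ids.append(tid)
        # The offset is the slot this value will occupy in its child array.
        offsets.append(len(children[tid]))
        children[tid].append(v)
    return type_ids, offsets, children


def read_dense_union(type_ids, offsets, children):
    """Reconstruct the logical values from the three pieces of the layout."""
    return [children[t][o] for t, o in zip(type_ids, offsets)]


values = [1, "a", 2, "b", 3.5]

# Same logical values, two different declaration orders.
layout_a = build_dense_union(values, [int, str, float])
layout_b = build_dense_union(values, [str, float, int])

print(layout_a[0])  # type ids under the first declaration order
print(layout_b[0])  # type ids under the second declaration order
print(read_dense_union(*layout_a) == read_dense_union(*layout_b))
```

Each layout round-trips when read with its own declaration order, but the type
ids differ between the two, so a reader that does not know the declared order
(that is, the schema) cannot interpret the buffers correctly.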
