On Sun, Apr 10, 2016 at 9:47 AM, Zheng, Kai <kai.zh...@intel.com> wrote:
> Thanks Micah for the answers. It looks like a good plan: the IPC details are
> to be documented separately, and the schema is to be added to the spec, I
> guess.
>
> Regarding use cases between Java and C++, I think there may be an
> important one: how, in a framework, the Java layer accesses data or objects
> in the native (C++) layer. This was discussed quite some time ago, in the
> context of how JNI may be better than pure Java. It sounds like native/C++
> would leave much more room for SIMD, so the Java layer would just instrument
> the native/C++ layer to load/access data in the format somewhere, perform the
> desired computation, and then retrieve the computed results to respond to end
> users. I'm particularly interested in this case and wonder if there are any
> plans or thoughts about it. Java to C++ may be very similar to Python to C++,
> though I haven't looked into the Python part yet.
>
> I had a quick look at the Java part. It looks like there has been little
> attempt to sync or unify Java and C++ at the API level, though they share the
> same binary representation. This would complicate implementing the use case
> I mentioned above, and cause confusion for developers when switching from one
> (like C++) to the other (say Java). At least the high-level constructs
> should have the same names, preferably conforming to the spec. The Java parts
> seem to inherit their style from Apache Drill, I guess.
>
In general I don't think it's worth spending significant energy trying to
conform the user APIs between the Java / C++ (or any other future)
implementations. When possible, it's nice to do. Hopefully we'll look back in
a few years and view the C++ "clean room" implementation from the spec as a
useful exercise.

cheers
Wes

> Just some quick thoughts by the way, might be better to discuss separately.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Micah Kornfield [mailto:emkornfi...@gmail.com]
> Sent: Sunday, April 10, 2016 12:23 PM
> To: dev@arrow.apache.org
> Subject: Re: More questions about layout spec
>
> Hi Kai,
> Based on a previous thread on the mailing list
> (http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c72a2bcfd-54d7-4376-8199-04a5535d8...@gmail.com%3E)
> I believe the conclusion was that there should be an optional reference
> implementation for IPC, so consumers of the memory format aren't necessarily
> required to tie themselves to a particular technology or component that they
> don't want to consume (e.g. some users of Arrow might just want the C++
> objects and our not-yet-existent algorithms component). I think creating new
> document(s) to detail IPC concerns makes sense instead of updating the
> existing document.
>
> As you noted, Wes added the beginnings of an implementation for C++ that uses
> memory-mapped files and
> https://github.com/apache/arrow/blob/master/format/Message.fbs to
> describe the schema. I think the decision might have been made to
> hold off on writing a concrete spec until we could verify that a simple use
> case worked between Java and C++. One of the committers might have a better
> view on this. It probably pays to start writing up a document based on the
> current implementation anyway, so people have broader visibility into future
> plans (and can provide feedback without reading the C++ code).
>
> Another mode of transport that deserves a reference
> specification/implementation is how tables can be transferred via a
> socket (there is already a JIRA open to create one via Unix domain sockets,
> but this should likely be generalized to just sockets).
>
> I think we should open JIRAs to track writing reference specs for both shared
> memory and socket-based transport.
>
> Thanks,
> -Micah
>
> On Sat, Apr 9, 2016 at 7:55 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>> Hi,
>>
>> Looking at the layout spec, I have some more questions in addition to the
>> ones discussed previously.
>>
>> About struct and union types:
>>
>> 1) The order of the fields (the order in which they are declared) seems to
>> be important, as it affects how the data are laid out. For example, in a
>> union type, it determines how to organize and interpret the field types,
>> offsets and data arrays. The same applies to the struct type.
>>
>> 2) Nothing is said about how their schema is represented and
>> how/where the schema is attached. Should the layout also contain the schema
>> info? In the C++ code, Table is implemented as columns with self-contained
>> schema info.
>>
>> About schema:
>> Nothing is mentioned about schema in the spec; I'm not sure if it should be
>> a natural part of it. Without self-contained schema info, it won't be
>> possible to interpret and process the layout data (like List, Struct, Table,
>> Union, etc.) across machines and languages.
>>
>> About non-goals:
>> The following are listed as non-goals for the document, but in fact they're
>> going to be implemented. Should we remove them?
>>
>> 1. To specify standardized metadata or a data layout for RPC or
>> transient file storage.
>>
>> 2. Any "table" structure composed of named arrays each having their
>> own type or any other structure that composes arrays.
>>
>> Regards,
>> Kai
>>