Just to add my two cents: The Arrow specification and Flatbuffers files define a _binary protocol_ for making data available at the contiguous record batch level either in-process or via some other address space (a memory mapped file, a socket payload / RPC message).

Chunked arrays and tables are semantic constructs and don't really have much to do with the binary protocol. They have turned out to be convenient programming constructs, so I don't necessarily think it's a bad idea for e.g. Go, Rust, or JavaScript to copy these ideas. There is no requirement to do this, though; these were just some ideas I had about how to make working with in-memory datasets consisting of multiple record batches a bit nicer. There may be some other interfaces or abstractions created in the future in one of the other languages that we could adopt later in C++.

BTW, what we do in C++ if we have an arrow::Table whose columns have different chunking layouts is split up the table into a sequence of regularized record batches (see [1]); these could then be put on the wire (e.g. using Flight / gRPC) or written to a shared memory segment using the IPC stream or file protocol.

- Wes

[1]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h#L302

On Sun, Jan 27, 2019 at 9:46 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hi Neville,
>
> On 27/01/2019 at 13:07, Neville Dipale wrote:
> > Hi Antoine,
> >
> > I've given your response some thought.
> >
> > I'm thinking more looking at the computational aspect of Arrow. I agree
> > that for representing and sharing data, RecordBatches achieve the purpose.
> >
> > I came across ChunkedArray, Column and Table while I was trying to create a
> > dataframe library in Rust. The other languages already benefit from these 3
> > already implemented, but for Rust I've had to try create them myself.
> > This is what led me to asking the question, because the various languages
> > that I've seen so far, seem to follow the same kind of standard re. both
> > the structure and methods to create/interact with chunked arrays, columns,
> > and tables.
>
> What happened is probably that most non-C++ implementations took
> inspiration from the C++ implementation ;-)
>
> Arrow does not aim at standardizing APIs, just data structures.
> Personally (i.e. I do not claim to represent the views of the project
> here), it seems to me that standardizing APIs leads to suboptimal and
> cumbersome "largest common denominator" interfaces such as the DOM APIs
> for XML.
>
> Regards
>
> Antoine.
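
To make the regularization step mentioned above a bit more concrete, here is a rough pure-Python sketch of the idea. It is not the Arrow C++ code: the function name `regularize`, the use of plain lists for chunks, and the dict-of-columns layout are all assumptions made for illustration. It treats every chunk boundary in any column as a batch boundary, so each resulting batch has the same row count in every column.

```python
from itertools import accumulate


def regularize(columns):
    """Split chunked columns into batches with uniform row counts.

    `columns` maps a column name to a list of chunks (plain lists here,
    standing in for contiguous arrays). This is a hypothetical helper,
    not part of any Arrow library.
    """
    totals = {sum(len(chunk) for chunk in chunks) for chunks in columns.values()}
    if len(totals) != 1:
        raise ValueError("all columns must have the same total length")
    total = totals.pop()

    # Every chunk boundary in any column becomes a batch boundary.
    boundaries = {0, total}
    for chunks in columns.values():
        boundaries.update(accumulate(len(chunk) for chunk in chunks))
    cuts = sorted(boundaries)

    # Flattening copies the data, which keeps the sketch short; a real
    # implementation would presumably slice the existing chunks instead.
    flat = {
        name: [value for chunk in chunks for value in chunk]
        for name, chunks in columns.items()
    }
    return [
        {name: values[start:stop] for name, values in flat.items()}
        for start, stop in zip(cuts, cuts[1:])
    ]


table = {
    "a": [[1, 2, 3], [4, 5]],            # chunk lengths 3, 2
    "b": [["v", "w"], ["x", "y", "z"]],  # chunk lengths 2, 3
}
batches = regularize(table)
# batches has three entries, covering rows 0-1, row 2, and rows 3-4
```

Each element of `batches` could then play the role of one record batch to be put on the wire or written out, since within a batch every column is a single contiguous run of the same length.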