Just to add my two cents:

The Arrow specification and Flatbuffers files define a _binary
protocol_ for making data available at the contiguous record batch
level either in-process or via some other address space (a memory
mapped file, a socket payload / RPC message).

Chunked arrays and tables are semantic constructs and don't really
have much to do with the binary protocol. They have turned out to be
convenient programming constructs, so I don't necessarily think it's a
bad idea for e.g. Go, Rust, or JavaScript to copy these ideas. There
is no requirement to do this, though; these were just some ideas I had
about how to make working with in-memory datasets consisting of
multiple record batches a bit nicer. There may be some other
interfaces or abstractions created in the future in one of the other
languages that we could adopt later in C++.

BTW, what we do in C++ if we have an arrow::Table whose columns have
different chunking layouts is split up the table into a sequence of
regularized record batches (see [1]); these could then be put on the
wire (e.g. using Flight / gRPC) or written to a shared memory segment
using the IPC stream or file protocol.
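
To make the regularization step concrete, here is a rough sketch in
plain Python (not the Arrow C++ code, and not using any Arrow library).
It assumes a "table" is just a dict mapping column names to lists of
chunks; the name `regularize` and that layout are made up for
illustration. Each output batch ends wherever any column's current
chunk ends, so within a batch every column is a single contiguous
slice.

```python
def regularize(columns):
    """Split chunked columns into batches with one contiguous slice per column.

    `columns` maps a column name to a list of chunks (plain lists here).
    This is a hypothetical stand-in for a table, not an Arrow API.
    """
    names = list(columns)
    totals = {sum(len(chunk) for chunk in columns[name]) for name in names}
    if len(totals) > 1:
        raise ValueError("all columns must have the same total length")

    # Ignore empty chunks so every remaining chunk contributes rows.
    chunks = {name: [c for c in columns[name] if len(c) > 0] for name in names}
    # Per column: [index of current chunk, offset inside that chunk].
    pos = {name: [0, 0] for name in names}
    remaining = totals.pop() if totals else 0

    batches = []
    while remaining > 0:
        # The batch can only be as long as the shortest unread chunk tail.
        size = min(len(chunks[n][pos[n][0]]) - pos[n][1] for n in names)
        batch = {}
        for n in names:
            idx, off = pos[n]
            chunk = chunks[n][idx]
            batch[n] = chunk[off:off + size]
            off += size
            if off == len(chunk):
                # Current chunk is used up; move to the start of the next one.
                idx, off = idx + 1, 0
            pos[n] = [idx, off]
        batches.append(batch)
        remaining -= size
    return batches


table = {
    "a": [[1, 2, 3], [4, 5]],
    "b": [["v", "w"], ["x", "y", "z"]],
}
batches = regularize(table)
```

With the two layouts above (3+2 rows and 2+3 rows), the boundaries fall
at rows 2 and 3, so `batches` holds three batches of 2, 1, and 2 rows.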

- Wes

[1]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h#L302

On Sun, Jan 27, 2019 at 9:46 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hi Neville,
>
> Le 27/01/2019 à 13:07, Neville Dipale a écrit :
> > Hi Antoine,
> >
> > I've given your response some thought.
> >
> > I'm looking more at the computational aspect of Arrow. I agree
> > that for representing and sharing data, RecordBatches achieve the purpose.
> >
> > I came across ChunkedArray, Column and Table while I was trying to create a
> > dataframe library in Rust. The other languages already benefit from having
> > these 3 implemented, but for Rust I've had to try to create them myself.
> > This is what led me to asking the question, because the various languages
> > that I've seen so far, seem to follow the same kind of standard re. both
> > the structure and methods to create/interact with chunked arrays, columns,
> > and tables.
>
> What happened is probably that most non-C++ implementations took
> inspiration from the C++ implementation ;-)
>
> Arrow does not aim at standardizing APIs, just data structures.
> Personally (i.e. I do not claim to represent the views of the project
> here), it seems to me that standardizing APIs leads to suboptimal and
> cumbersome "largest common denominator" interfaces such as the DOM APIs
> for XML.
>
> Regards
>
> Antoine.
