I'm not clear on why we need to introduce something beyond what flatbuffers already provides. Can someone explain that to me? I'm not really a fan of introducing a second representation of the same data (as I understand it).

On Thu, Sep 19, 2019 at 1:15 PM Wes McKinney <wesmck...@gmail.com> wrote:

> This is helpful, I will leave some comments on the proposal when I
> can, sometime in the next week.
>
> I agree that it would likely be opening a can of worms to create a
> semantic mapping between a generalized type grammar and Arrow's
> specific logical types defined in Schema.fbs. If we go down this
> route, we should probably utilize the simplest possible grammar that
> is capable of encoding the Type Flatbuffers union values.
>
> On Thu, Sep 19, 2019 at 2:49 PM Antoine Pitrou <solip...@pitrou.net>
> wrote:
> >
> > I've posted a draft specification PR here, this should help orient the
> > discussion a bit:
> > https://github.com/apache/arrow/pull/5442
> >
> > Regards
> >
> > Antoine.
> >
> > On Wed, 18 Sep 2019 19:52:38 +0200
> > Antoine Pitrou <anto...@python.org> wrote:
> > > Hello,
> > >
> > > One thing that was discussed in the sync call is the ability to easily
> > > pass arrays at runtime between Arrow implementations or Arrow-supporting
> > > libraries in the same process, without bearing the cost of linking to
> > > e.g. the C++ Arrow library.
> > >
> > > (for example: "Duckdb wants to provide an option to return Arrow data of
> > > result sets, but they don't like having Arrow as a dependency")
> > >
> > > One possibility would be to define a C-level protocol similar in spirit
> > > to the Python buffer protocol, which some people may be familiar with (*).
> > >
> > > The basic idea is to define a simple C struct, which is ABI-stable and
> > > describes an Arrow array adequately. The struct can be stack-allocated.
> > > Its definition can also be copied in another project (or interfaced with
> > > using a C FFI layer, depending on the language).
> > >
> > > There is no formal proposal, this message is meant to stir the
> > > discussion.
> > >
> > > Issues to work out:
> > >
> > > * Memory lifetime issues: where Python simply associates the Py_buffer
> > > with a PyObject owner (a garbage-collected Python object), we need
> > > another means to control the lifetime of the pointed-to areas. One
> > > simple possibility is to include a destructor function pointer in the
> > > protocol struct.
> > >
> > > * Arrow type representation. We probably need some kind of "format"
> > > mini-language to represent Arrow types, so that a type can be described
> > > using a `const char*`. Ideally, primitive types at least should be
> > > trivially parsable. We may take inspiration from Python here (`struct`
> > > module format characters, PEP 3118 format additions).
> > >
> > > Example C struct definition (not a formal proposal!):
> > >
> > > struct ArrowBuffer {
> > >   void* data;
> > >   int64_t nbytes;
> > >   // Called by the consumer when it doesn't need the buffer anymore
> > >   void (*release)(struct ArrowBuffer*);
> > >   // Opaque user data (for e.g. the release callback)
> > >   void* user_data;
> > > };
> > >
> > > struct ArrowArray {
> > >   // Type description
> > >   const char* format;
> > >   // Data description
> > >   int64_t length;
> > >   int64_t null_count;
> > >   int64_t n_buffers;
> > >   // Note: these pointers are probably owned by the ArrowArray struct
> > >   // and will be released and free()ed by the release callback.
> > >   struct ArrowBuffer* buffers;
> > >   struct ArrowArray* dictionary;
> > >   // Called by the consumer when it doesn't need the array anymore
> > >   void (*release)(struct ArrowArray*);
> > >   // Opaque user data (for e.g. the release callback)
> > >   void* user_data;
> > > };
> > >
> > > Thoughts?
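
To make the lifetime question concrete, here is a minimal sketch of how the destructor-pointer idea could work for the ArrowBuffer struct quoted above. Only the struct layout comes from the quoted message. The function names (`export_int32_range`, `consume_sum_int32`, `release_malloced_buffer`), the malloc-backed storage, and the convention of setting `release` to NULL once the buffer has been released are assumptions made for illustration, not part of any draft.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the ArrowBuffer struct in the quoted message. */
struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  /* Called by the consumer when it doesn't need the buffer anymore */
  void (*release)(struct ArrowBuffer*);
  /* Opaque user data (for e.g. the release callback) */
  void* user_data;
};

/* Hypothetical release callback for malloc()-backed buffers. */
static void release_malloced_buffer(struct ArrowBuffer* buf) {
  free(buf->data);
  buf->data = NULL;
  buf->nbytes = 0;
  /* Assumed convention: a NULL release pointer marks a released struct. */
  buf->release = NULL;
}

/* Hypothetical producer: fills a caller-provided (possibly stack-allocated)
   struct with the int32 values 0..n-1. Returns 0 on success, -1 on error. */
static int export_int32_range(int32_t n, struct ArrowBuffer* out) {
  if (n <= 0) return -1;
  int32_t* values = (int32_t*)malloc((size_t)n * sizeof(int32_t));
  if (values == NULL) return -1;
  for (int32_t i = 0; i < n; ++i) values[i] = i;
  out->data = values;
  out->nbytes = (int64_t)n * (int64_t)sizeof(int32_t);
  out->release = release_malloced_buffer;
  out->user_data = NULL;
  return 0;
}

/* Hypothetical consumer: reads the values, then hands the memory back
   through the callback without knowing how it was allocated. */
static int64_t consume_sum_int32(struct ArrowBuffer* buf) {
  assert(buf->release != NULL); /* must not already be released */
  const int32_t* values = (const int32_t*)buf->data;
  int64_t count = buf->nbytes / (int64_t)sizeof(int32_t);
  int64_t sum = 0;
  for (int64_t i = 0; i < count; ++i) sum += values[i];
  buf->release(buf);
  return sum;
}
```

The consumer never calls free() itself, so the producer stays free to back the buffer with any allocator, or with memory owned by a garbage-collected object referenced through `user_data`.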
> > >
> > > (*) For the record, the reference for the Python buffer protocol:
> > > https://docs.python.org/3/c-api/buffer.html#buffer-structure
> > > and its C struct definition:
> > > https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
> > >
> > > Regards
> > >
> > > Antoine.
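
On the "format" mini-language: as a rough sketch of what "trivially parsable" could mean for primitive types, the code below assumes single-character codes borrowed from the Python `struct` module in its standard-size mode. The character set, the function names (`primitive_bit_width`, `primitive_data_nbytes`) and the -1 error convention are hypothetical and are not taken from Schema.fbs or from any draft specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping from a one-character format string to a bit width,
   using Python `struct` format characters with their standard sizes.
   Returns -1 for anything this sketch does not recognize. */
static int primitive_bit_width(const char* format) {
  /* Accept exactly one character followed by the terminator. */
  if (format == NULL || format[0] == '\0' || format[1] != '\0') return -1;
  switch (format[0]) {
    case 'b': case 'B': return 8;   /* int8 / uint8 */
    case 'h': case 'H': return 16;  /* int16 / uint16 */
    case 'e':           return 16;  /* half-precision float */
    case 'i': case 'I': return 32;  /* int32 / uint32 */
    case 'f':           return 32;  /* single-precision float */
    case 'q': case 'Q': return 64;  /* int64 / uint64 */
    case 'd':           return 64;  /* double-precision float */
    default:            return -1;
  }
}

/* Byte size of a data buffer holding `length` values of a primitive
   format, or -1 if the format is not recognized. */
static int64_t primitive_data_nbytes(const char* format, int64_t length) {
  assert(length >= 0);
  int bits = primitive_bit_width(format);
  if (bits < 0) return -1;
  return length * (int64_t)(bits / 8);
}
```

A consumer that only handles primitive arrays could dispatch on `format[0]` alone, while nested or parametrized types would need whatever richer grammar the discussion settles on.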