I'm not clear on why we need to introduce something beyond what flatbuffers
already provides. Can someone explain that to me? I'm not really a fan of
introducing a second representation of the same data (as I understand it).

On Thu, Sep 19, 2019 at 1:15 PM Wes McKinney <wesmck...@gmail.com> wrote:

> This is helpful, I will leave some comments on the proposal when I
> can, sometime in the next week.
>
> I agree that it would likely be opening a can of worms to create a
> semantic mapping between a generalized type grammar and Arrow's
> specific logical types defined in Schema.fbs. If we go down this
> route, we should probably utilize the simplest possible grammar that
> is capable of encoding the Type Flatbuffers union values.
>
> On Thu, Sep 19, 2019 at 2:49 PM Antoine Pitrou <solip...@pitrou.net>
> wrote:
> >
> >
> > I've posted a draft specification PR here, this should help orient the
> > discussion a bit:
> > https://github.com/apache/arrow/pull/5442
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > On Wed, 18 Sep 2019 19:52:38 +0200
> > Antoine Pitrou <anto...@python.org> wrote:
> > > Hello,
> > >
> > > One thing that was discussed in the sync call is the ability to easily
> > > pass arrays at runtime between Arrow implementations or Arrow-supporting
> > > libraries in the same process, without bearing the cost of linking to
> > > e.g. the C++ Arrow library.
> > >
> > > (for example: "Duckdb wants to provide an option to return Arrow data of
> > > result sets, but they don't like having Arrow as a dependency")
> > >
> > > One possibility would be to define a C-level protocol similar in spirit
> > > to the Python buffer protocol, which some people may be familiar with (*).
> > >
> > > The basic idea is to define a simple C struct, which is ABI-stable and
> > > describes an Arrow array adequately.  The struct can be stack-allocated.
> > > Its definition can also be copied into another project (or interfaced with
> > > using a C FFI layer, depending on the language).
> > >
> > > There is no formal proposal; this message is meant to stir the discussion.
> > >
> > > Issues to work out:
> > >
> > > * Memory lifetime issues: where Python simply associates the Py_buffer
> > > with a PyObject owner (a garbage-collected Python object), we need
> > > another means to control the lifetime of the pointed-to memory.  One simple
> > > possibility is to include a destructor function pointer in the protocol
> > > struct.
> > >
> > > * Arrow type representation.  We probably need some kind of "format"
> > > mini-language to represent Arrow types, so that a type can be described
> > > using a `const char*`.  Ideally, primitive types at least should be
> > > trivially parsable.  We may take inspiration from Python here (`struct`
> > > module format characters, PEP 3118 format additions).
> > >
> > > Example C struct definition (not a formal proposal!):
> > >
> > > struct ArrowBuffer {
> > >   void* data;
> > >   int64_t nbytes;
> > >   // Called by the consumer when it doesn't need the buffer anymore
> > >   void (*release)(struct ArrowBuffer*);
> > >   // Opaque user data (for e.g. the release callback)
> > >   void* user_data;
> > > };
> > >
> > > struct ArrowArray {
> > >   // Type description
> > >   const char* format;
> > >   // Data description
> > >   int64_t length;
> > >   int64_t null_count;
> > >   int64_t n_buffers;
> > >   // Note: these pointers are probably owned by the ArrowArray struct
> > >   // and will be released and free()ed by the release callback.
> > >   struct ArrowBuffer* buffers;
> > >   struct ArrowArray* dictionary;
> > >   // Called by the consumer when it doesn't need the array anymore
> > >   void (*release)(struct ArrowArray*);
> > >   // Opaque user data (for e.g. the release callback)
> > >   void* user_data;
> > > };
> > >
> > > Thoughts?
> > >
> > > (*) For the record, the reference for the Python buffer protocol:
> > > https://docs.python.org/3/c-api/buffer.html#buffer-structure
> > > and its C struct definition:
> > > https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
> >
> >
>
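To make the lifetime question in the quoted message concrete, here is a
minimal sketch (not part of the proposal) of a producer and a consumer
sharing the draft ArrowBuffer struct. The helper names export_int32_buffer
and consume_and_sum are hypothetical, and so is the convention of setting
release to NULL once the buffer has been released.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as the draft ArrowBuffer struct quoted in this thread. */
struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  void (*release)(struct ArrowBuffer*);
  void* user_data;
};

/* Producer-side callback: frees the memory and clears the struct.
   Setting release to NULL afterwards is an assumed convention. */
static void release_malloced_buffer(struct ArrowBuffer* buf) {
  free(buf->data);
  buf->data = NULL;
  buf->nbytes = 0;
  buf->release = NULL;
}

/* Hypothetical producer: copies n int32 values into a caller-provided
   (possibly stack-allocated) struct. Returns 0 on success, -1 on error. */
static int export_int32_buffer(const int32_t* values, int64_t n,
                               struct ArrowBuffer* out) {
  if (values == NULL || n <= 0) return -1;
  size_t nbytes = (size_t)n * sizeof(int32_t);
  void* copy = malloc(nbytes);
  if (copy == NULL) return -1;
  memcpy(copy, values, nbytes);
  out->data = copy;
  out->nbytes = (int64_t)nbytes;
  out->release = release_malloced_buffer;
  out->user_data = NULL;  /* unused here; malloc/free need no extra state */
  return 0;
}

/* Hypothetical consumer: reads the data, then hands the memory back
   through the callback without knowing how it was allocated. */
static int64_t consume_and_sum(struct ArrowBuffer* buf) {
  const int32_t* v = (const int32_t*)buf->data;
  int64_t count = buf->nbytes / (int64_t)sizeof(int32_t);
  int64_t sum = 0;
  for (int64_t i = 0; i < count; i++) sum += v[i];
  if (buf->release != NULL) buf->release(buf);
  return sum;
}
```

The consumer only needs the struct definition, not the producer's library:
it never calls free() itself, so the producer can swap in a different
allocator (and carry its state in user_data) without changing the consumer.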

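On the "format" mini-language: the sketch below shows what "trivially
parsable" could mean for primitive types if each one is a single character.
The characters are borrowed from the standard sizes of Python's `struct`
module purely as an assumption; nothing in this thread fixes them for Arrow,
and primitive_format_width is a hypothetical helper.

```c
#include <stddef.h>

/* Returns the byte width of a one-character primitive format string,
   or 0 if the string is NULL, empty, longer than one character, or
   not recognized. Characters follow Python's struct module standard
   sizes (an assumption, not an Arrow specification). */
static size_t primitive_format_width(const char* format) {
  if (format == NULL || format[0] == '\0' || format[1] != '\0') return 0;
  switch (format[0]) {
    case 'b': case 'B':           return 1;  /* int8, uint8 */
    case 'h': case 'H': case 'e': return 2;  /* int16, uint16, float16 */
    case 'i': case 'I': case 'f': return 4;  /* int32, uint32, float32 */
    case 'q': case 'Q': case 'd': return 8;  /* int64, uint64, float64 */
    default:                      return 0;
  }
}
```

A consumer could use such a lookup to validate that nbytes is consistent
with length before touching the data; nested or parametrized types would
need a real grammar, which is the open question discussed above.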