This is helpful. I will leave some comments on the proposal when I
can, sometime in the next week.

I agree that it would likely be opening a can of worms to create a
semantic mapping between a generalized type grammar and Arrow's
specific logical types defined in Schema.fbs. If we go down this
route, we should probably use the simplest possible grammar that is
capable of encoding the values of the Type union in the Flatbuffers schema.
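One possible reading of "simplest possible grammar" is a flat table with exactly one token per Type union member and parameter set, so that decoding is a table lookup. This is a hypothetical sketch: the token spellings (`int32`, `float64`, ...), `struct TypeDesc` and `parse_type` are made up for illustration. It covers only a subset of the union (Null, Bool, Int, FloatingPoint, Utf8, Binary), and it represents FloatingPoint's `Precision` enum (HALF/SINGLE/DOUBLE) as a bit width.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a subset of the members of the Type union in Schema.fbs. */
enum TypeTag {
  TYPE_NULL,
  TYPE_BOOL,
  TYPE_INT,            /* Schema.fbs: Int { bitWidth, is_signed } */
  TYPE_FLOATING_POINT, /* Schema.fbs: FloatingPoint { precision } */
  TYPE_UTF8,
  TYPE_BINARY
};

struct TypeDesc {
  enum TypeTag tag;
  int bit_width;  /* Int: 8/16/32/64; FloatingPoint: 16/32/64 for HALF/SINGLE/DOUBLE */
  bool is_signed; /* meaningful for Int only */
};

/* Hypothetical flat grammar: one token per (union member, parameters) pair. */
static const struct {
  const char* token;
  struct TypeDesc desc;
} kGrammar[] = {
  {"null", {TYPE_NULL, 0, false}},
  {"bool", {TYPE_BOOL, 0, false}},
  {"int8", {TYPE_INT, 8, true}},
  {"int16", {TYPE_INT, 16, true}},
  {"int32", {TYPE_INT, 32, true}},
  {"int64", {TYPE_INT, 64, true}},
  {"uint8", {TYPE_INT, 8, false}},
  {"uint16", {TYPE_INT, 16, false}},
  {"uint32", {TYPE_INT, 32, false}},
  {"uint64", {TYPE_INT, 64, false}},
  {"float16", {TYPE_FLOATING_POINT, 16, false}},
  {"float32", {TYPE_FLOATING_POINT, 32, false}},
  {"float64", {TYPE_FLOATING_POINT, 64, false}},
  {"utf8", {TYPE_UTF8, 0, false}},
  {"binary", {TYPE_BINARY, 0, false}},
};

/* Looks up `token`; on success writes the decoded type to *out and returns true. */
bool parse_type(const char* token, struct TypeDesc* out) {
  for (size_t i = 0; i < sizeof kGrammar / sizeof kGrammar[0]; ++i) {
    if (strcmp(kGrammar[i].token, token) == 0) {
      *out = kGrammar[i].desc;
      return true;
    }
  }
  return false;
}
```

Parameterized members with open-ended parameters (Decimal, Timestamp with a timezone, nested types) cannot fit in a fixed table, which is where a real grammar would need more structure.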

On Thu, Sep 19, 2019 at 2:49 PM Antoine Pitrou <solip...@pitrou.net> wrote:
>
>
> I've posted a draft specification PR here; it should help orient the
> discussion a bit:
> https://github.com/apache/arrow/pull/5442
>
> Regards
>
> Antoine.
>
>
>
> On Wed, 18 Sep 2019 19:52:38 +0200
> Antoine Pitrou <anto...@python.org> wrote:
> > Hello,
> >
> > One thing that was discussed in the sync call is the ability to easily
> > pass arrays at runtime between Arrow implementations or Arrow-supporting
> > libraries in the same process, without bearing the cost of linking to
> > e.g. the C++ Arrow library.
> >
> > (for example: "Duckdb wants to provide an option to return Arrow data of
> > result sets, but they don't like having Arrow as a dependency")
> >
> > One possibility would be to define a C-level protocol similar in spirit
> > to the Python buffer protocol, which some people may be familiar with (*).
> >
> > The basic idea is to define a simple C struct, which is ABI-stable and
> > adequately describes an Arrow array.  The struct can be stack-allocated.
> > Its definition can also be copied in another project (or interfaced with
> > using a C FFI layer, depending on the language).
> >
> > There is no formal proposal; this message is meant to stir the discussion.
> >
> > Issues to work out:
> >
> > * Memory lifetime issues: where Python simply associates the Py_buffer
> > with a PyObject owner (a garbage-collected Python object), we need
> > another means to control the lifetime of the memory areas pointed to.
> > One simple possibility is to include a destructor function pointer in
> > the protocol struct.
> >
> > * Arrow type representation.  We probably need some kind of "format"
> > mini-language to represent Arrow types, so that a type can be described
> > using a `const char*`.  Ideally, at least primitive types should be
> > trivially parsable.  We may take inspiration from Python here (`struct`
> > module format characters, PEP 3118 format additions).
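To illustrate how trivially parsable single-character primitive codes could be, here is a minimal sketch that borrows the standard-size format characters of Python's `struct` module (`b`/`B`, `h`/`H`, `i`/`I`, `q`/`Q`, `f`, `d`). The function names `format_bit_width` and `format_is_signed_int`, and the choice of codes, are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical helper: map a single-character primitive format code
 * (Python struct module, standard sizes) to its bit width.
 * Returns -1 for anything that is not exactly one known code. */
int format_bit_width(const char* format) {
  if (format == NULL || format[0] == '\0' || format[1] != '\0') {
    return -1; /* only single-character primitive codes are handled */
  }
  switch (format[0]) {
    case 'b': case 'B': return 8;  /* int8 / uint8 */
    case 'h': case 'H': return 16; /* int16 / uint16 */
    case 'i': case 'I': return 32; /* int32 / uint32 */
    case 'q': case 'Q': return 64; /* int64 / uint64 */
    case 'f': return 32;           /* float32 */
    case 'd': return 64;           /* float64 */
    default: return -1;
  }
}

/* Following the struct module convention, lowercase integer codes are signed. */
int format_is_signed_int(const char* format) {
  if (format_bit_width(format) < 0) {
    return 0;
  }
  switch (format[0]) {
    case 'b': case 'h': case 'i': case 'q': return 1;
    default: return 0;
  }
}
```

A consumer only needs a `switch` on one character for these types; richer types (nested, parameterized) would need the multi-character additions the PEP 3118 reference hints at.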
> >
> > Example C struct definition (not a formal proposal!):
> >
> > #include <stdint.h>
> >
> > struct ArrowBuffer {
> >   void* data;
> >   int64_t nbytes;
> >   // Called by the consumer when it doesn't need the buffer anymore
> >   void (*release)(struct ArrowBuffer*);
> >   // Opaque user data (for e.g. the release callback)
> >   void* user_data;
> > };
> >
> > struct ArrowArray {
> >   // Type description
> >   const char* format;
> >   // Data description
> >   int64_t length;
> >   int64_t null_count;
> >   int64_t n_buffers;
> >   // Note: these pointers are probably owned by the ArrowArray struct
> >   // and will be released and free()ed by the release callback.
> >   struct ArrowBuffer* buffers;
> >   struct ArrowArray* dictionary;
> >   // Called by the consumer when it doesn't need the array anymore
> >   void (*release)(struct ArrowArray*);
> >   // Opaque user data (for e.g. the release callback)
> >   void* user_data;
> > };
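To make the ownership rules concrete, here is a minimal producer/consumer sketch. It assumes that `buffers` points to an array of `struct ArrowBuffer` and `dictionary` to a `struct ArrowArray`, that the format string `"q"` denotes int64 (as in Python's `struct` module), and that a primitive array carries a validity buffer followed by a data buffer. `export_int64_array` and `sum_and_release` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  void (*release)(struct ArrowBuffer*);
  void* user_data;
};

struct ArrowArray {
  const char* format;
  int64_t length;
  int64_t null_count;
  int64_t n_buffers;
  struct ArrowBuffer* buffers;   /* owned by the array, freed by release */
  struct ArrowArray* dictionary; /* unused in this sketch */
  void (*release)(struct ArrowArray*);
  void* user_data;
};

/* Frees the memory owned by one buffer and marks it as released. */
static void release_buffer(struct ArrowBuffer* buf) {
  free(buf->data);
  buf->data = NULL;
  buf->nbytes = 0;
  buf->release = NULL;
}

/* Releases every child buffer, then the buffers array itself. */
static void release_array(struct ArrowArray* arr) {
  for (int64_t i = 0; i < arr->n_buffers; ++i) {
    if (arr->buffers[i].release != NULL) {
      arr->buffers[i].release(&arr->buffers[i]);
    }
  }
  free(arr->buffers);
  arr->buffers = NULL;
  arr->n_buffers = 0;
  arr->release = NULL; /* assumption: NULL release marks "already released" */
}

/* Hypothetical producer: copies `values` into memory it owns and fills a
 * caller-provided (possibly stack-allocated) struct. Returns 0 on success. */
int export_int64_array(const int64_t* values, int64_t length,
                       struct ArrowArray* out) {
  size_t nbytes = (size_t)length * sizeof(int64_t);
  struct ArrowBuffer* buffers = malloc(2 * sizeof(struct ArrowBuffer));
  void* data = malloc(nbytes > 0 ? nbytes : 1);
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  if (nbytes > 0) memcpy(data, values, nbytes);

  /* buffers[0]: validity bitmap, absent because there are no nulls */
  buffers[0].data = NULL;
  buffers[0].nbytes = 0;
  buffers[0].release = NULL;
  buffers[0].user_data = NULL;
  /* buffers[1]: the int64 values */
  buffers[1].data = data;
  buffers[1].nbytes = (int64_t)nbytes;
  buffers[1].release = release_buffer;
  buffers[1].user_data = NULL;

  out->format = "q"; /* assumption: struct-module code for int64 */
  out->length = length;
  out->null_count = 0;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->dictionary = NULL;
  out->release = release_array;
  out->user_data = NULL;
  return 0;
}

/* Consumer: reads the values, then calls release exactly once. */
int64_t sum_and_release(struct ArrowArray* arr) {
  const int64_t* values = (const int64_t*)arr->buffers[1].data;
  int64_t sum = 0;
  for (int64_t i = 0; i < arr->length; ++i) sum += values[i];
  arr->release(arr);
  return sum;
}
```

The consumer never needs to know how the producer allocated its memory: it only reads through the pointers and calls `release` once when done, which is what lets the struct live on the consumer's stack.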
> >
> > Thoughts?
> >
> > (*) For the record, the reference for the Python buffer protocol:
> > https://docs.python.org/3/c-api/buffer.html#buffer-structure
> > and its C struct definition:
> > https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
> >
> > Regards
> >
> > Antoine.
> >
