No, the plan for this proposal is to avoid providing a C API.  Each
Arrow implementation could produce and consume the C data protocol, for
example the C++ Array class could add these methods:

class Array {
  // ...

 public:
  // Export array to the C data protocol
  void Share(ArrowArray* out);
  // Import a C data protocol array
  static Status FromShared(ArrowArray* input,
                           std::shared_ptr<Array>* out);
};
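As a rough illustration of what "producing" the protocol involves on the lifetime side, here is a minimal C sketch. The producer hands out buffers whose `release` callback drops a reference on a producer-owned block through `user_data`, much as a C++ exporter might keep a `std::shared_ptr` alive. It reuses the `ArrowBuffer` layout from the draft quoted below; the `Owner` type and the `owner_new`, `owner_unref` and `export_buffer` helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Buffer layout from the draft proposal quoted below. */
struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  void (*release)(struct ArrowBuffer*);
  void* user_data;
};

/* Hypothetical producer-side owner of one memory block. */
struct Owner {
  void* block;
  int refcount;
};

static struct Owner* owner_new(size_t size) {
  struct Owner* owner = malloc(sizeof *owner);
  if (owner == NULL) return NULL;
  owner->block = malloc(size);
  if (owner->block == NULL) {
    free(owner);
    return NULL;
  }
  owner->refcount = 1; /* the producer's own reference */
  return owner;
}

/* Frees the block once the last reference is dropped. */
static void owner_unref(struct Owner* owner) {
  if (--owner->refcount == 0) {
    free(owner->block);
    free(owner);
  }
}

/* Release callback invoked by the consumer. */
static void release_shared(struct ArrowBuffer* buf) {
  owner_unref(buf->user_data);
  buf->data = NULL;
  buf->user_data = NULL;
  buf->release = NULL; /* mark as released */
}

/* Exports the slice [offset, offset + nbytes) of the owner's block. */
static void export_buffer(struct Owner* owner, int64_t offset, int64_t nbytes,
                          struct ArrowBuffer* out) {
  ++owner->refcount;
  *out = (struct ArrowBuffer){ (char*)owner->block + offset, nbytes,
                               release_shared, owner };
}
```

The producer may drop its own reference right after exporting; the memory then lives exactly as long as the last consumer-held buffer.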

Also, I don't see how a C API exposed by the C++ library would solve
your problem.  You would still have to bundle the .so, deal with
symbol conflicts if several libraries load libarrow.so, etc.

Regards

Antoine.


On 19/09/2019 at 18:21, Zhuo Peng wrote:
> Hi Antoine,
> 
> I'm also interested in a stable ABI (previously I posted on this mailing
> list about the ABI issues I had [1]). Does having such an ABI-stable
> C-struct imply that there will be a set of C-APIs exposed by the Arrow
> (C++) library (which I think would solve all the inherent
> ABI issues caused by C++)?
> 
> [1]
> https://lists.apache.org/thread.html/27b6e2a30cf93c5f5f78de970c68c7d996f538d94ab61431fa342f41@%3Cdev.arrow.apache.org%3E
> 
> On Thu, Sep 19, 2019 at 1:07 AM Antoine Pitrou <anto...@python.org> wrote:
> 
>>
>> On 19/09/2019 at 09:39, Micah Kornfield wrote:
>>> I like the idea of a stable ABI that can be used for in-process
>>> communication.  For instance, there was a recent question on
>>> Stack Overflow on how to solve this [1].
>>>
>>> A couple of thoughts/questions:
>>> * Would ArrowArray also need a self reference for child arrays?
>>
>> Yes, I forgot that.  I also think we don't need a separate Buffer
>> struct, instead the Array struct should own all its buffers.
>>
>>> * Should transferring key-value metadata be in scope?
>>
>> Yes.  It could either be in the format string or a separate string.  The
>> upside of a separate string is that a consumer may ignore it trivially
>> if it doesn't need the information.
>>
>> Another open question is for nested types: does the format string
>> represent the entire type including children?  Or must child types be
>> read in the child arrays?  If we mimic ArrayData, then the format
>> string should represent the entire type; it will then be more complex to
>> parse.
>>
>> We should also make sure that extension types fit in the protocol.
>>
>>> * Should the API more closely align with the IPC spec (pass a schema
>>> separately and a list of buffers instead of individual arrays)?
>>
>> Then you have something that's not immediately usable (you have to do
>> some processing to reconstitute the individual arrays).  One goal here
>> is to minimize implementation costs for producers and consumers.  The
>> assumption is a data model similar to the C++ ArrayData model; do we
>> have implementations that use an entirely different model?  Perhaps I
>> should take a look :-)
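A minimal C sketch of the two changes discussed above: a self reference for child arrays, and an Array struct that owns its buffers directly instead of using a separate buffer struct. The fields `n_children` and `children`, the placeholder format strings "q" and "+s", the children-first release order and the `make_int64_leaf` helper are assumptions for illustration, not part of the draft.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical revision of the draft: the array owns its buffers and
   references child arrays through the same struct type. */
struct ArrowArray {
  const char* format;            /* placeholder format strings below */
  int64_t length;
  int64_t null_count;
  int64_t n_buffers;
  const void** buffers;          /* owned by the array */
  int64_t n_children;            /* hypothetical field */
  struct ArrowArray** children;  /* hypothetical field */
  void (*release)(struct ArrowArray*);
  void* user_data;
};

/* Counts release calls; for demonstration only. */
static int g_released = 0;

/* Release callback for arrays whose storage was all malloc()ed:
   release and free the children first, then this array's buffers. */
static void release_malloced(struct ArrowArray* array) {
  for (int64_t i = 0; i < array->n_children; ++i) {
    struct ArrowArray* child = array->children[i];
    if (child->release != NULL) child->release(child);
    free(child); /* child structs were heap-allocated by the producer */
  }
  free(array->children);
  for (int64_t i = 0; i < array->n_buffers; ++i) free((void*)array->buffers[i]);
  free(array->buffers);
  array->release = NULL; /* mark as released */
  ++g_released;
}

/* Builds a heap-allocated leaf with one zeroed int64 data buffer. */
static struct ArrowArray* make_int64_leaf(int64_t length) {
  struct ArrowArray* leaf = malloc(sizeof *leaf);
  const void** buffers = malloc(sizeof *buffers);
  if (leaf == NULL || buffers == NULL) {
    free(leaf);
    free(buffers);
    return NULL;
  }
  buffers[0] = calloc((size_t)length, sizeof(int64_t));
  *leaf = (struct ArrowArray){ .format = "q", .length = length,
                               .n_buffers = 1, .buffers = buffers,
                               .release = release_malloced };
  return leaf;
}
```

In the usage this is meant for, the parent struct can be stack-allocated, as the original draft allows, while the children are heap-allocated so that the parent's release callback can free them.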
>>
>> Note that the draft I posted only concerns arrays.  We may also want to
>> have a C struct for batches or tables.
>>
>> Regards
>>
>> Antoine.
>>
>>
>>>
>>> Thanks,
>>> Micah
>>>
>>> [1]
>>> https://stackoverflow.com/questions/57966032/how-does-apache-arrow-facilitate-no-overhead-for-cross-system-communication/57967220#57967220
>>>
>>> On Wed, Sep 18, 2019 at 10:52 AM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> One thing that was discussed in the sync call is the ability to easily
>>>> pass arrays at runtime between Arrow implementations or Arrow-supporting
>>>> libraries in the same process, without bearing the cost of linking to
>>>> e.g. the C++ Arrow library.
>>>>
>>>> (for example: "Duckdb wants to provide an option to return Arrow data of
>>>> result sets, but they don't like having Arrow as a dependency")
>>>>
>>>> One possibility would be to define a C-level protocol similar in spirit
>>>> to the Python buffer protocol, which some people may be familiar with (*).
>>>>
>>>> The basic idea is to define a simple C struct, which is ABI-stable and
>>>> describes an Arrow array adequately.  The struct can be stack-allocated.
>>>> Its definition can also be copied in another project (or interfaced with
>>>> using a C FFI layer, depending on the language).
>>>>
>>>> There is no formal proposal yet; this message is meant to stir the
>>>> discussion.
>>>>
>>>> Issues to work out:
>>>>
>>>> * Memory lifetime issues: where Python simply associates the Py_buffer
>>>> with a PyObject owner (a garbage-collected Python object), we need
>>>> another means to control the lifetime of the pointed-to memory.  One simple
>>>> possibility is to include a destructor function pointer in the protocol
>>>> struct.
>>>>
>>>> * Arrow type representation.  We probably need some kind of "format"
>>>> mini-language to represent Arrow types, so that a type can be described
>>>> using a `const char*`.  Ideally, primitive types, at least, should be
>>>> trivially parsable.  We may take inspiration from Python here (`struct`
>>>> module format characters, PEP 3118 format additions).
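To show what "trivially parsable" could mean for primitive types, a minimal C sketch that maps single-character format codes to bit widths. The codes are borrowed from Python's `struct` module purely as an assumption (the thread has not defined any format), using that module's standard sizes; `?` is mapped to 1 bit because Arrow booleans are bit-packed. The function names are hypothetical.

```c
#include <stddef.h>

/* Returns the bit width of a single-character primitive format code,
   or -1 if the string is not exactly one known code. */
static int primitive_bit_width(const char* format) {
  if (format == NULL || format[0] == '\0' || format[1] != '\0') return -1;
  switch (format[0]) {
    case '?': return 1;                      /* bit-packed boolean */
    case 'b': case 'B': return 8;            /* int8 / uint8 */
    case 'h': case 'H': case 'e': return 16; /* int16 / uint16 / half float */
    case 'i': case 'I': case 'f': return 32; /* int32 / uint32 / float */
    case 'q': case 'Q': case 'd': return 64; /* int64 / uint64 / double */
    default: return -1;
  }
}

/* In the struct-module convention, lowercase integer codes are signed. */
static int primitive_is_signed_integer(const char* format) {
  return primitive_bit_width(format) > 1 &&
         (format[0] == 'b' || format[0] == 'h' ||
          format[0] == 'i' || format[0] == 'q');
}
```

A consumer that only handles primitives can reject anything longer than one character up front, which is the kind of cheap check the "trivially parsable" goal suggests.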
>>>>
>>>> Example C struct definition (not a formal proposal!):
>>>>
>>>> #include <stdint.h>
>>>>
>>>> struct ArrowBuffer {
>>>>   void* data;
>>>>   int64_t nbytes;
>>>>   // Called by the consumer when it doesn't need the buffer anymore
>>>>   void (*release)(struct ArrowBuffer*);
>>>>   // Opaque user data (e.g. for the release callback)
>>>>   void* user_data;
>>>> };
>>>>
>>>> struct ArrowArray {
>>>>   // Type description
>>>>   const char* format;
>>>>   // Data description
>>>>   int64_t length;
>>>>   int64_t null_count;
>>>>   int64_t n_buffers;
>>>>   // Note: these pointers are probably owned by the ArrowArray struct
>>>>   // and will be released and free()ed by the release callback.
>>>>   struct ArrowBuffer* buffers;
>>>>   struct ArrowArray* dictionary;
>>>>   // Called by the consumer when it doesn't need the array anymore
>>>>   void (*release)(struct ArrowArray*);
>>>>   // Opaque user data (e.g. for the release callback)
>>>>   void* user_data;
>>>> };
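A minimal producer/consumer sketch against the draft struct, assuming `buffers` points to `struct ArrowBuffer` and that `dictionary` and `release` refer to `struct ArrowArray`. The buffer layout (slot 0 for a validity bitmap, left empty when there are no nulls; slot 1 for the values), the placeholder format "q" and the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  void (*release)(struct ArrowBuffer*);
  void* user_data;
};

struct ArrowArray {
  const char* format;
  int64_t length;
  int64_t null_count;
  int64_t n_buffers;
  struct ArrowBuffer* buffers;    /* owned by the array */
  struct ArrowArray* dictionary;  /* unused here */
  void (*release)(struct ArrowArray*);
  void* user_data;
};

static void release_malloced_buffer(struct ArrowBuffer* buf) {
  free(buf->data);
  buf->data = NULL;
  buf->release = NULL;
}

/* Releases each buffer, then frees the buffer array the struct owns. */
static void release_array(struct ArrowArray* array) {
  for (int64_t i = 0; i < array->n_buffers; ++i) {
    struct ArrowBuffer* buf = &array->buffers[i];
    if (buf->release != NULL) buf->release(buf);
  }
  free(array->buffers);
  array->buffers = NULL;
  array->release = NULL; /* mark as released */
}

/* Producer: copies values into a fresh int64 array with no nulls. */
static int export_int64(const int64_t* values, int64_t length,
                        struct ArrowArray* out) {
  struct ArrowBuffer* buffers = malloc(2 * sizeof *buffers);
  int64_t* data = malloc((size_t)(length > 0 ? length : 1) * sizeof *data);
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  for (int64_t i = 0; i < length; ++i) data[i] = values[i];
  buffers[0] = (struct ArrowBuffer){ NULL, 0, NULL, NULL }; /* no validity bitmap */
  buffers[1] = (struct ArrowBuffer){ data, length * (int64_t)sizeof *data,
                                     release_malloced_buffer, NULL };
  *out = (struct ArrowArray){ "q", length, 0, 2, buffers, NULL,
                              release_array, NULL };
  return 0;
}

/* Consumer: sums the values without knowing who produced them. */
static int64_t sum_int64(const struct ArrowArray* array) {
  const int64_t* data = array->buffers[1].data;
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) total += data[i];
  return total;
}
```

Setting `release` to NULL after releasing gives consumers a cheap way to detect an already-released struct; that convention is also an assumption of this sketch.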
>>>>
>>>> Thoughts?
>>>>
>>>> (*) For the record, the reference for the Python buffer protocol:
>>>> https://docs.python.org/3/c-api/buffer.html#buffer-structure
>>>> and its C struct definition:
>>>> https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>
>>
> 
