No, the plan for this proposal is to avoid providing a C API.  Each
Arrow implementation could produce and consume the C data protocol.  For
example, the C++ Array class could add these methods:

    class Array {
      // ...
     public:
      // Export array to the C data protocol
      void Share(ArrowArray* out);
      // Import a C data protocol array
      static Status FromShared(ArrowArray* input, std::shared_ptr<Array>* out);
    };

Also, I don't know why a C API exposed by the C++ library would solve
your problem.  You would still have a problem with bundling the .so,
symbol conflicts if several libraries load libarrow.so, etc.

Regards

Antoine.


On 19/09/2019 at 18:21, Zhuo Peng wrote:
> Hi Antoine,
>
> I'm also interested in a stable ABI (previously I posted on this mailing
> list about the ABI issues I had [1]).  Does having such an ABI-stable
> C-struct imply that there will be a set of C-APIs exposed by the Arrow
> (C++) library (which I think would lead to a solution to all the inherent
> ABI issues caused by C++)?
>
> [1]
> https://lists.apache.org/thread.html/27b6e2a30cf93c5f5f78de970c68c7d996f538d94ab61431fa342f41@%3Cdev.arrow.apache.org%3E
>
> On Thu, Sep 19, 2019 at 1:07 AM Antoine Pitrou <anto...@python.org> wrote:
>
>>
>> On 19/09/2019 at 09:39, Micah Kornfield wrote:
>>> I like the idea of a stable ABI for in-processing that can be used for
>>> in-process communication.  For instance, there was a recent question on
>>> stack-overflow on how to solve this [1].
>>>
>>> A couple of thoughts/questions:
>>> * Would ArrowArray also need a self reference for children arrays?
>>
>> Yes, I forgot that.  I also think we don't need a separate Buffer
>> struct, instead the Array struct should own all its buffers.
>>
>>> * Should transferring key-value metadata be in scope?
>>
>> Yes.  It could either be in the format string or a separate string.  The
>> upside of a separate string is that a consumer may ignore it trivially
>> if it doesn't need the information.
>>
>> Another open question is for nested types: does the format string
>> represent the entire type including children?  Or must child types be
>> read in the child arrays?
>> If we mimic ArrayData, then the format
>> string should represent the entire type; it will then be more complex to
>> parse.
>>
>> We should also make sure that extension types fit in the protocol.
>>
>>> * Should the API more closely align with the IPC spec (pass a schema
>>> separately and list of buffers instead of individual arrays)?
>>
>> Then you have something that's not immediately usable (you have to do
>> some processing to reconstitute the individual arrays).  One goal here is
>> to minimize implementation costs for producers and consumers.  The
>> assumption is a data model similar to the C++ ArrayData model; do we
>> have implementations that use an entirely different model?  Perhaps I
>> should take a look :-)
>>
>> Note that the draft I posted only concerns arrays.  We may also want to
>> have a C struct for batches or tables.
>>
>> Regards
>>
>> Antoine.
>>
>>
>>>
>>> Thanks,
>>> Micah
>>>
>>> [1]
>>> https://stackoverflow.com/questions/57966032/how-does-apache-arrow-facilitate-no-overhead-for-cross-system-communication/57967220#57967220
>>>
>>> On Wed, Sep 18, 2019 at 10:52 AM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> One thing that was discussed in the sync call is the ability to easily
>>>> pass arrays at runtime between Arrow implementations or Arrow-supporting
>>>> libraries in the same process, without bearing the cost of linking to
>>>> e.g. the C++ Arrow library.
>>>>
>>>> (for example: "Duckdb wants to provide an option to return Arrow data of
>>>> result sets, but they don't like having Arrow as a dependency")
>>>>
>>>> One possibility would be to define a C-level protocol similar in spirit
>>>> to the Python buffer protocol, which some people may be familiar with (*).
>>>>
>>>> The basic idea is to define a simple C struct, which is ABI-stable and
>>>> describes an Arrow array adequately.  The struct can be stack-allocated.
>>>> Its definition can also be copied in another project (or interfaced with
>>>> using a C FFI layer, depending on the language).
>>>>
>>>> There is no formal proposal, this message is meant to stir the discussion.
>>>>
>>>> Issues to work out:
>>>>
>>>> * Memory lifetime issues: where Python simply associates the Py_buffer
>>>>   with a PyObject owner (a garbage-collected Python object), we need
>>>>   another means to control lifetime of pointed areas.  One simple
>>>>   possibility is to include a destructor function pointer in the protocol
>>>>   struct.
>>>>
>>>> * Arrow type representation.  We probably need some kind of "format"
>>>>   mini-language to represent Arrow types, so that a type can be described
>>>>   using a `const char*`.  Ideally, primitive types at least should be
>>>>   trivially parsable.  We may take inspiration from Python here (`struct`
>>>>   module format characters, PEP 3118 format additions).
>>>>
>>>> Example C struct definition (not a formal proposal!):
>>>>
>>>> struct ArrowBuffer {
>>>>   void* data;
>>>>   int64_t nbytes;
>>>>   // Called by the consumer when it doesn't need the buffer anymore
>>>>   void (*release)(struct ArrowBuffer*);
>>>>   // Opaque user data (for e.g. the release callback)
>>>>   void* user_data;
>>>> };
>>>>
>>>> struct ArrowArray {
>>>>   // Type description
>>>>   const char* format;
>>>>   // Data description
>>>>   int64_t length;
>>>>   int64_t null_count;
>>>>   int64_t n_buffers;
>>>>   // Note: these pointers are probably owned by the ArrowArray struct
>>>>   // and will be released and free()ed by the release callback.
>>>>   struct ArrowBuffer* buffers;
>>>>   struct ArrowArray* dictionary;
>>>>   // Called by the consumer when it doesn't need the array anymore
>>>>   void (*release)(struct ArrowArray*);
>>>>   // Opaque user data (for e.g. the release callback)
>>>>   void* user_data;
>>>> };
>>>>
>>>> Thoughts?
>>>>
>>>> (*) For the record, the reference for the Python buffer protocol:
>>>> https://docs.python.org/3/c-api/buffer.html#buffer-structure
>>>> and its C struct definition:
>>>> https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
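
To make the lifetime discussion in this thread more concrete, here is a rough sketch of how a producer and a consumer might hand an array across using the draft struct and its release callback.  It is only an illustration, not part of any proposal.  The struct layout follows the draft quoted above; everything else is an assumption made for this example: the helper names (export_int32, release_int32_array, sum_and_release, demo_roundtrip), the format string "i" for int32 (borrowed from the Python `struct` module), and the two-buffer layout (validity bitmap, then data).  Following the remark that the array should own all its buffers, the per-buffer release pointers are left unset and the array-level callback frees everything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Draft layout from this thread (not a formal proposal). */
struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  void (*release)(struct ArrowBuffer*); /* unused here: the array owns its buffers */
  void* user_data;
};

struct ArrowArray {
  const char* format;
  int64_t length;
  int64_t null_count;
  int64_t n_buffers;
  struct ArrowBuffer* buffers;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* user_data;
};

/* Hypothetical producer-side callback: frees everything the producer
   allocated, then clears `release` to mark the struct as released. */
static void release_int32_array(struct ArrowArray* array) {
  for (int64_t i = 0; i < array->n_buffers; ++i) {
    free(array->buffers[i].data); /* free(NULL) is a no-op */
  }
  free(array->buffers);
  array->buffers = NULL;
  array->n_buffers = 0;
  array->release = NULL;
}

/* Hypothetical producer: copies `length` int32 values into a new array.
   Assumed layout: buffers[0] = validity bitmap (absent, no nulls),
   buffers[1] = values.  Returns 0 on success, -1 on allocation failure. */
static int export_int32(const int32_t* values, int64_t length,
                        struct ArrowArray* out) {
  size_t nbytes = (size_t)length * sizeof(int32_t);
  struct ArrowBuffer* buffers = calloc(2, sizeof(*buffers));
  void* data = malloc(nbytes > 0 ? nbytes : 1);
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  if (nbytes > 0) {
    memcpy(data, values, nbytes);
  }
  buffers[1].data = data;
  buffers[1].nbytes = (int64_t)nbytes;

  out->format = "i"; /* assumed format code for int32 */
  out->length = length;
  out->null_count = 0;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->dictionary = NULL;
  out->release = release_int32_array;
  out->user_data = NULL;
  return 0;
}

/* Hypothetical consumer: reads the values, then tells the producer it is
   done by calling the release callback. */
static int64_t sum_and_release(struct ArrowArray* array) {
  assert(array->release != NULL); /* must not already be released */
  const int32_t* values = (const int32_t*)array->buffers[1].data;
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    total += values[i];
  }
  array->release(array);
  return total;
}

/* Round trip: the ArrowArray itself is stack-allocated by the caller. */
int64_t demo_roundtrip(void) {
  const int32_t values[] = {1, 2, 3};
  struct ArrowArray array;
  if (export_int32(values, 3, &array) != 0) {
    return -1;
  }
  return sum_and_release(&array);
}
```

Calling demo_roundtrip() returns 6.  Note that neither side needs to link against the other's library: the consumer only needs the struct definition, and all deallocation goes through the function pointer supplied by the producer.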