I opened https://issues.apache.org/jira/browse/ARROW-9761 about adding
a preliminary C++ (and Python) implementation to help stir the pot. My
understanding is that DuckDB is working on using the C interface right
now [1] and the absence of an iterator interface makes such
integration require more work than would be ideal.

[1]: https://github.com/cwida/duckdb/issues/151#issuecomment-674120291

On Fri, Aug 14, 2020 at 6:57 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> I think this unlocks a bunch of use cases. I think people are generally
> using Arrow in simpler, non-streaming ways right now, hence the quiet.
> Producing an iterator pattern is logical as you move to streams of smaller
> chunks (common in distributed and multi-tenant systems).
>
> On Mon, Aug 10, 2020 at 11:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > I'm still in need of it. I'd be interested in developing a solution
> > that can be used in some database APIs, e.g. using it for the result
> > interface for an embedded SQL database like SQLite or DuckDB would be
> > an interesting motivating use case.
> >
> > One approach would be to create something unofficial and used only in
> > the C++ library's implementation of the C API so that it can make
> > breaking changes for a time and then propose to formalize it in the
> > ABI later.
> >
> > On Mon, Aug 10, 2020 at 9:22 AM Antoine Pitrou <solip...@pitrou.net>
> > wrote:
> > >
> > >
> > > From the absence of response, it would seem there isn't much interest
> > > in this.  Please speak up if you think this would be useful to you.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On Tue, 7 Jul 2020 07:49:17 -0500
> > > Wes McKinney <wesmck...@gmail.com> wrote:
> > > > Any opinions about this? It seems the next steps would be a concrete
> > > > API proposal and perhaps a reference implementation thereof.
> > > >
> > > > On Sun, Jun 28, 2020 at 11:26 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > >
> > > > > In ARROW-8301 [1] and elsewhere we've been discussing how to
> > > > > communicate what amounts to a sequence of arrays or a sequence of
> > > > > RecordBatch objects using the C data interface.
> > > > >
> > > > > Example use cases:
> > > > >
> > > > > * Returning a sequence of record / row batches from a database driver
> > > > > * Sending a C++ arrow::ChunkedArray or arrow::Table to a consumer
> > > > > using only the C interface
> > > > >
> > > > > Applications could define their own custom iterator interfaces to
> > > > > communicate what amounts to a sequence of the ArrowArray C interface
> > > > > objects, but it is likely a common enough use case to warrant an
> > > > > off-the-shelf solution that we can support in our reference
> > > > > libraries (e.g. Arrow C++, pyarrow, Arrow R).
> > > > >
> > > > > I suggested a C structure as follows:
> > > > >
> > > > > struct ArrowArrayStream {
> > > > >   void (*get_schema)(struct ArrowSchema*);
> > > > >   // Non-zero return value indicates an error?
> > > > >   int (*get_next)(struct ArrowArray*);
> > > > >   void (*get_error)(... ERROR HANDLING TODO );
> > > > >   void (*release)(struct ArrowArrayStream*);
> > > > >   void* private_data;
> > > > > };
> > > > >
> > > > > The producer would populate this object with pointers to its
> > > > > implementations of these functions.
> > > > >
> > > > > Thoughts about this?
> > > > >
> > > > > Thanks,
> > > > > Wes
> > > > >
> > > > > [1]: https://issues.apache.org/jira/browse/ARROW-8301
> > > >
> > >
> > >
> > >
> >
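
The `ArrowArrayStream` struct proposed in the June 28 message leaves two points open: none of its callbacks receives the stream itself (so they cannot reach `private_data`), and error reporting is still a TODO. The following is a minimal C sketch under stated assumptions, not a definitive implementation. Each callback takes the stream pointer first. A zero return means success and a non-zero errno-style code means failure. `get_last_error` (standing in for the `get_error` placeholder) returns an optional message. End of stream is signaled by handing back an `ArrowArray` whose `release` is NULL. These conventions closely resemble the C stream interface Arrow later adopted, but treat them here as illustrative. `CounterState`, `make_counter_stream` and `sum_int32_stream` are hypothetical names used only for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arrow C data interface structs, field order as in the Arrow spec. */
struct ArrowSchema {
  const char *format, *name, *metadata;
  int64_t flags, n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length, null_count, offset, n_buffers, n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Assumed signatures: each callback receives the stream so it can reach
 * private_data; 0 means success, non-zero is an errno-style error code. */
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

/* Hypothetical producer: emits n_batches int32 arrays of 3 values each. */
struct CounterState { int next_batch, n_batches; };
static void counter_release_schema(struct ArrowSchema* s) { s->release = NULL; }
static void counter_release_array(struct ArrowArray* a) {
  free(a->private_data);   /* int32 data buffer */
  free((void*)a->buffers); /* table of buffer pointers */
  a->release = NULL;
}
static int counter_get_schema(struct ArrowArrayStream* st, struct ArrowSchema* out) {
  (void)st;
  memset(out, 0, sizeof(*out));
  out->format = "i"; /* int32 */
  out->release = counter_release_schema;
  return 0;
}
static int counter_get_next(struct ArrowArrayStream* st, struct ArrowArray* out) {
  struct CounterState* state = st->private_data;
  memset(out, 0, sizeof(*out)); /* out->release == NULL marks end of stream */
  if (state->next_batch >= state->n_batches) return 0;
  int32_t* data = malloc(3 * sizeof(int32_t));
  const void** buffers = malloc(2 * sizeof(void*));
  if (data == NULL || buffers == NULL) { free(data); free(buffers); return ENOMEM; }
  for (int i = 0; i < 3; i++) data[i] = state->next_batch * 3 + i;
  buffers[0] = NULL; /* no validity bitmap: every value is valid */
  buffers[1] = data;
  out->length = 3;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->private_data = data;
  out->release = counter_release_array;
  state->next_batch++;
  return 0;
}
static const char* counter_get_last_error(struct ArrowArrayStream* st) {
  (void)st;
  return NULL; /* no detailed message available */
}
static void counter_release(struct ArrowArrayStream* st) {
  free(st->private_data); /* the CounterState */
  st->release = NULL;
}
int make_counter_stream(int n_batches, struct ArrowArrayStream* out) {
  struct CounterState* state = malloc(sizeof(*state));
  if (state == NULL) return ENOMEM;
  *state = (struct CounterState){.next_batch = 0, .n_batches = n_batches};
  *out = (struct ArrowArrayStream){.get_schema = counter_get_schema,
      .get_next = counter_get_next, .get_last_error = counter_get_last_error,
      .release = counter_release, .private_data = state};
  return 0;
}

/* Consumer: sums an int32 stream; the caller still owns and releases `st`. */
int sum_int32_stream(struct ArrowArrayStream* st, int64_t* total) {
  struct ArrowSchema schema;
  int rc = st->get_schema(st, &schema);
  if (rc != 0) return rc;
  int is_int32 = strcmp(schema.format, "i") == 0;
  schema.release(&schema);
  if (!is_int32) return EINVAL;
  *total = 0;
  for (;;) {
    struct ArrowArray batch;
    if ((rc = st->get_next(st, &batch)) != 0) return rc; /* see get_last_error */
    if (batch.release == NULL) return 0;                 /* end of stream */
    const int32_t* values = batch.buffers[1];
    for (int64_t i = 0; i < batch.length; i++) *total += values[batch.offset + i];
    batch.release(&batch);
  }
}
```

A consumer would call `make_counter_stream(2, &s)`, then `sum_int32_stream(&s, &total)` (a total of 15 for the values 0 through 5), and finally `s.release(&s)`. Passing the stream pointer to every callback is what lets one set of static functions serve many independent streams, since all per-stream state lives behind `private_data`.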
