Re: [DISC] Improving Arrow's database support

David Li Wed, 01 Jun 2022 14:52:52 -0700

I've set up the new repo and enabled issues. I still need to get things 
building independently of Arrow, but now adbc.h is self-contained and the 
"driver manager" being prototyped can also be built and used independently of 
Arrow.


On Wed, Jun 1, 2022, at 13:55, David Li wrote:
> Wes: thanks! I'll move things over and update the list.
>
> Gavin: I mean more that ADBC won't support every little feature in 
> JDBC/ODBC, or won't necessarily make it easy to support certain things 
> (e.g. updating a single row in a ResultSet). But it's not that OLTP is 
> taboo, it's just not what is being optimized for. 
>
> For instance it would be nice to eventually have JDBC/ODBC drivers that 
> can wrap ADBC in much the same way that Dremio is working on a JDBC 
> driver for Flight SQL. But especially in the near term, ADBC just won't 
> have the feature set to make that possible.
>
> What sorts of use cases were you thinking about, though?
>
> On Wed, Jun 1, 2022, at 13:18, Gavin Ray wrote:
>> This sounds great, but I had one question:
>>
>> I read the initial ADBC proposal, and it mentioned that OLTP was not a
>> targeted use case.
>> If this work is intended to take on the role of a sort of standard ABI/SDK,
>> does that mean that building OLTP-oriented drivers/tooling with it is off
>> the table?
>>
>> On Wed, Jun 1, 2022 at 11:11 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> I went ahead and created
>>>
>>> https://github.com/apache/arrow-adbc
>>>
>>> I directed issue comments / PRs to issues@
>>>
>>> On Tue, May 31, 2022 at 8:49 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>> >
>>> > I think spinning up a new repository while this exploratory work
>>> > progresses is a fine idea — perhaps apache/arrow-dbc / arrow-adbc or
>>> > similar (the name can always be changed later). That would bubble up
>>> > discussions in a way that's easier for people to follow (watching your
>>> > fork isn't ideal!). If it makes sense to move code later, it can
>>> > always be moved.
>>> >
>>> >
>>> > On Tue, May 31, 2022 at 1:02 PM David Li <lidav...@apache.org> wrote:
>>> > >
>>> > > Some updates:
>>> > >
>>> > > The proposal is being updated based on feedback from contributors to
>>> > > DuckDB and DBI. We've been using GitHub issues on the fork to discuss the
>>> > > API design and how to implement data ingestion/bound parameters:
>>> > > https://github.com/lidavidm/arrow/issues
>>> > >
>>> > > If anyone has suggestions/ideas/questions, or would like to jump in as
>>> > > well, please feel free to chime in there too.
>>> > >
>>> > > I have also been wondering if we might want to plan to split off a new
>>> > > repo for this work? In particular, some components might be easiest to
>>> > > consume if they didn't also have a hard dependency on the Arrow C++
>>> > > libraries. And we could use the repo to manage contributed drivers (some of
>>> > > which may individually leverage the Arrow libraries). Of course,
>>> > > maintaining a parallel build system, setting up releases, etc. is also a
>>> > > lot of work.
>>> > >
>>> > > -David
>>> > >
>>> > > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote:
>>> > > > I don't have major new things to add on this topic except that I've
>>> > > > long had the aspiration of creating something like Python's DBAPI 2.0
>>> > > > [1] at the C or C++ level to enable a measure of API standardization
>>> > > > for Arrow-native read/write interfaces with database drivers. It seems
>>> > > > like a natural complement to the wire-protocol standardization work
>>> > > > with FlightSQL. I had previously brought in some code that I had
>>> > > > worked on related to interfacing with the HiveServer2 wire protocol
>>> > > > (for Hive and Impala, or other HS2-compatible query engines) with the
>>> > > > intention of prototyping but never was able to find the time.
>>> > > >
>>> > > > From an external messaging standpoint, one thing that will be
>>> > > > important is to assert that this is not intended to displace or
>>> > > > deprecate ODBC or JDBC drivers. In fact, I would hope that the
>>> > > > Arrow-native APIs could be added somehow to existing driver libraries
>>> > > > where it made sense, so that if they are used in an application that
>>> > > > uses Arrow, they can opt in to using the Arrow-based APIs for getting
>>> > > > result sets, or doing bulk inserts, etc.
>>> > > >
>>> > > > [1]: https://peps.python.org/pep-0249/
>>> > > >
>>> > > > On Tue, Apr 26, 2022 at 12:36 PM Antoine Pitrou <anto...@python.org> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Do we want something more flexible than dlopen() and runtime symbol
>>> > > >> lookup (a mechanism which constrains the way you can organize and
>>> > > >> distribute drivers)?
>>> > > >>
>>> > > >> For example, perhaps we could expose an API struct of function pointers
>>> > > >> that could be obtained through driver-specific means.
>>> > > >>
>>> > > >>
>>> > > >> On 26/04/2022 at 18:29, David Li wrote:
>>> > > >> > Hello,
>>> > > >> >
>>> > > >> > In light of recent efforts around Flight SQL, projects like pgeon
>>> > > >> > [1], and long-standing tickets/discussions about database support in Arrow
>>> > > >> > [2], it seems there's an opportunity to define standard database interfaces
>>> > > >> > for Arrow that could unify these efforts. So we've put together a proposal
>>> > > >> > for "ADBC", a common Arrow-based database client API:
>>> > > >> >
>>> > > >> >
>>> > > >> > https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/edit#heading=h.r6o6j2navi4c
>>> > > >> >
>>> > > >> > A common API and implementations could help combine/simplify
>>> > > >> > client-side projects like pgeon, or what DBI is considering [3], and help
>>> > > >> > them take advantage of developments like Flight SQL and existing columnar
>>> > > >> > APIs.
>>> > > >> >
>>> > > >> > We'd appreciate any feedback. (Comments should be open, please
>>> > > >> > let me know if not.)
>>> > > >> >
>>> > > >> > [1]: https://github.com/0x0L/pgeon
>>> > > >> > [2]: https://issues.apache.org/jira/browse/ARROW-11670
>>> > > >> > [3]: https://github.com/r-dbi/dbi3/issues/48
>>> > > >> >
>>> > > >> > Thanks,
>>> > > >> > David
>>>
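
Antoine's suggestion earlier in the thread, a struct of function pointers obtained through driver-specific means instead of dlopen() plus lookup of fixed symbol names, can be modelled roughly as below. This is only a conceptual sketch in Python, not the adbc.h API or the prototyped driver manager: DriverTable, DriverManager, toy_driver_init and the single "execute" entry point are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, ...]


@dataclass(frozen=True)
class DriverTable:
    """Stand-in for a C struct of function pointers (hypothetical fields)."""
    name: str
    execute: Callable[[str], List[Row]]


class DriverManager:
    """Dispatches through per-driver tables rather than global symbol names."""

    def __init__(self) -> None:
        self._drivers: Dict[str, DriverTable] = {}

    def register(self, init: Callable[[], DriverTable]) -> None:
        # `init` is the one driver-specific entry point. How the caller got
        # hold of it (static linking, dlopen, a plugin registry) is left to
        # the driver and is outside this sketch.
        table = init()
        self._drivers[table.name] = table

    def execute(self, driver: str, query: str) -> List[Row]:
        # Every call goes through the table, so two drivers can expose
        # identically named functions without clashing.
        return self._drivers[driver].execute(query)


def toy_driver_init() -> DriverTable:
    """Hypothetical driver that just echoes the query back in upper case."""
    def execute(query: str) -> List[Row]:
        return [(query.upper(),)]
    return DriverTable(name="toy", execute=execute)


manager = DriverManager()
manager.register(toy_driver_init)
rows = manager.execute("toy", "select 1")
```

The point of the design is that the manager only ever needs one callable per driver; everything else is reached through the returned table, which leaves drivers free to choose how they are packaged and distributed.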
