Hi,
> Do we have a preference for versioning strategy? Should we
> proceed in lockstep with the Arrow C++ library et al. and
> release "ADBC 1.0.0" (the API standard) with "drivers
> version 10.0.0", or use an independent versioning scheme?
> (For example, release API standard and components at
> "1.0.0". Then further releases of components that do not
> change the spec would be "1.1", "1.2", ...; if/when we
> change the spec, start over with "2.0", "2.1", ...)
I prefer an independent versioning scheme. I assume that ADBC
won't need backward incompatible changes frequently. How
about incrementing the major version only when ADBC needs a
backward incompatible change?
e.g.:
  1. Release ADBC (the API standard) 1.0.0
  2. Release adbc_driver_manager 1.0.0
  3. Release adbc_driver_postgres 1.0.0
  4. Add a new feature to adbc_driver_postgres without
     any backward incompatible changes
  5. Release adbc_driver_postgres 1.1.0
  6. Fix a bug in adbc_driver_manager without
     any backward incompatible changes
  7. Release adbc_driver_manager 1.0.1
  8. Add a backward incompatible change to adbc_driver_manager
  9. Release adbc_driver_manager 2.0.0
  10. Add a new feature to ADBC without any
      backward incompatible changes
  11. Release ADBC (the API standard) 1.1.0
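
As a rough sketch of the rule above (this is not an existing tool;
the Version class and the change kinds "incompatible", "feature" and
"fix" are names made up for illustration), each component keeps its
own version and is bumped independently. Replaying steps 4-11 in
Python:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical major.minor.patch version for one component."""

    major: int
    minor: int = 0
    patch: int = 0

    def bump(self, change: str) -> "Version":
        # Only a backward incompatible change increments the major version.
        if change == "incompatible":
            return Version(self.major + 1, 0, 0)
        # A backward compatible feature increments the minor version.
        if change == "feature":
            return Version(self.major, self.minor + 1, 0)
        # A backward compatible bug fix increments the patch version.
        if change == "fix":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change kind: {change!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


# Steps 1-3: the API standard and every component start at 1.0.0.
versions = {
    name: Version(1)
    for name in ("ADBC", "adbc_driver_manager", "adbc_driver_postgres")
}

# Steps 4-11: each change affects only the component it was made in.
events = [
    ("adbc_driver_postgres", "feature"),
    ("adbc_driver_manager", "fix"),
    ("adbc_driver_manager", "incompatible"),
    ("ADBC", "feature"),
]
for name, change in events:
    versions[name] = versions[name].bump(change)
    print(f"Release {name} {versions[name]}")
```

The point of the sketch is that a release of one driver never forces
a version change in the API standard or in the other components.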
Thanks,
--
kou
In <[email protected]>
"Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022 16:36:43
-0400,
"David Li" <[email protected]> wrote:
> Following up here with some specific questions:
>
> Matt Topol added some Go definitions [1] (thanks!) I'd assume we want to vote
> on those as well?
>
> How should the process work for Java/Go? For C/C++, I assume we'd treat it
> like the C Data Interface and copy adbc.h to format/ after a vote, and then
> vote on releases of components. Or do we really only consider the C header as
> the 'format', with the others being language-specific affordances?
>
> What about for Java and for Go? We could vote on and tag a release for Go,
> and add a documentation page that links to the Java/Go definitions at a
> specific revision (as the equivalent 'format' definition for Java/Go)? Or
> would we vendor the entire Java module/Go package as the 'format'?
>
> Do we have a preference for versioning strategy? Should we proceed in
> lockstep with the Arrow C++ library et al. and release "ADBC 1.0.0" (the API
> standard) with "drivers version 10.0.0", or use an independent versioning
> scheme? (For example, release API standard and components at "1.0.0". Then
> further releases of components that do not change the spec would be "1.1",
> "1.2", ...; if/when we change the spec, start over with "2.0", "2.1", ...)
>
> [1]: https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go
>
> -David
>
> On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
>> Hi,
>>
>> OK. I'll send pull requests for GLib and Ruby soon.
>>
>>> I'm curious if you have a particular use case in mind.
>>
>> I don't have any production-ready use case yet but I want to
>> implement an Active Record adapter for ADBC. Active Record
>> is the O/R mapper for Ruby on Rails. Implementing Web
>> applications with Ruby on Rails is one of the major Ruby use
>> cases. So providing an Active Record interface for ADBC will
>> increase the number of Apache Arrow users in the Ruby community.
>>
>> NOTE: Generally, Ruby on Rails users don't process large
>> data but they sometimes need to process large (medium?) data
>> in a batch process. An Active Record adapter for ADBC may be
>> useful for such use cases.
>>
>>> There's a little bit more API cleanup to do [1]. If you
>>> have comments on that or anything else, I'd appreciate
>>> them. Otherwise, pull requests would also be appreciated.
>>
>> OK. I'll open issues/pull requests when I find
>> something. For now, I think that a "MODULE" type library
>> is better than a "SHARED" type library (in CMake terminology
>> [cmake]) for driver modules. (I'll open an issue
>> for this later.)
>>
>> [cmake]: https://cmake.org/cmake/help/latest/command/add_library.html
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <[email protected]>
>> "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022
>> 15:28:56 -0400,
>> "David Li" <[email protected]> wrote:
>>
>>> I would be very happy to see GLib/Ruby bindings! I'm curious if you have a
>>> particular use case in mind.
>>>
>>> There's a little bit more API cleanup to do [1]. If you have comments on
>>> that or anything else, I'd appreciate them. Otherwise, pull requests would
>>> also be appreciated.
>>>
>>> [1]: https://github.com/apache/arrow-adbc/issues/79
>>>
>>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
>>>> Hi,
>>>>
>>>> Thanks for sharing the current status!
>>>> I understand.
>>>>
>>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
>>>> before we release the first version? (I want to use ADBC
>>>> from Ruby.) Or should I wait for the first release? If I can
>>>> work on it now, I'll open pull requests for it.
>>>>
>>>> Thanks,
>>>> --
>>>> kou
>>>>
>>>> In <[email protected]>
>>>> "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022
>>>> 11:03:26 -0400,
>>>> "David Li" <[email protected]> wrote:
>>>>
>>>>> Thank you Kou!
>>>>>
>>>>> At least initially, I don't think I'll be able to complete the Dataset
>>>>> integration in time. So 10.0.0 probably won't ship with a hard
>>>>> dependency. That said I am hoping to have PyArrow take an optional
>>>>> dependency (so Flight SQL can finally be available from Python).
>>>>>
>>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>>>>>> Hi,
>>>>>>
>>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
>>>>>> to be released before apache/arrow is released so that
>>>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>>>>>> .deb/.rpm.
>>>>>>
>>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>>>>>> apache/arrow's .deb/.rpm needs to depend on
>>>>>> apache/arrow-adbc's .deb/.rpm.)
>>>>>>
>>>>>> We can add .deb/.rpm related files
>>>>>> (dev/tasks/linux-packages/ in apache/arrow) to
>>>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>>>>>>
>>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
>>>>>>
>>>>>> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
>>>>>> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>>>>>>
>>>>>> I can work on it in apache/arrow-adbc.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> kou
>>>>>>
>>>>>> In <[email protected]>
>>>>>> "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022
>>>>>> 11:51:08 -0400,
>>>>>> "David Li" <[email protected]> wrote:
>>>>>>
>>>>>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall
>>>>>>> of text that follows…)
>>>>>>>
>>>>>>> These are the components:
>>>>>>>
>>>>>>> - Core adbc.h header
>>>>>>> - Driver manager for C/C++
>>>>>>> - Flight SQL-based driver
>>>>>>> - Postgres-based driver (WIP)
>>>>>>> - SQLite-based driver (more of a testbed for me than an actual
>>>>>>> component - I don't think we'd actually distribute this)
>>>>>>> - Java core interfaces
>>>>>>> - Java driver manager
>>>>>>> - Java JDBC-based driver
>>>>>>> - Java Flight SQL-based driver
>>>>>>> - Python driver manager
>>>>>>>
>>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL
>>>>>>> drivers get moved to the main Arrow repo and distributed as part of the
>>>>>>> regular Arrow releases.
>>>>>>>
>>>>>>> For the rest of the components: they could be packaged individually,
>>>>>>> but versioned and released together. Also, each C/C++ driver probably
>>>>>>> needs a corresponding Python package so Python users do not have to
>>>>>>> futz with shared library configurations. (See [1].) So for instance,
>>>>>>> installing PyArrow would also give you the Flight SQL driver, and `pip
>>>>>>> install adbc_postgres` would get you the Postgres-based driver.
>>>>>>>
>>>>>>> That would mean setting up separate CI, release, etc. (and eventually
>>>>>>> linking Crossbow & Conbench as well?). That does mean duplication of
>>>>>>> effort, but the trade off is avoiding bloating the main release process
>>>>>>> even further. However, I'd like to hear from those closer to the
>>>>>>> release process on this subject - if it would make people's lives
>>>>>>> easier, we could merge everything into one repo/process.
>>>>>>>
>>>>>>> Integrations would be distributed as part of their respective packages
>>>>>>> (e.g. Arrow Dataset would optionally link to the driver manager). So
>>>>>>> the "part of Arrow 10.0.0" aspect means having a stable interface for
>>>>>>> adbc.h, and getting the Flight SQL drivers into the main repo.
>>>>>>>
>>>>>>> [1]: https://github.com/apache/arrow-adbc/issues/53
>>>>>>>
>>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>>>>>>> "David Li" <[email protected]> wrote:
>>>>>>>>> Since it's been a while, I'd like to give an update. There are also a
>>>>>>>>> few questions I have around distribution.
>>>>>>>>>
>>>>>>>>> Currently:
>>>>>>>>> - Supported in C, Java, and Python.
>>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and
>>>>>>>>> SQLite, with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>>>>>>> - For Python, there are low-level bindings to the C API, and the DBAPI
>>>>>>>>> interface on top of that (+a few extension methods resembling
>>>>>>>>> DuckDB/Turbodbc).
>>>>>>>>>
>>>>>>>>> There are drafts of integration with Ibis [1], DBI (R), and DuckDB.
>>>>>>>>> (I'd like to thank Hannes and Kirill for their comments, as well as
>>>>>>>>> Antoine, Dewey, and Matt here.)
>>>>>>>>>
>>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm
>>>>>>>>> not sure how we would like to handle packaging and distribution. In
>>>>>>>>> particular, there are several sub-components for each language (the
>>>>>>>>> driver manager + the drivers), increasing the work. Any thoughts here?
>>>>>>>>
>>>>>>>> Sorry, forgot to answer here. But I think your question is too broadly
>>>>>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>>>>>>
>>>>>>>>> I'm also wondering how we want to handle this in terms of
>>>>>>>>> specification - I assume we'd consider the core header file/Java
>>>>>>>>> interfaces a spec like the C Data Interface/Flight RPC, and vote on
>>>>>>>>> them/mirror them into the format/ directory?
>>>>>>>>
>>>>>>>> That sounds like the right way to me indeed.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Antoine.