Now at https://github.com/apache/arrow-nanoarrow

Dewey: you can use .asf.yaml to enable issues and such: 
https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubsettings
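
A minimal sketch of what that could look like, assuming a `.asf.yaml` file at the root of the repository and the `github.features` section described on the page above. Only the `issues` switch is shown, since that is the setting mentioned here; treat anything beyond it as something to check against the INFRA documentation.

```yaml
# .asf.yaml (repository root) -- illustrative fragment, not the project's actual file
github:
  features:
    # Turn on the GitHub issue tracker for this repository
    issues: true
```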

On Thu, Jul 7, 2022, at 09:06, David Li wrote:
> I'll go ahead and set up arrow-nanoarrow for convenience.
>
> In the medium term we should think about whether arrow-adbc and 
> arrow-nanoarrow should be folded back into the arrow monorepo, in order 
> to potentially reduce the release/CI maintenance burden, or document 
> why we've chosen to split those off (while other languages like Go and 
> JS remain). 
>
> On Wed, Jul 6, 2022, at 15:18, Dewey Dunnington wrote:
>> I'm happy to develop anywhere anytime! My personal vote would be
>> apache/arrow-nanoarrow because it highlights the minimal-ness of it but am
>> happy to move forward however the community sees fit.
>>
>> Cheers,
>>
>> -dewey
>>
>> On Wed, Jul 6, 2022 at 12:46 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> hi all,
>>>
>>> Is there a path to doing this development work in project-owned
>>> repositories so the IP is "blessed" from an ASF governance / IP
>>> lineage standpoint? I see two potential routes:
>>>
>>> * Working in a subdirectory of apache/arrow
>>> * Creating a new repository like apache/arrow-c (or some other
>>> arrow-$SOMETHING)
>>>
>>> Otherwise we could be looking at having to do an IP clearance /
>>> software grant at a later time.
>>>
>>> Thanks,
>>> Wes
>>>
>>> On Sat, Jun 25, 2022 at 8:52 PM Dewey Dunnington <de...@voltrondata.com>
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Thanks for all the feedback so far! I've opened up two more draft PRs
>>> > implementing [1] an API for owning buffers (precursor to creating struct
>>> > ArrowArrays) and [2] an API for creating ArrowSchema objects for all Arrow
>>> > types. All comments welcome!
>>> >
>>> > -dewey
>>> >
>>> > [1] https://github.com/paleolimbot/nanoarrow/pull/9
>>> > [2] https://github.com/paleolimbot/nanoarrow/pull/10
>>> >
>>> > On Wed, Jun 15, 2022 at 12:18 AM Dewey Dunnington <de...@voltrondata.com>
>>> > wrote:
>>> >
>>> > > Hi all,
>>> > >
>>> > > I opened a second PR [1] drafting a design for storing parsed information
>>> > > obtained from a struct ArrowSchema (i.e., parsing the format string into
>>> > > usable C structures). There are some unsolved problems that could use a
>>> > > fresh perspective...all comments welcome!
>>> > >
>>> > > [1] https://github.com/paleolimbot/arrow-c/pull/5
>>> > >
>>> > > On Fri, Jun 10, 2022 at 12:27 PM Dewey Dunnington <de...@voltrondata.com>
>>> > > wrote:
>>> > >
>>> > >> Hi all,
>>> > >>
>>> > >> As promised, I converted the design document [1] into an initial PR [2].
>>> > >> Rather than draft the whole header, I started with README + implementations
>>> > >> + testing for error handling and schema allocation (depending on feedback,
>>> > >> next week I will draft another reviewable chunk).
>>> > >>
>>> > >> Also feel free to suggest another place to put this if one exists (the
>>> > >> choice to put it in its own repo was based on informal feedback that
>>> > >> perhaps that might be the best way to go).
>>> > >>
>>> > >> [1] https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
>>> > >> [2] https://github.com/paleolimbot/arrow-c/pull/1/files
>>> > >>
>>> > >> On Fri, Jun 3, 2022 at 12:41 PM Dewey Dunnington <de...@voltrondata.com>
>>> > >> wrote:
>>> > >>
>>> > >>> Hi all,
>>> > >>>
>>> > >>> Based on the points raised above and a few adventures implementing some
>>> > >>> of this in related projects, I put together a brief design document
>>> > >>> proposing a scope and structure to perhaps solidify a few of these
>>> > >>> discussions:
>>> > >>> https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
>>> > >>>
>>> > >>> Any and all should feel free to add, rewrite, or propose a new
>>> > >>> structure...I wrote many of the pieces for argument's sake or because
>>> > >>> that's how I'd implemented them before.
>>> > >>>
>>> > >>> Next week I will phrase it as a skeleton header (like the one in the
>>> > >>> excellent ADBC design discussions) depending on feedback to keep the
>>> > >>> discussion going!
>>> > >>>
>>> > >>> Cheers,
>>> > >>>
>>> > >>> -dewey
>>> > >>>
>>> > >>> On Fri, Jun 3, 2022 at 9:57 AM Hannes Mühleisen <han...@duckdblabs.com>
>>> > >>> wrote:
>>> > >>>
>>> > >>>> Hello List,
>>> > >>>>
>>> > >>>> we at DuckDB are happy users of the Arrow C Data Interface and use it to
>>> > >>>> feed SQL queries and also use it to provide query results in Arrow format
>>> > >>>> again. It is particularly appealing to us that the interface is merely a
>>> > >>>> (C) header file that we just ship with our source code [1]. Internally, our
>>> > >>>> implementation then constructs DuckDB internal vectors from the Arrow
>>> > >>>> format [2] or vice-versa [3].
>>> > >>>>
>>> > >>>> As you can see from [2, 3] there is some complexity in getting the
>>> > >>>> conversion right, especially for more complex data types like nested types
>>> > >>>> (list, strings). A lightweight, dependency-free library to help with
>>> > >>>> constructing those would certainly be appreciated. What would also help a
>>> > >>>> lot is validation code: Arrow structures are very delicate and one wrong
>>> > >>>> pointer can lead to disaster (which is then blamed on us), so a way to
>>> > >>>> verify the structures in said lightweight library would be very helpful.
>>> > >>>>
>>> > >>>> Best from Amsterdam, and Quack
>>> > >>>>
>>> > >>>> Hannes
>>> > >>>>
>>> > >>>> [1] https://github.com/duckdb/duckdb/blob/master/src/include/duckdb/common/arrow.hpp
>>> > >>>> [2] https://github.com/duckdb/duckdb/blob/master/src/function/table/arrow.cpp
>>> > >>>> [3] https://github.com/duckdb/duckdb/blob/master/src/common/types/data_chunk.cpp
>>> > >>>>
>>> > >>>>
>>> > >>>> On Fri, Jun 03, 2022 at 15:34:42, Jonathan Keane <jke...@gmail.com>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>> > cc Hannes Mühleisen from DuckDB Labs
>>> > >>>> >
>>> > >>>> > -Jon
>>> > >>>> >
>>> > >>>> >
>>> > >>>> > On Tue, May 31, 2022 at 5:03 PM Wes McKinney <wesmck...@gmail.com>
>>> > >>>> > wrote:
>>> > >>>> >
>>> > >>>> > I'm also supportive of having a small vendorable C/C++ "Arrow
>>> > >>>> > middleware" that provides:
>>> > >>>> >
>>> > >>>> > * Schemas and types
>>> > >>>> > * Columnar data structures and minimal APIs to build them and iterate
>>> > >>>> > over them
>>> > >>>> > * C data interface
>>> > >>>> > * Minimal validation (at the level of Validate but not ValidateFull)
>>> > >>>> >
>>> > >>>> > I don't think it's going to be practical to try to refactor parts of
>>> > >>>> > the existing Arrow C++ core to be vendorable since there are many
>>> > >>>> > features / requirements (e.g. an extensible buffer and device API)
>>> > >>>> > that these C++ classes include that aren't needed in this
>>> > >>>> > limited-feature middleware library.
>>> > >>>> >
>>> > >>>> > This also relates to the "Improving Arrow's database support" project
>>> > >>>> > that David Li raised some time ago [1]. If we want to encourage
>>> > >>>> > database driver libraries to add new APIs that emit the Arrow C
>>> > >>>> > interface, we need to make it easier to generate the C interface
>>> > >>>> > without requiring a new library dependency.
>>> > >>>> >
>>> > >>>> > [1]: https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
>>> > >>>> >
>>> > >>>> > On Mon, May 30, 2022 at 11:31 AM Jonathan Keane <jke...@gmail.com>
>>> > >>>> > wrote:
>>> > >>>> > >
>>> > >>>> > > Thanks for working on this. I've heard people asking about something
>>> > >>>> > > like this from a number of different fronts on top of the obvious use
>>> > >>>> > > case in geoarrow | other geospatial libraries. I think a minimal piece
>>> > >>>> > > of Arrow that other packages could depend on without needing to bring
>>> > >>>> > > in all of arrow would be super valuable in building the bridges we
>>> > >>>> > > want across other systems.
>>> > >>>> > >
>>> > >>>> > > Do you have any (design) documentation that describes the scope of
>>> > >>>> > > what you're thinking? I know there have been others floating around
>>> > >>>> > > [1] [2] that were in a similar spirit.
>>> > >>>> > >
>>> > >>>> > > A few more questions I hope will spark more conversation: How do the
>>> > >>>> > > header files you linked in [3] overlap with these other efforts? Are
>>> > >>>> > > those headers something we could|should "just" PR into apache/arrow
>>> > >>>> > > and write up how to use them? If not what is the work to make them so
>>> > >>>> > > that they could be (the answer of course could be design something
>>> > >>>> > > else entirely and PR that!)?
>>> > >>>> > >
>>> > >>>> > > [1] https://github.com/paleolimbot/narrow
>>> > >>>> > > [2] https://paleolimbot.github.io/narrow/articles/why-narrow.html
>>> > >>>> > > [3] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>>> > >>>> > >
>>> > >>>> > > -Jon
>>> > >>>> > >
>>> > >>>> > >
>>> > >>>> > > On Wed, May 25, 2022 at 9:29 AM Dewey Dunnington <de...@voltrondata.com>
>>> > >>>> > > wrote:
>>> > >>>> > > >
>>> > >>>> > > > I'm writing to gauge interest in a set of helpers in C and/or C++ for
>>> > >>>> > > > reading/exporting Arrow C Data interface structures. My use-case is
>>> > >>>> > > > building Arrow geospatial support in R [1], and while the set of helpers
>>> > >>>> > > > I've been using [2] has served the purpose of me writing about the
>>> > >>>> > > > opportunities for Arrow + geospatial [3], I would like to rewrite the
>>> > >>>> > > > prototype based on something developed by/with the Arrow community.
>>> > >>>> > > >
>>> > >>>> > > > Does a set of C/C++ helpers for Arrow C Data interface structures already
>>> > >>>> > > > exist? *Should* it exist?
>>> > >>>> > > >
>>> > >>>> > > > If it doesn't, what should the name/scope of that library be? The names
>>> > >>>> > > > 'nanoarrow', 'narrow', 'sparrow', and 'arrow-hpp' have all surfaced in my
>>> > >>>> > > > limited discussion of this so far. For the purpose of starting the
>>> > >>>> > > > discussion, I'll posit that the library should include helpers to
>>> > >>>> > > > allocate/destroy C Data interface structures, a schema metadata
>>> > >>>> > > > encoder/decoder, validation of a schema/array pair, and something like the
>>> > >>>> > > > ArrayBuilder C++ class.
>>> > >>>> > > >
>>> > >>>> > > > [1] https://lists.apache.org/thread/yb7p9wpg3k128njskhwj9j788opb67g7
>>> > >>>> > > > [2] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>>> > >>>> > > > [3] https://docs.google.com/document/d/1A6e3XCerjhXVFHBDaoAlBBNFb2HG4RB9SVRpuBru7E4/edit?usp=sharing
>>> > >>>> >
>>> > >>>> >
>>> > >>>>
>>> > >>>
>>>
