Now at https://github.com/apache/arrow-nanoarrow
Dewey: you can use .asf.yaml to enable issues and such: https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubsettings

On Thu, Jul 7, 2022, at 09:06, David Li wrote:
> I'll go ahead and set up arrow-nanoarrow for convenience.
>
> In the medium term we should think about whether arrow-adbc and
> arrow-nanoarrow should be folded back into the arrow monorepo, in order
> to potentially reduce the release/CI maintenance burden, or document
> why we've chosen to split those off (while other languages like Go and
> JS remain).
>
> On Wed, Jul 6, 2022, at 15:18, Dewey Dunnington wrote:
>> I'm happy to develop anywhere anytime! My personal vote would be
>> apache/arrow-nanoarrow because it highlights the minimal-ness of it but am
>> happy to move forward however the community sees fit.
>>
>> Cheers,
>>
>> -dewey
>>
>> On Wed, Jul 6, 2022 at 12:46 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> hi all,
>>>
>>> Is there a path to doing this development work in project-owned
>>> repositories so the IP is "blessed" from an ASF governance / IP
>>> lineage standpoint? I see two potential routes:
>>>
>>> * Working in a subdirectory of apache/arrow
>>> * Creating a new repository like apache/arrow-c (or some other
>>>   arrow-$SOMETHING)
>>>
>>> Otherwise we could be looking at having to do an IP clearance /
>>> software grant at a later time.
>>>
>>> Thanks,
>>> Wes
>>>
>>> On Sat, Jun 25, 2022 at 8:52 PM Dewey Dunnington <de...@voltrondata.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Thanks for all the feedback so far! I've opened up two more draft PRs
>>> > implementing [1] an API for owning buffers (precursor to creating struct
>>> > ArrowArrays) and [2] an API for creating ArrowSchema objects for all Arrow
>>> > types. All comments welcome!
>>> >
>>> > -dewey
>>> >
>>> > [1] https://github.com/paleolimbot/nanoarrow/pull/9
>>> > [2] https://github.com/paleolimbot/nanoarrow/pull/10
>>> >
>>> > On Wed, Jun 15, 2022 at 12:18 AM Dewey Dunnington <de...@voltrondata.com> wrote:
>>> >
>>> > > Hi all,
>>> > >
>>> > > I drafted a second PR [1] drafting a design for storing parsed information
>>> > > obtained from a struct ArrowSchema (i.e., parsing the format string into
>>> > > usable C structures). There are some unsolved problems that could use a
>>> > > fresh perspective...all comments welcome!
>>> > >
>>> > > [1] https://github.com/paleolimbot/arrow-c/pull/5
>>> > >
>>> > > On Fri, Jun 10, 2022 at 12:27 PM Dewey Dunnington <de...@voltrondata.com> wrote:
>>> > >
>>> > >> Hi all,
>>> > >>
>>> > >> As promised, I converted the design document [1] into an initial PR [2].
>>> > >> Rather than draft the whole header, I started with README + implementations
>>> > >> + testing for error handling and schema allocation (depending on feedback,
>>> > >> next week I will draft another reviewable chunk).
>>> > >>
>>> > >> Also feel free to suggest another place to put this if one exists (the
>>> > >> choice to put it in its own repo was based on informal feedback that
>>> > >> perhaps that might be the best way to go).
>>> > >>
>>> > >> [1] https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
>>> > >> [2] https://github.com/paleolimbot/arrow-c/pull/1/files
>>> > >>
>>> > >> On Fri, Jun 3, 2022 at 12:41 PM Dewey Dunnington <de...@voltrondata.com> wrote:
>>> > >>
>>> > >>> Hi all,
>>> > >>>
>>> > >>> Based on the points raised above and a few adventures implementing some
>>> > >>> of this in related projects, I put together a brief design document
>>> > >>> proposing a scope and structure to perhaps solidify a few of these
>>> > >>> discussions:
>>> > >>> https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
>>> > >>>
>>> > >>> Any and all should feel free to add, rewrite, or propose a new
>>> > >>> structure...I wrote many of the pieces for argument's sake or because
>>> > >>> that's how I'd implemented them before.
>>> > >>>
>>> > >>> Next week I will phrase it as a skeleton header (like the one in the
>>> > >>> excellent ADBC design discussions) depending on feedback to keep the
>>> > >>> discussion going!
>>> > >>>
>>> > >>> Cheers,
>>> > >>>
>>> > >>> -dewey
>>> > >>>
>>> > >>> On Fri, Jun 3, 2022 at 9:57 AM Hannes Mühleisen <han...@duckdblabs.com> wrote:
>>> > >>>
>>> > >>>> Hello List,
>>> > >>>>
>>> > >>>> we at DuckDB are happy users of the Arrow C Data Interface and use it to
>>> > >>>> feed SQL queries and also use it to provide query results in Arrow format
>>> > >>>> again. It is particularly appealing to us that the interface is merely a
>>> > >>>> (C) header file that we just ship with our source code [1]. Internally, our
>>> > >>>> implementation then constructs DuckDB internal vectors from the Arrow
>>> > >>>> format [2] or vice-versa [3].
>>> > >>>>
>>> > >>>> As you can see from [2, 3] there is some complexity in getting the
>>> > >>>> conversion right, especially for more complex data types like nested types
>>> > >>>> (list, strings). A lightweight, dependency-free library to help
>>> > >>>> constructing those would certainly be appreciated. What would also help a
>>> > >>>> lot is validation code, Arrow structures are very delicate and one wrong
>>> > >>>> pointer can lead to disaster (which is then blamed on us), so a way to
>>> > >>>> verify the structures in said lightweight library would be very helpful.
>>> > >>>>
>>> > >>>> Best from Amsterdam, and Quack
>>> > >>>>
>>> > >>>> Hannes
>>> > >>>>
>>> > >>>> [1] https://github.com/duckdb/duckdb/blob/master/src/include/duckdb/common/arrow.hpp
>>> > >>>> [2] https://github.com/duckdb/duckdb/blob/master/src/function/table/arrow.cpp
>>> > >>>> [3] https://github.com/duckdb/duckdb/blob/master/src/common/types/data_chunk.cpp
>>> > >>>>
>>> > >>>> On Fri, Jun 03, 2022 at 15:34:42, Jonathan Keane <jke...@gmail.com> wrote:
>>> > >>>>
>>> > >>>> > cc Hannes Mühleisen from DuckDB Labs
>>> > >>>> >
>>> > >>>> > -Jon
>>> > >>>> >
>>> > >>>> > On Tue, May 31, 2022 at 5:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>> > >>>> >
>>> > >>>> > I'm also supportive of having a small vendorable C/C++ "Arrow
>>> > >>>> > middleware" that provides:
>>> > >>>> >
>>> > >>>> > * Schemas and types
>>> > >>>> > * Columnar data structures and minimal APIs to build them and iterate
>>> > >>>> >   over them
>>> > >>>> > * C data interface
>>> > >>>> > * Minimal validation (at the level of Validate but not ValidateFull)
>>> > >>>> >
>>> > >>>> > I don't think it's going to be practical to try to refactor parts of
>>> > >>>> > the existing Arrow C++ core to be vendorable since there are
>>> > >>>> > many features / requirements (e.g. an extensible buffer and device API)
>>> > >>>> > that these C++ classes include that aren't needed in this
>>> > >>>> > limited-feature middleware library.
>>> > >>>> >
>>> > >>>> > This also relates to the "Improving Arrow's database support" project
>>> > >>>> > that David Li raised some time ago [1]. If we want to encourage
>>> > >>>> > database driver libraries to add new APIs that emit the Arrow C
>>> > >>>> > interface, we need to make it easier to generate the C interface
>>> > >>>> > without requiring a new library dependency.
>>> > >>>> >
>>> > >>>> > [1]: https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
>>> > >>>> >
>>> > >>>> > On Mon, May 30, 2022 at 11:31 AM Jonathan Keane <jke...@gmail.com> wrote:
>>> > >>>> > >
>>> > >>>> > > Thanks for working on this. I've heard people asking about something
>>> > >>>> > > like this from a number of different fronts on top of the obvious use
>>> > >>>> > > case in geoarrow | other geospatial libraries. I think a minimal piece
>>> > >>>> > > of Arrow that other packages could depend on without needing to bring
>>> > >>>> > > in all of arrow would be super valuable in building the bridges we
>>> > >>>> > > want across other systems.
>>> > >>>> > >
>>> > >>>> > > Do you have any (design) documentation that describes the scope of
>>> > >>>> > > what you're thinking? I know there have been others floating around
>>> > >>>> > > [1] [2] that were in a similar spirit.
>>> > >>>> > >
>>> > >>>> > > A few more questions I hope will spark more conversation: How do the
>>> > >>>> > > header files you linked in [3] overlap with these other efforts? Are
>>> > >>>> > > those headers something we could|should "just" PR into apache/arrow
>>> > >>>> > > and write up how to use them?
>>> > >>>> > > If not what is the work to make them so
>>> > >>>> > > that they could be (the answer of course could be design something
>>> > >>>> > > else entirely and PR that!)?
>>> > >>>> > >
>>> > >>>> > > [1] https://github.com/paleolimbot/narrow
>>> > >>>> > > [2] https://paleolimbot.github.io/narrow/articles/why-narrow.html
>>> > >>>> > > [3] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>>> > >>>> > >
>>> > >>>> > > -Jon
>>> > >>>> > >
>>> > >>>> > > -Jon
>>> > >>>> > >
>>> > >>>> > > On Wed, May 25, 2022 at 9:29 AM Dewey Dunnington <de...@voltrondata.com> wrote:
>>> > >>>> > > >
>>> > >>>> > > > I'm writing to gauge interest in a set of helpers in C and/or C++ for
>>> > >>>> > > > reading/exporting Arrow C Data interface structures. My use-case is
>>> > >>>> > > > building Arrow geospatial support in R [1], and while the set of helpers
>>> > >>>> > > > I've been using [2] has served the purpose of me writing about the
>>> > >>>> > > > opportunities for Arrow + geospatial [3], I would like to rewrite the
>>> > >>>> > > > prototype based on something developed by/with the Arrow community.
>>> > >>>> > > >
>>> > >>>> > > > Does a set of C/C++ helpers for Arrow C Data interface structures already
>>> > >>>> > > > exist? *Should* it exist?
>>> > >>>> > > >
>>> > >>>> > > > If it doesn't, what should the name/scope of that library be? The names
>>> > >>>> > > > 'nanoarrow', 'narrow', 'sparrow', and 'arrow-hpp' have all surfaced in my
>>> > >>>> > > > limited discussion of this so far.
>>> > >>>> > > > For the purpose of starting the
>>> > >>>> > > > discussion, I'll posit that the library should include helpers to
>>> > >>>> > > > allocate/destroy C Data interface structures, a schema metadata
>>> > >>>> > > > encoder/decoder, validation of a schema/array pair, and something like the
>>> > >>>> > > > ArrayBuilder C++ class.
>>> > >>>> > > >
>>> > >>>> > > > [1] https://lists.apache.org/thread/yb7p9wpg3k128njskhwj9j788opb67g7
>>> > >>>> > > > [2] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>>> > >>>> > > > [3] https://docs.google.com/document/d/1A6e3XCerjhXVFHBDaoAlBBNFb2HG4RB9SVRpuBru7E4/edit?usp=sharing
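
Note for readers of the archive: the oldest message in this thread proposes "helpers to allocate/destroy C Data interface structures". As a rough illustration of what that could mean for a single childless ArrowSchema, here is a minimal sketch in C. The struct layout is the one published in the Arrow C Data Interface specification. Everything prefixed with example_ is a hypothetical name made up for this note; it is not nanoarrow's API and is not taken from any of the PRs linked above. The sketch also assumes the caller owns the struct's storage and that only format and name need to be heap-allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layout as published in the Arrow C Data Interface specification. */
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
#define ARROW_FLAG_NULLABLE 2

struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
#endif

/* Hypothetical helper: heap-copy a C string (NULL is treated as ""). */
static char* example_strdup(const char* s) {
  if (s == NULL) s = "";
  size_t n = strlen(s) + 1;
  char* out = (char*)malloc(n);
  if (out != NULL) memcpy(out, s, n);
  return out;
}

/* Hypothetical release callback: frees the strings owned by the schema and
 * marks it released by setting release to NULL, as the specification requires.
 * This sketch never allocates children or a dictionary, so none are freed. */
void example_schema_release(struct ArrowSchema* schema) {
  if (schema == NULL || schema->release == NULL) return;
  free((void*)schema->format);
  free((void*)schema->name);
  schema->format = NULL;
  schema->name = NULL;
  schema->release = NULL;
}

/* Hypothetical allocator: fills in a childless, nullable schema.
 * Returns 0 on success and -1 if memory could not be allocated. */
int example_schema_init(struct ArrowSchema* schema, const char* format,
                        const char* name) {
  memset(schema, 0, sizeof(*schema));
  char* format_copy = example_strdup(format);
  char* name_copy = example_strdup(name);
  if (format_copy == NULL || name_copy == NULL) {
    free(format_copy);
    free(name_copy);
    return -1;
  }
  schema->format = format_copy;
  schema->name = name_copy;
  schema->flags = ARROW_FLAG_NULLABLE;
  schema->release = &example_schema_release;
  return 0;
}

/* Usage sketch: describe an int32 field ("i" in the spec's format strings)
 * named "x", check it, and hand it to its own release callback. */
int example_usage(void) {
  struct ArrowSchema schema;
  if (example_schema_init(&schema, "i", "x") != 0) return -1;
  assert(strcmp(schema.format, "i") == 0);
  assert(schema.n_children == 0);
  schema.release(&schema);
  assert(schema.release == NULL);
  return 0;
}
```

The point of routing destruction through the release callback, rather than a plain free function, is that a consumer in another library can release the schema without knowing how the producer allocated it. Nested types, metadata encoding, and validation, which the thread also asks for, are left out of this sketch.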