Hi Arrow developers!

This is my first time posting on this mailing list, so please let me know if 
this post belongs elsewhere.

A few colleagues and I plan to implement a C++ interface for building read-only 
queries and executing them against Arrow Datasets through Substrait consumers 
like Acero, DuckDB, and Velox. Since we hope to build this out in the open, I 
have outlined the kind of interface that we intend to build in this Google Doc 
[0].

I'm making this post for a few reasons:

    - To gauge whether the community feels this work would be worth pursuing
      as an open-source project
    - To receive feedback on the proposed interface and ensure that we would
      be able to accommodate a wide variety of use cases (please feel free to
      leave comments directly on the doc)
    - To connect with developers who might be interested in collaborating on
      this effort

Relatedly, I would like to get the Arrow developers' thoughts on whether it 
would make sense to pursue this work as an official Arrow project (e.g. in an 
experimental repo) or whether it would be better as a standalone project. I 
understand that pursuing this as an Arrow project would have its downsides 
(like increased review/maintenance burden) and would risk confusing new users 
about what the official Arrow libraries aim to solve [1]. On the other hand, making 
such an interface readily available alongside `libarrow` could increase the 
adoption of Arrow among certain developers (e.g. in finance/fintech). 
Either way, I'd love to hear which approach makes more sense to you.

Please feel free to reply here on the mailing list or leave comments on the 
linked Google Doc!

[0]: https://docs.google.com/document/d/1_ktKxtOFW1grD-VcbBNc0FaP4g5j7vSx9bO2ht59JFA
[1]: https://www.datawill.io/posts/apache-arrow-2022-reflection/#who-is-libarrows-and-aceros-audience
