+1 to what Ian said. I'll also add, as was brought up in the earlier pandas API discussion, that a blog post (once you've got something ready to use) would be a good idea.
> On the other hand, making such an interface readily available alongside
> `libarrow` could increase the adoption of Arrow among certain developers
> (e.g. in finance/fintech).

I don't fully understand this statement. I suspect you are correct but,
since I don't understand the reasons, I can't really say what we can do to
help. For example, if it's purely the fact that it is an ASF project, then
we can make a new arrow-xyz repo (you'd need a committer to dedicate time
to reviews, and you would need to convince several PMC members to commit
time to validating your releases).

On Fri, Jan 27, 2023 at 9:10 AM Ian Cook <i...@ursacomputing.com> wrote:
>
> Hi Shoumyo,
>
> This is exciting—thank you for the thoughtfulness you have put into
> this proposal.
>
> This topic of a C++ dataframe API for Arrow-native engine(s) has come
> up in the past [3], but the bulk of the previous discussion about this
> predated Substrait. With the Substrait project now quickly gaining
> momentum, it seems an excellent time to revisit this topic and to
> incorporate Substrait into it, as you have done.
>
> I strongly believe that this work should happen in a repository that
> is outside of the Arrow project. Many of the exciting developments in
> Arrow-land these days are happening in the broader ecosystem around
> Arrow. The proposed API could be used independently of Arrow libraries
> (for example, it could be used with DuckDB). For projects like this, I
> think our hope as Arrow maintainers is to "let a hundred flowers
> bloom" around Arrow (all with excellent interoperability based on Arrow
> standards) rather than centralizing the work inside Arrow
> repositories. We can use resources including the "Powered by Arrow" [4]
> and "Powered by Substrait" [5] pages to highlight the project.
>
> Thank you,
> Ian
>
> [3]
> https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/
> [4] https://arrow.apache.org/powered_by/
> [5] https://substrait.io/community/powered_by/
>
> On Wed, Jan 25, 2023 at 1:02 PM Shoumyo Chakravorti (BLOOMBERG/ 731
> LEX) <schakravo...@bloomberg.net> wrote:
> >
> > Hi Arrow developers!
> >
> > This is my first time posting on this mailing list, so please let me know
> > if this post belongs elsewhere.
> >
> > I and a few colleagues plan to implement a C++ interface for building
> > read-only queries and executing those against Arrow Datasets through
> > Substrait consumers like Acero, DuckDB, and Velox. Since we hope to build
> > this out in the open, I have outlined the kind of interface that we intend
> > to build in this Google doc [0].
> >
> > I'm making this post for a few reasons:
> >
> > - To gauge whether the community feels like this work would be worth
> >   pursuing as an open-source project
> > - To receive feedback on the proposed interface and ensure that we
> >   would be able to accommodate a wide variety of use-cases (please feel free
> >   to leave comments directly on the doc)
> > - To connect with developers who might be interested in collaborating
> >   on this effort
> >
> > Relatedly, I would like to get the Arrow developers' thoughts on whether it
> > would make sense to pursue this work as an official Arrow project (e.g. in
> > an experimental repo) or if it would be better as a standalone project. I
> > understand that pursuing this as an Arrow project would have its downsides
> > (like increased review/maintenance burden) and risks confusing new users as
> > to what the official Arrow libraries aim to solve [1]. On the other hand,
> > making such an interface readily available alongside `libarrow` could
> > increase the adoption of Arrow among certain developers (e.g. in
> > finance/fintech). Regardless of your opinion, I'd love to hear your
> > thoughts on which approach makes more sense.
> >
> > Please feel free to reply here on the mailing list or leave comments on the
> > linked Google Doc!
> >
> > [0]:
> > https://docs.google.com/document/d/1_ktKxtOFW1grD-VcbBNc0FaP4g5j7vSx9bO2ht59JFA
> > [1]:
> > https://www.datawill.io/posts/apache-arrow-2022-reflection/#who-is-libarrows-and-aceros-audience