+1 to what Ian said.  I'll also add, as was brought up in the earlier
pandas API discussion, that a blog post (once you've got something
ready to use) would be a good idea.

> On the other hand, making such an interface readily available alongside 
> `libarrow` could increase the adoption of Arrow among certain developers 
> (e.g. in finance/fintech).

I don't fully understand this statement.  I suspect you are correct
but, since I don't understand the reasons, I can't really say what we
can do to help.  For example, if the appeal is purely that it would be
an ASF project, then we could create a new arrow-xyz repo (you'd need a
committer to dedicate time to reviews, and you'd need to convince
several PMC members to commit time to validating your releases).

On Fri, Jan 27, 2023 at 9:10 AM Ian Cook <i...@ursacomputing.com> wrote:
>
> Hi Shoumyo,
>
> This is exciting—thank you for the thoughtfulness you have put into
> this proposal.
>
> This topic of a C++ dataframe API for Arrow-native engine(s) has come
> up in the past [3], but the bulk of the previous discussion about this
> predated Substrait. With the Substrait project now quickly gaining
> momentum, it seems an excellent time to revisit this topic and to
> incorporate Substrait into it, as you have done.
>
> I strongly believe that this work should happen in a repository that
> is outside of the Arrow project. Many of the exciting developments in
> Arrow-land these days are happening in the broader ecosystem around
> Arrow. The proposed API could be used independently of Arrow libraries
> (for example, it could be used with DuckDB). For projects like this, I
> think our hope as Arrow maintainers is to "let a hundred flowers
> bloom" around Arrow (all with excellent interoperability based on Arrow
> standards) rather than centralizing the work inside Arrow
> repositories. We can use resources including the "Powered by Arrow" [4]
> and "Powered by Substrait" [5] pages to highlight the project.
>
> Thank you,
> Ian
>
> [3] https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/
> [4] https://arrow.apache.org/powered_by/
> [5] https://substrait.io/community/powered_by/
>
> On Wed, Jan 25, 2023 at 1:02 PM Shoumyo Chakravorti (BLOOMBERG/ 731
> LEX) <schakravo...@bloomberg.net> wrote:
> >
> > Hi Arrow developers!
> >
> > This is my first time posting on this mailing list, so please let me know 
> > if this post belongs elsewhere.
> >
> > A few colleagues and I plan to implement a C++ interface for building 
> > read-only queries and executing those against Arrow Datasets through 
> > Substrait consumers like Acero, DuckDB, and Velox. Since we hope to build 
> > this out in the open, I have outlined the kind of interface that we intend 
> > to build in this Google doc [0].
> >
> > I'm making this post for a few reasons:
> >
> >     - To gauge whether the community feels like this work would be worth 
> > pursuing as an open-source project
> >     - To receive feedback on the proposed interface and ensure that we 
> > would be able to accommodate a wide variety of use-cases (please feel free 
> > to leave comments directly on the doc)
> >     - To connect with developers who might be interested in collaborating 
> > on this effort
> >
> > Relatedly, I would like to get the Arrow developers' thoughts on whether it 
> > would make sense to pursue this work as an official Arrow project (e.g. in 
> > an experimental repo) or if it would be better as a standalone project. I 
> > understand that pursuing this as an Arrow project would have its downsides 
> > (like increased review/maintenance burden) and would risk confusing new users as 
> > to what the official Arrow libraries aim to solve [1]. On the other hand, 
> > making such an interface readily available alongside `libarrow` could 
> > increase the adoption of Arrow among certain developers (e.g. in 
> > finance/fintech). Regardless of your opinion, I'd love to hear your 
> > thoughts on which approach makes more sense.
> >
> > Please feel free to reply here on the mailing list or leave comments on the 
> > linked Google Doc!
> >
> > [0]: https://docs.google.com/document/d/1_ktKxtOFW1grD-VcbBNc0FaP4g5j7vSx9bO2ht59JFA
> > [1]: https://www.datawill.io/posts/apache-arrow-2022-reflection/#who-is-libarrows-and-aceros-audience
