First, a +1000 on Will's blog post! [1]

Continue:

Building tools that benefit users of all languages, with particular kudos to ADBC for providing an ABI-stable way to write database drivers that can be used by practitioners in C++, Ruby, Python, Java, Go, and (soon!) R.

Start:

I wonder if this is the year that we can find a way to write compute functions such that separate implementations don't have to exist for C++, Go, and Rust (and maybe others I don't know about).

Stop:

Will's comment that we should stop building data scientist-facing tools under the Arrow name struck a particular chord with me. The R package is very much data scientist-facing, and we have a rather large gap between the technical capacity of our users and the technical capacity required to contribute to the package (e.g., maintaining a development Arrow C++ install). The work needed to make RecordBatchReader, Array, Buffer, RecordBatch, and Table structures available to R users is vastly different from the work needed to provide an Acero dplyr backend.

[1] https://www.datawill.io/posts/apache-arrow-2022-reflection/

On Thu, Dec 29, 2022 at 4:09 PM Jacob Wujciak <ja...@voltrondata.com.invalid> wrote:

> This is a great idea, I will add some thoughts later but just wanted to
> quickly add that the Zulip Chat [1] was recently switched to allow anyone
> to register without the need for an invite link!
>
> [1]: https://ursalabs.zulipchat.com/
>
> On Wed, Dec 28, 2022 at 11:27 PM Will Jones <will.jones...@gmail.com> wrote:
>
> > Thanks for suggesting this Andrew.
> >
> > I just uploaded a blog post with my thoughts in long form [1]. Here are
> > some suggestions pulled from that:
> >
> > Continue:
> >
> > I hope we will continue prioritizing updating the spec for new array
> > formats. [2] I think this is very important for avoiding fragmentation and
> > may even open opportunities for consolidation in the C++ ecosystem.
> >
> > +1 on additional improvements for documentation, examples, no-invite chats.
> > I am particularly keen on seeing evangelism for our protocols; existing
> > ones like C Data Interface aren't nearly as widely known as they ought to
> > be and I'm excited for new ones like ADBC.
> >
> > Start:
> >
> > Find ways for each subproject to publicly develop a clear roadmap.
> > Otherwise by default these discussions happen in private, either between
> > individual ICs or within corporate environments. Some subprojects, such as
> > Acero could likely use their own sync call to help facilitate this, even if
> > on a slower cadence than the main biweekly call.
> >
> > Also, other sync calls might consider adapting to the sync call note style
> > used in the Rust projects, where all notes are in one google doc [3] rather
> > than spread across main mailing list threads. That seems like a format that
> > would make it easy for new contributors to catch up on the major focuses of
> > the project.
> >
> > Stop:
> >
> > Don't create end-user (e.g. data scientist) facing tools under the name
> > Arrow; prefer keeping separate brand identities for those tools and keeping
> > arrow libraries as developer-facing libraries.
> >
> > [1] https://www.datawill.io/posts/apache-arrow-2022-reflection/
> > [2] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> > [3] https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#heading=h.qkuvi08gk4qa
> >
> > On Mon, Dec 26, 2022 at 10:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > Hi all,
> > >
> > > I am very excited and honored to help steer the Arrow Project this year as
> > > Arrow PMC Chair.
> > >
> > > Something Kou suggested, and the PMC thought would be valuable, is to have
> > > a small retrospective about the state of the project and where we want to
> > > take it.
> > > I would like to try doing so via a “state of the project” type
> > > discussion on this mailing list, inspired by an example from Apache Calcite
> > > [1].
> > >
> > > I welcome any / all comments on the following topics: What things /
> > > activities, if any, do you think the Apache Arrow Community should:
> > >
> > > 1. Continue
> > > 2. Start
> > > 3. Stop
> > >
> > > My thoughts are below.
> > >
> > > Andrew
> > >
> > > [1] https://lists.apache.org/thread/tx8gw3vxc4kwfzjs6q2gqwgywnsm1zbf
> > >
> > > Continue:
> > >
> > > I hope we can continue to encourage and support community growth, focused
> > > especially on supporting the sub projects and their leadership. I also
> > > would like to continue and grow the outward facing evangelism about the
> > > project with blog posts and presentations.
> > >
> > > Start:
> > >
> > > Lower the barrier to contributors and accepting those contributions even
> > > more, especially for casual contributors. The move to github issues from
> > > JIRA I see as one example of lowering this barrier (by reducing the
> > > required account maintenance). I would love to see additional improvements
> > > in areas like documentation, examples, no-invite-needed chat, etc.
> > >
> > > Stop:
> > >
> > > It would be nice to stop (reduce) the reliance on the relatively small
> > > number of core contributors for code review. I don’t have any particular
> > > insight on how to accomplish this, and suspect we will always have less
> > > review capacity than we would like, but it would be nice to encourage the
> > > growth.