Thank you for starting this discussion, Andrew! Fiona, Sreehari, and I thought a bit about this, and I've summarized some of our thoughts below.

Continue:

1. +1 to Will's suggestion about roadmaps for sub-projects. This is something that would be helpful for the MATLAB interface, for example. We would also be interested in the possibility of exploring a MATLAB sync call if it would be of interest to other community members.
2. Continue focusing on building an inclusive developer community. Finish the work required to rename the master branch to main. Consider running automated checks on pull requests using a tool like alex [1] to prevent use of inappropriate language and terminology.

Start:

1. Add more visuals and diagrams to the documentation. It can be pretty overwhelming for new community members to look at the in-depth Arrow C++ documentation and be able to quickly get a high-level understanding of how the various data structures (e.g. buffer, array, chunked array, record batch, table, field, schema, data type, etc.) relate to one another. Having more visuals with clear labels that show the relationship between these key concepts would be very helpful. This also applies to other parts of the documentation, like the CI systems (e.g. crossbow), which have a lot of moving parts.
2. Use pull request templates. This would hopefully make it easier for both new and existing contributors to describe their changes in a focused and clear way to others. For example, when making pull requests related to the MATLAB interface, we've been trying to follow a fairly consistent pattern for pull request descriptions which includes sections like "Overview", "Implementation", "Testing", "Future Directions", "Notes", etc.

Stop:

1. +1 to Andrew's point about the reliance on a small number of core contributors for code reviews. Documenting a process for determining who should be included on a code review would be helpful.

[1] https://github.com/get-alex/alex

________________________________
From: Dewey Dunnington <de...@voltrondata.com.INVALID>
Sent: Tuesday, January 3, 2023 2:33 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [DISCUSS] State of the Arrow Project 2022

First, a +1000 on Will's blog post! [1]

Continue:

Building tools that benefit users of all languages, with particular kudos to ADBC for providing an ABI-stable way to write database drivers that can be used by practitioners in C++, Ruby, Python, Java, Go, and (soon!) R.

Start:

I wonder if this is the year that we can find a way to write compute functions in such a way that separate implementations don't have to exist for C++, Go, and Rust (and maybe others I don't know about).

Stop:

Will's comment that we should stop building data scientist-facing tools under the Arrow name struck a particular chord with me... the R package is very much data scientist facing and we have a rather large disconnect between the technical capacity of our users and the technical capacity required to contribute to the package (e.g., maintaining a development Arrow C++ install). The types of things we have to do to make RecordBatchReader, Arrays, Buffer, RecordBatch and Table structures available to R users and the types of things we have to do to provide an Acero dplyr backend are vastly different.

[1] https://www.datawill.io/posts/apache-arrow-2022-reflection/

On Thu, Dec 29, 2022 at 4:09 PM Jacob Wujciak <ja...@voltrondata.com.invalid> wrote:

> This is a great idea, I will add some thoughts later but just wanted to
> quickly add that the Zulip Chat [1] was recently switched to allow anyone
> to register without the need for an invite link!
> [1]: https://ursalabs.zulipchat.com/
>
> On Wed, Dec 28, 2022 at 11:27 PM Will Jones <will.jones...@gmail.com>
> wrote:
>
> > Thanks for suggesting this Andrew.
> >
> > I just uploaded a blog post with my thoughts in long form [1]. Here are
> > some suggestions pulled from that:
> >
> > Continue:
> >
> > I hope we will continue prioritizing updating the spec for new array
> > formats. [2] I think this is very important for avoiding fragmentation
> > and may even open opportunities for consolidation in the C++ ecosystem.
> >
> > +1 on additional improvements for documentation, examples, no-invite
> > chats. I am particularly keen on seeing evangelism for our protocols;
> > existing ones like the C Data Interface aren't nearly as widely known as
> > they ought to be and I'm excited for new ones like ADBC.
> >
> > Start:
> >
> > Find ways for each subproject to publicly develop a clear roadmap.
> > Otherwise by default these discussions happen in private, either between
> > individual ICs or within corporate environments. Some subprojects, such
> > as Acero, could likely use their own sync call to help facilitate this,
> > even if on a slower cadence than the main biweekly call.
> >
> > Also, other sync calls might consider adapting to the sync call note
> > style used in the Rust projects, where all notes are in one Google Doc
> > [3] rather than spread across main mailing list threads. That seems like
> > a format that would make it easy for new contributors to catch up on the
> > major focuses of the project.
> >
> > Stop:
> >
> > Don't create end-user (e.g. data scientist) facing tools under the name
> > Arrow; prefer keeping separate brand identities for those tools and
> > keeping Arrow libraries as developer-facing libraries.
> >
> > [1] https://www.datawill.io/posts/apache-arrow-2022-reflection/
> > [2] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> > [3] https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#heading=h.qkuvi08gk4qa
> >
> > On Mon, Dec 26, 2022 at 10:12 AM Andrew Lamb <al...@influxdata.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I am very excited and honored to help steer the Arrow Project this year
> > > as Arrow PMC Chair.
> > >
> > > Something Kou suggested, and the PMC thought would be valuable, is to
> > > have a small retrospective about the state of the project and where we
> > > want to take it. I would like to try doing so via a “state of the
> > > project” type discussion on this mailing list, inspired by an example
> > > from Apache Calcite [1].
> > >
> > > I welcome any / all comments on the following topics: What things /
> > > activities, if any, do you think the Apache Arrow Community should:
> > >
> > > 1. Continue
> > > 2. Start
> > > 3. Stop
> > >
> > > My thoughts are below.
> > >
> > > Andrew
> > >
> > > [1] https://lists.apache.org/thread/tx8gw3vxc4kwfzjs6q2gqwgywnsm1zbf
> > >
> > > Continue:
> > >
> > > I hope we can continue to encourage and support community growth,
> > > focused especially on supporting the sub projects and their leadership.
> > > I also would like to continue and grow the outward facing evangelism
> > > about the project with blog posts and presentations.
> > >
> > > Start:
> > >
> > > Lower the barrier to contributing and to accepting those contributions
> > > even more, especially for casual contributors. The move to GitHub
> > > issues from JIRA I see as one example of lowering this barrier (by
> > > reducing the required account maintenance). I would love to see
> > > additional improvements in areas like documentation, examples,
> > > no-invite-needed chat, etc.
> > >
> > > Stop:
> > >
> > > It would be nice to stop (reduce) the reliance on the relatively small
> > > number of core contributors for code review. I don’t have any
> > > particular insight on how to accomplish this, and suspect we will
> > > always have less review capacity than we would like, but it would be
> > > nice to encourage the growth.