We have used pull request templates in the various Rust projects to good effect: most PRs clearly describe what they are doing and why.
For your reference, they are at arrow-rs[1] and arrow-datafusion[2].

[1] https://raw.githubusercontent.com/apache/arrow-rs/master/.github/pull_request_template.md
[2] https://raw.githubusercontent.com/apache/arrow-datafusion/master/.github/pull_request_template.md

On Fri, Jan 6, 2023 at 11:18 PM Will Jones <will.jones...@gmail.com> wrote:

> Thanks, Kevin.
>
> > Documenting a process for determining who should be included on a code review would be helpful.
>
> That's a good idea. We have a docs page directed at contributors, but I'm not sure how many people have read it [1]. This would be a good addition to it. (There's also a good guide on reviewing contributions [2].) I also like the idea of pull request templates, and it seems like if we provide a link in the template to this overview, more of our contributors would read the guide. I have created an issue for this [3].
>
> Also +1 on more diagrams. I've created a couple recently (for example [4]) and hope to make more.
>
> [1] https://arrow.apache.org/docs/developers/overview.html
> [2] https://arrow.apache.org/docs/developers/reviewing.html
> [3] https://github.com/apache/arrow/issues/15232
> [4] https://arrow.apache.org/docs/format/Glossary.html#term-table
>
> On Fri, Jan 6, 2023 at 12:26 PM Kevin Gurney <kgur...@mathworks.com> wrote:
>
> > Thank you for starting this discussion, Andrew!
> >
> > Fiona, Sreehari, and I thought a bit about this, and I've summarized some of our thoughts below.
> >
> > Continue:
> >
> > 1. +1 to Will's suggestion about roadmaps for sub-projects. This is something that would be helpful for the MATLAB interface, for example. We would also be interested in the possibility of exploring a MATLAB sync call if it would be of interest to other community members.
> >
> > 2. Continue focusing on building an inclusive developer community. Finish the work required to rename the master branch to main.
> > Consider running automated checks on pull requests using a tool like alex [1] to prevent use of inappropriate language and terminology.
> >
> > Start:
> >
> > 1. Add more visuals and diagrams to the documentation. It can be pretty overwhelming for new community members to look at the in-depth Arrow C++ documentation and be able to quickly get a high-level understanding of how the various data structures (e.g. buffer, array, chunked array, record batch, table, field, schema, data type, etc.) relate to one another. Having more visuals with clear labels that show the relationship between these key concepts would be very helpful. This also applies to other parts of the documentation, like the CI systems (e.g. crossbow), which have a lot of moving parts.
> >
> > 2. Use pull request templates. This would hopefully make it easier for both new and existing contributors to describe their changes in a focused and clear way to others. For example, when making pull requests related to the MATLAB interface, we've been trying to follow a fairly consistent pattern for pull request descriptions which includes sections like "Overview", "Implementation", "Testing", "Future Directions", "Notes", etc.
> >
> > Stop:
> >
> > 1. +1 to Andrew's point about the reliance on a small number of core contributors for code reviews. Documenting a process for determining who should be included on a code review would be helpful.
> >
> > [1] https://github.com/get-alex/alex
> >
> > ________________________________
> > From: Dewey Dunnington <de...@voltrondata.com.INVALID>
> > Sent: Tuesday, January 3, 2023 2:33 PM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [DISCUSS] State of the Arrow Project 2022
> >
> > First, a +1000 on Will's blog post!
> > [1]
> >
> > Continue:
> >
> > Building tools that benefit users of all languages, with particular kudos to ADBC for providing an ABI-stable way to write database drivers that can be used by practitioners in C++, Ruby, Python, Java, Go, and (soon!) R.
> >
> > Start:
> >
> > I wonder if this is the year that we can find a way to write compute functions in such a way that separate implementations don't have to exist for C++, Go, and Rust (and maybe others I don't know about).
> >
> > Stop:
> >
> > Will's comment that we should stop building data scientist-facing tools under the Arrow name struck a particular chord with me...the R package is very much data scientist facing and we have a rather large disjoint between the technical capacity of our users and the technical capacity required to contribute to the package (e.g., maintaining a development Arrow C++ install). The types of things we have to do to make RecordBatchReader, Arrays, Buffer, RecordBatch and Table structures available to R users and the types of things we have to do to provide an Acero dplyr backend are vastly different.
> >
> > [1] https://www.datawill.io/posts/apache-arrow-2022-reflection/
> >
> > On Thu, Dec 29, 2022 at 4:09 PM Jacob Wujciak <ja...@voltrondata.com.invalid> wrote:
> >
> > > This is a great idea, I will add some thoughts later but just wanted to quickly add that the Zulip Chat [1] was recently switched to allow anyone to register without the need for an invite link!
> > >
> > > [1]: https://ursalabs.zulipchat.com/
> > >
> > > On Wed, Dec 28, 2022 at 11:27 PM Will Jones <will.jones...@gmail.com> wrote:
> > >
> > > > Thanks for suggesting this Andrew.
> > > >
> > > > I just uploaded a blog post with my thoughts in long form [1].
> > > > Here are some suggestions pulled from that:
> > > >
> > > > Continue:
> > > >
> > > > I hope we will continue prioritizing updating the spec for new array formats [2]. I think this is very important for avoiding fragmentation and may even open opportunities for consolidation in the C++ ecosystem.
> > > >
> > > > +1 on additional improvements for documentation, examples, no-invite chats. I am particularly keen on seeing evangelism for our protocols; existing ones like C Data Interface aren't nearly as widely known as they ought to be and I'm excited for new ones like ADBC.
> > > >
> > > > Start:
> > > >
> > > > Find ways for each subproject to publicly develop a clear roadmap. Otherwise by default these discussions happen in private, either between individual ICs or within corporate environments. Some subprojects, such as Acero, could likely use their own sync call to help facilitate this, even if on a slower cadence than the main biweekly call.
> > > >
> > > > Also, other sync calls might consider adapting to the sync call note style used in the Rust projects, where all notes are in one Google Doc [3] rather than spread across main mailing list threads. That seems like a format that would make it easy for new contributors to catch up on the major focuses of the project.
> > > >
> > > > Stop:
> > > >
> > > > Don't create end-user (e.g. data scientist) facing tools under the name Arrow; prefer keeping separate brand identities for those tools and keeping Arrow libraries as developer-facing libraries.
> > > >
> > > > [1] https://www.datawill.io/posts/apache-arrow-2022-reflection/
> > > > [2] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> > > > [3] https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#heading=h.qkuvi08gk4qa
> > > >
> > > > On Mon, Dec 26, 2022 at 10:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am very excited and honored to help steer the Arrow Project this year as Arrow PMC Chair.
> > > > >
> > > > > Something Kou suggested, and the PMC thought would be valuable, is to have a small retrospective about the state of the project and where we want to take it. I would like to try doing so via a “state of the project” type discussion on this mailing list, inspired by an example from Apache Calcite [1].
> > > > >
> > > > > I welcome any / all comments on the following topics: What things / activities, if any, do you think the Apache Arrow Community should:
> > > > >
> > > > > 1. Continue
> > > > > 2. Start
> > > > > 3. Stop
> > > > >
> > > > > My thoughts are below.
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1] https://lists.apache.org/thread/tx8gw3vxc4kwfzjs6q2gqwgywnsm1zbf
> > > > >
> > > > > Continue:
> > > > >
> > > > > I hope we can continue to encourage and support community growth, focused especially on supporting the sub projects and their leadership.
> > > > > I also would like to continue and grow the outward facing evangelism about the project with blog posts and presentations.
> > > > >
> > > > > Start:
> > > > >
> > > > > Lower the barrier to contributors and accepting those contributions even more, especially for casual contributors. The move to GitHub issues from JIRA I see as one example of lowering this barrier (by reducing the required account maintenance). I would love to see additional improvements in areas like documentation, examples, no-invite-needed chat, etc.
> > > > >
> > > > > Stop:
> > > > >
> > > > > It would be nice to stop (reduce) the reliance on the relatively small number of core contributors for code review. I don’t have any particular insight on how to accomplish this, and suspect we will always have less review capacity than we would like, but it would be nice to encourage the growth.