Re: [DISCUSS] State of the Arrow Project 2022

Kevin Gurney Fri, 06 Jan 2023 12:26:15 -0800

Thank you for starting this discussion, Andrew!

Fiona, Sreehari, and I thought a bit about this, and I've summarized some of 
our thoughts below.

Continue:

1. +1 to Will's suggestion about roadmaps for sub-projects. This is something 
that would be helpful for the MATLAB interface, for example. We would also be 
interested in the possibility of exploring a MATLAB sync call if it would be of 
interest to other community members.

2. Continue focusing on building an inclusive developer community. Finish the 
work required to rename the master branch to main. Consider running automated 
checks on pull requests using a tool like alex [1] to prevent use of 
inappropriate language and terminology.

Start:

1. Add more visuals and diagrams to the documentation. It can be pretty 
overwhelming for new community members to look at the in-depth Arrow C++ 
documentation and be able to quickly get a high-level understanding of how the 
various data structures (e.g. buffer, array, chunked array, record batch, 
table, field, schema, data type, etc.) relate to one another. Having more 
visuals with clear labels that show the relationship between these key concepts 
would be very helpful. This also applies to other parts of the documentation, 
like the CI systems (e.g. crossbow), which have a lot of moving parts.

2. Use pull request templates. This would hopefully make it easier for both new 
and existing contributors to describe their changes in a focused and clear way 
to others. For example, when making pull requests related to the MATLAB 
interface, we've been trying to follow a fairly consistent pattern for pull 
request descriptions which includes sections like "Overview", "Implementation", 
"Testing", "Future Directions", "Notes", etc.

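As one rough illustration of the relationships mentioned in "Start" item 1 above, here is a toy Python model of how the key structures fit together. It is a sketch only: the class names mirror the Arrow concepts, but every class, field, and helper below (including `int32_array` and `INT32`) is hypothetical and is not the Arrow C++ or pyarrow API. Real Arrow arrays also carry details omitted here, such as validity bitmaps, offsets, and nested children.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataType:
    name: str  # e.g. "int32"


@dataclass
class Buffer:
    data: bytes  # a contiguous region of memory


@dataclass
class Array:
    type: DataType
    length: int
    buffers: List[Buffer]  # an array is a typed view over one or more buffers


@dataclass
class ChunkedArray:
    type: DataType
    chunks: List[Array]  # a logical column split into arrays of one type

    def __post_init__(self):
        if any(chunk.type != self.type for chunk in self.chunks):
            raise TypeError("all chunks must share the same data type")

    @property
    def length(self) -> int:
        return sum(chunk.length for chunk in self.chunks)


@dataclass(frozen=True)
class Field:
    name: str
    type: DataType  # a field pairs a column name with a data type


@dataclass
class Schema:
    fields: List[Field]  # an ordered list of fields


def _check_columns(schema, columns):
    """Columns must match the schema one-to-one and have equal lengths."""
    if len(columns) != len(schema.fields):
        raise ValueError("one column is required per field")
    for field, column in zip(schema.fields, columns):
        if column.type != field.type:
            raise TypeError(f"column {field.name!r} has the wrong type")
    if len({column.length for column in columns}) > 1:
        raise ValueError("all columns must have the same length")


@dataclass
class RecordBatch:
    schema: Schema
    columns: List[Array]  # contiguous columns: one Array per field

    def __post_init__(self):
        _check_columns(self.schema, self.columns)

    @property
    def num_rows(self) -> int:
        return self.columns[0].length if self.columns else 0


@dataclass
class Table:
    schema: Schema
    columns: List[ChunkedArray]  # chunked columns: one ChunkedArray per field

    def __post_init__(self):
        _check_columns(self.schema, self.columns)

    @property
    def num_rows(self) -> int:
        return self.columns[0].length if self.columns else 0


INT32 = DataType("int32")


def int32_array(values):
    """Hypothetical helper: pack Python ints into a single values buffer."""
    values_buffer = Buffer(struct.pack(f"<{len(values)}i", *values))
    return Array(INT32, len(values), [values_buffer])


if __name__ == "__main__":
    schema = Schema([Field("x", INT32)])
    column = ChunkedArray(INT32, [int32_array([1, 2, 3]), int32_array([4, 5])])
    table = Table(schema, [column])
    print(table.num_rows)
```

The main contrast the sketch tries to show is that a record batch holds one contiguous array per field, while a table holds one chunked array per field.
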
Stop:

1. +1 to Andrew's point about the reliance on a small number of core 
contributors for code reviews. Documenting a process for determining who should 
be included on a code review would be helpful.

[1] https://github.com/get-alex/alex

________________________________
From: Dewey Dunnington <de...@voltrondata.com.INVALID>
Sent: Tuesday, January 3, 2023 2:33 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [DISCUSS] State of the Arrow Project 2022

First, a +1000 on Will's blog post! [1]

Continue:

Building tools that benefit users of all languages, with particular kudos
to ADBC for providing an ABI-stable way to write database drivers that can
be used by practitioners in C++, Ruby, Python, Java, Go, and (soon!) R.

Start:

I wonder if this is the year that we can find a way to write compute
functions in such a way that separate implementations don't have to exist
for C++, Go, and Rust (and maybe others I don't know about).

Stop:

Will's comment that we should stop building data scientist-facing tools
under the Arrow name struck a particular chord with me. The R package is
very much data scientist-facing, and we have a rather large gap between
the technical capacity of our users and the technical capacity required to
contribute to the package (e.g., maintaining a development Arrow C++
install). The types of things we have to do to make RecordBatchReader,
Arrays, Buffer, RecordBatch and Table structures available to R users and
the types of things we have to do to provide an Acero dplyr backend are
vastly different.

[1] https://www.datawill.io/posts/apache-arrow-2022-reflection/

On Thu, Dec 29, 2022 at 4:09 PM Jacob Wujciak <ja...@voltrondata.com.invalid>
wrote:

> This is a great idea, I will add some thoughts later but just wanted to
> quickly add that the Zulip Chat [1] was recently switched to allow anyone
> to register without the need for an invite link!
> [1]: https://ursalabs.zulipchat.com/
>
>
> On Wed, Dec 28, 2022 at 11:27 PM Will Jones <will.jones...@gmail.com>
> wrote:
>
> > Thanks for suggesting this Andrew.
> >
> > I just uploaded a blog post with my thoughts in long form [1]. Here are
> > some suggestions pulled from that:
> >
> > Continue:
> >
> > I hope we will continue prioritizing updating the spec for new array
> > formats. [2] I think this is very important for avoiding fragmentation
> > and may even open opportunities for consolidation in the C++ ecosystem.
> >
> > +1 on additional improvements for documentation, examples, no-invite
> > chats. I am particularly keen on seeing evangelism for our protocols;
> > existing ones like C Data Interface aren't nearly as widely known as they
> > ought to be and I'm excited for new ones like ADBC.
> >
> > Start:
> >
> > Find ways for each subproject to publicly develop a clear roadmap.
> > Otherwise by default these discussions happen in private, either between
> > individual ICs or within corporate environments. Some subprojects, such
> > as Acero, could likely use their own sync call to help facilitate this,
> > even if on a slower cadence than the main biweekly call.
> >
> > Also, other sync calls might consider adopting the sync call note style
> > used in the Rust projects, where all notes are in one Google Doc [3]
> > rather than spread across main mailing list threads. That seems like a
> > format that would make it easy for new contributors to catch up on the
> > major focuses of the project.
> >
> > Stop:
> >
> > Don't create end-user (e.g. data scientist) facing tools under the name
> > Arrow; prefer keeping separate brand identities for those tools and
> > keeping Arrow libraries as developer-facing libraries.
> >
> > [1] https://www.datawill.io/posts/apache-arrow-2022-reflection/
> > [2] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> > [3] https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#heading=h.qkuvi08gk4qa
> >
> > On Mon, Dec 26, 2022 at 10:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > Hi all,
> > >
> > > I am very excited and honored to help steer the Arrow Project this year
> > > as Arrow PMC Chair.
> > >
> > > Something Kou suggested, and the PMC thought would be valuable, is to
> > > have a small retrospective about the state of the project and where we
> > > want to take it. I would like to try doing so via a “state of the
> > > project” type discussion on this mailing list, inspired by an example
> > > from Apache Calcite [1].
> > >
> > > I welcome any / all comments on the following topics: What things /
> > > activities, if any, do you think the Apache Arrow Community should:
> > >
> > > 1. Continue
> > > 2. Start
> > > 3. Stop
> > >
> > > My thoughts are below.
> > >
> > > Andrew
> > >
> > > [1] https://lists.apache.org/thread/tx8gw3vxc4kwfzjs6q2gqwgywnsm1zbf
> > >
> > > Continue:
> > >
> > > I hope we can continue to encourage and support community growth,
> > > focused especially on supporting the sub projects and their leadership.
> > > I also would like to continue and grow the outward facing evangelism
> > > about the project with blog posts and presentations.
> > >
> > > Start:
> > >
> > > Lower the barrier to contributors and accepting those contributions
> > > even more, especially for casual contributors. The move to GitHub issues
> > > from JIRA I see as one example of lowering this barrier (by reducing the
> > > required account maintenance). I would love to see additional
> > > improvements in areas like documentation, examples, no-invite-needed
> > > chat, etc.
> > >
> > > Stop:
> > >
> > > It would be nice to stop (reduce) the reliance on the relatively small
> > > number of core contributors for code review. I don’t have any
> > > particular insight on how to accomplish this, and suspect we will always
> > > have less review capacity than we would like, but it would be nice to
> > > encourage the growth.
> > >
> >
>
