Also, on the issue that there are no Julia-focused PMC members: note
that I helped the JavaScript folks make their own independent releases
for quite a while. I called the votes (e.g. [1]) and helped get people to
verify and vote on the releases. After a time, it was decided to stop
releasing independently because there wasn't enough development
activity to justify it.

[1]: https://www.mail-archive.com/dev@arrow.apache.org/msg05971.html

On Tue, Mar 30, 2021 at 4:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Jacob,
>
> On Tue, Mar 30, 2021 at 4:18 PM Jacob Quinn <quinn.jac...@gmail.com> wrote:
> >
> > I can comment as the primary apache arrow liaison for the Arrow.jl
> > repository and original code donator.
> >
> > I apologize for the "surprise", but I commented a few times in various
> > places and put a snippet in the README
> > <https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository>
> > about
> > the approach I wanted to take w/ the Julia implementation in terms of
> > keeping the JuliaData/Arrow.jl repository as a "dev branch" of sorts of the
> > apache/arrow code, upstreaming changes periodically. There's even a script
> > <https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl>
> > I wrote to mostly automate this upstreaming. I realize now that I didn't
> > consider the "Arrow PMC" position on this kind of setup or seek to affirm
> > that it would be ok to approach things like this.
> >
> > The reality is that Julia users are very engrained to expect Julia packages
> > to live in a single stand-alone github repo, where issues can be opened,
> > and pull requests are welcome. It was hard and still is hard to imagine
> > "turning that off", since I believe we would lose a lot of valuable bug
> > reports and first-time contributions. This isn't necessarily any fault of
> > how the bug report/contribution process is handled for the arrow project
> > overall, though I'm also aware that there's a desire to make it easier
> > <https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E>
> > and
> > it currently requires more and different effort than Julia users are used
> > to. I think it's more from how open, welcoming, and how strong the culture
> > is in Julia around encouraging community contributions and the tight
> > integration with github and its open-source project management tools.
> >
>
> Well, we are on track to having 1000 different people contribute to
> the project and have over 12,000 issues, so I don't think there is
> evidence that we are failing to attract new contributors or that
> feature requests / bugs aren't being reported. The way that we work is
> _different_, so adapting to the Apache process will require change.
>
> > Additionally, I was and still am concerned about the overall release
> > process of the apache/arrow project. I know there have been efforts there
> > as well to make it easier for individual languages to release on their own
> > cadence, but just anecdotally, the JuliaData/Arrow.jl has had/needed/wanted
> > 10 patch and minor releases since the original code donation, whereas the
> > apache/arrow project has had one (3.0.0). This leads to some of the
> > concerns I have with restricting development to just the apache/arrow
> > repository: how exactly does the release process work for individual
> > languages who may desire independent releases apart from the quarterly
> > overall project releases? I think from the Rust thread I remember that you
> > just need a group of language contributors to all agree, but what if I'm
> > the only "active" Julia contributor? It's also unclear what the
> > expectations are for actual development: with the original code donation
> > PRs, I know Neal "reviewed" the PRs, but perhaps missed the details around
> > how I proposed development continue going forward. Is it required to have a
> > certain number of reviews before merging? On the Julia side, I can try to
> > encourage/push for those who have contributed to the JuliaData/Arrow.jl
> > repository to help review PRs to apache/arrow, but I also can't guarantee
> > we would always have someone to review. It just feels pretty awkward if I
> > keep needing to ping non-Julia people to "review" a PR to merge it. Perhaps
> > this is just a problem of the overall Julia implementation "smallness" in
> > terms of contributors, but I'm not sure on the best answer here.
> >
>
> Several things here:
>
> * If you want to do separate Julia releases, you are free to do that,
> but you have to follow the process (voting on the mailing list,
> publishing GPG-signed source artifacts)
> * If you had been working "in the community" since November, you would
> probably already be a committer, so there is a bootstrapping here that
> has failed to take place. In the meantime, we are more than happy to
> help you "earn your wings" (as a committer) as quickly as possible.
> But from my perspective, I see a code donation and two other commits,
> which isn't enough to make a case for committership.
>
> > So in short, I'm not sure on the best path forward. I think strictly
> > restricting development to the apache/arrow physical repository would
> > actively hurt the progress of the Julia implementation, whereas it *has*
> > been progressing with increasing momentum since first released. There are
> > posts on the Julia discourse forum, in the Julia slack and zulip
> > communities, and quite a few issues/PRs being opened at the
> > JuliaData/Arrow.jl repository. There have been several calls for arrow
> > flight support, with a member from Julia Computing actually close to
> > releasing a gRPC client
> > <https://github.com/JuliaComputing/gRPCClient.jl> specifically
> > to help with flight support. But in terms of actual committers, it's been
> > primarily just myself, with a few minor contributions by others.
> >
> > I guess the big question that comes to mind is what are the hard
> > requirements to be considered an "official implementation"? Does the code
> > *have* to live in the same physical repo? Or if it passed the series of
> > archery integration tests, would that be enough? I apologize for my
> > naivete/inexperience on all things "apache", but I imagine that's a big
> > part of it: having official development/releases through the apache/arrow
> > community, though again I'm not exactly sure on the formal processes here?
> > I would like to keep Julia as an official implementation, but I'm also
> > mostly carrying the maintainership alone at the moment and want to be
> > realistic with the future of the project.
> >
>
> The critical matter is whether the development/maintenance work is
> conducted by the "Arrow community" in accordance with the Apache Way,
> which is to say individuals collaborating with each other on Apache
> channels (for communication and development) and avoiding the bad
> patterns you see sometimes in other communities (e.g. inconsistent
> openness).
>
> It's fine — really, no pressure — if you want to be independent and do
> things your own way, you just have to be clear that you are
> independent and not operating as part of the Apache Arrow community.
> You can't have it both ways, though. No hard feelings whatever you
> decide, but the current approach of "dumping code over the wall
> occasionally" while working on independent channels is not compatible
> with that. Building
> healthy open source communities is hard, but this way has been shown
> to work well, which is why I've spent the last 6 years working hard to
> bring people together to build this project and ecosystem!
>
> If you want to maintain a test harness here to verify an independent
> Julia implementation, that's fine, too. I'm disappointed that things
> failed to bootstrap after the code donation, so I want to see if we
> can course correct quickly or if not decide to go our separate ways.
>
> Thanks,
> Wes
>
> > I'm open to discussion and ideas on the best way forward.
> >
> > -Jacob
> >
> > On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi folks,
> > >
> > > I was very surprised today to learn that the Julia Arrow
> > > implementation has continued operating more or less like an
> > > independent open source project since the code donation last November:
> > >
> > > https://github.com/JuliaData/Arrow.jl/commits/main
> > >
> > > There may have been a misunderstanding about what was expected to
> > > occur after the code donation, but it's problematic for a bunch of
> > > reasons (IP lineage / governance / community development) to have work
> > > happening on the implementation "outside the community".
> > >
> > > In any case, what is done is done, so the Arrow PMC's position on this
> > > would be roughly to regard the work as a hard fork of what's in Apache
> > > Arrow, which given its development activity is more or less inactive
> > > [1]. (I had actually thought the project was simply inactive after the
> > > code donation)
> > >
> > > The critical question now is, is there interest from Julia developers
> > > in working "in the community", which is to say:
> > >
> > > * Having development discussions on ASF channels (mailing list,
> > > GitHub, JIRA), planning and communicating in the open
> > > * Doing all development in ASF GitHub repositories
> > >
> > > The answer to the question may be "no" (which is okay), but if that's
> > > the case, I don't think we should be giving the impression that we
> > > have an official Julia implementation that is developed and maintained
> > > by the community (and so my argument would be unfortunately to drop
> > > the donated code from the project).
> > >
> > > If the answer is "yes", there needs to be a hard commitment to move
> > > development to Apache channels and not look back. We would also need
> > > to figure out what to do to document and synchronize the new IP that's
> > > been created since the code donation.
> > >
> > > Thanks,
> > > Wes
> > >
> > > [1]: https://github.com/apache/arrow/commits/master/julia/Arrow
> > >
