hi Jacob,

On Tue, Mar 30, 2021 at 4:18 PM Jacob Quinn <quinn.jac...@gmail.com> wrote:
>
> I can comment as the primary apache arrow liaison for the Arrow.jl repository and original code donator.
>
> I apologize for the "surprise", but I commented a few times in various places and put a snippet in the README <https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository> about the approach I wanted to take w/ the Julia implementation in terms of keeping the JuliaData/Arrow.jl repository as a "dev branch" of sorts of the apache/arrow code, upstreaming changes periodically. There's even a script <https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl> I wrote to mostly automate this upstreaming. I realize now that I didn't consider the "Arrow PMC" position on this kind of setup or seek to affirm that it would be ok to approach things like this.
>
> The reality is that Julia users are very engrained to expect Julia packages to live in a single stand-alone github repo, where issues can be opened, and pull requests are welcome. It was hard and still is hard to imagine "turning that off", since I believe we would lose a lot of valuable bug reports and first-time contributions. This isn't necessarily any fault of how the bug report/contribution process is handled for the arrow project overall, though I'm also aware that there's a desire to make it easier <https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E> and it currently requires more and different effort than Julia users are used to. I think it's more from how open, welcoming, and how strong the culture is in Julia around encouraging community contributions and the tight integration with github and its open-source project management tools.
>
Well, we are on track to having 1000 different people contribute to the project and have over 12,000 issues, so I don't think there is evidence that we are failing to attract new contributors or that feature requests / bugs aren't being reported. The way that we work is _different_, so adapting to the Apache process will require change.

> Additionally, I was and still am concerned about the overall release process of the apache/arrow project. I know there have been efforts there as well to make it easier for individual languages to release on their own cadence, but just anecdotally, the JuliaData/Arrow.jl has had/needed/wanted 10 patch and minor releases since the original code donation, whereas the apache/arrow project has had one (3.0.0). This leads to some of the concerns I have with restricting development to just the apache/arrow repository: how exactly does the release process work for individual languages who may desire independent releases apart from the quarterly overall project releases? I think from the Rust thread I remember that you just need a group of language contributors to all agree, but what if I'm the only "active" Julia contributor? It's also unclear what the expectations are for actual development: with the original code donation PRs, I know Neal "reviewed" the PRs, but perhaps missed the details around how I proposed development continue going forward. Is it required to have a certain number of reviews before merging? On the Julia side, I can try to encourage/push for those who have contributed to the JuliaData/Arrow.jl repository to help review PRs to apache/arrow, but I also can't guarantee we would always have someone to review. It just feels pretty awkward if I keep needing to ping non-Julia people to "review" a PR to merge it. Perhaps this is just a problem of the overall Julia implementation "smallness" in terms of contributors, but I'm not sure on the best answer here.
>

Several things here:

* If you want to do separate Julia releases, you are free to do that, but you have to follow the process (voting on the mailing list, publishing GPG-signed source artifacts)
* If you had been working "in the community" since November, you would probably already be a committer, so there is a bootstrapping here that has failed to take place. In the meantime, we are more than happy to help you "earn your wings" (as a committer) as quickly as possible. But from my perspective, I see a code donation and two other commits, which isn't enough to make a case for committership.

> So in short, I'm not sure on the best path forward. I think strictly restricting development to the apache/arrow physical repository would actively hurt the progress of the Julia implementation, whereas it *has* been progressing with increasing momentum since first released. There are posts on the Julia discourse forum, in the Julia slack and zulip communities, and quite a few issues/PRs being opened at the JuliaData/Arrow.jl repository. There have been several calls for arrow flight support, with a member from Julia Computing actually close to releasing a gRPC client <https://github.com/JuliaComputing/gRPCClient.jl> specifically to help with flight support. But in terms of actual committers, it's been primarily just myself, with a few minor contributions by others.
>
> I guess the big question that comes to mind is what are the hard requirements to be considered an "official implementation"? Does the code *have* to live in the same physical repo? Or if it passed the series of archery integration tests, would that be enough? I apologize for my naivete/inexperience on all things "apache", but I imagine that's a big part of it: having official development/releases through the apache/arrow community, though again I'm not exactly sure on the formal processes here?
> I would like to keep Julia as an official implementation, but I'm also mostly carrying the maintainership alone at the moment and want to be realistic with the future of the project.
>

The critical matter is whether the development/maintenance work is conducted by the "Arrow community" in accordance with the Apache Way, which is to say individuals collaborating with each other on Apache channels (for communication and development) and avoiding the bad patterns you see sometimes in other communities (e.g. inconsistent openness).

It's fine (really, no pressure) if you want to be independent and do things your own way; you just have to be clear that you are independent and not operating as part of the Apache Arrow community. You can't have it both ways, though. No hard feelings whatever you decide, but the current approach of "dump code over the wall occasionally" while working on independent channels is not compatible with being part of the Apache Arrow community.

Building healthy open source communities is hard, but this way has been shown to work well, which is why I've spent the last 6 years working hard to bring people together to build this project and ecosystem!

If you want to maintain a test harness here to verify an independent Julia implementation, that's fine, too.

I'm disappointed that things failed to bootstrap after the code donation, so I want to see if we can course-correct quickly or, if not, decide to go our separate ways.

Thanks,
Wes

> I'm open to discussion and ideas on the best way forward.
>
> -Jacob
>
> On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi folks,
> >
> > I was very surprised today to learn that the Julia Arrow implementation has continued operating more or less like an independent open source project since the code donation last November:
> >
> > https://github.com/JuliaData/Arrow.jl/commits/main
> >
> > There may have been a misunderstanding about what was expected to occur after the code donation, but it's problematic for a bunch of reasons (IP lineage / governance / community development) to have work happening on the implementation "outside the community".
> >
> > In any case, what is done is done, so the Arrow PMC's position on this would be roughly to regard the work as a hard fork of what's in Apache Arrow, which given its development activity is more or less inactive [1]. (I had actually thought the project was simply inactive after the code donation)
> >
> > The critical question now is, is there interest from Julia developers in working "in the community", which is to say:
> >
> > * Having development discussions on ASF channels (mailing list, GitHub, JIRA), planning and communicating in the open
> > * Doing all development in ASF GitHub repositories
> >
> > The answer to the question may be "no" (which is okay), but if that's the case, I don't think we should be giving the impression that we have an official Julia implementation that is developed and maintained by the community (and so my argument would be unfortunately to drop the donated code from the project).
> >
> > If the answer is "yes", there needs to be a hard commitment to move development to Apache channels and not look back. We would also need to figure out what to do to document and synchronize the new IP that's been created since the code donation.
> >
> > Thanks,
> > Wes
> >
> > [1]: https://github.com/apache/arrow/commits/master/julia/Arrow
> >