Also, on the issue that there are no Julia-focused PMC members: note that I helped the JavaScript folks make their own independent releases for quite a while. I called the votes (e.g. [1]) and helped get people to verify and vote on the releases. After a time, it was decided to stop releasing independently because there wasn't enough development activity to justify it.
[1]: https://www.mail-archive.com/dev@arrow.apache.org/msg05971.html

On Tue, Mar 30, 2021 at 4:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Jacob,
>
> On Tue, Mar 30, 2021 at 4:18 PM Jacob Quinn <quinn.jac...@gmail.com> wrote:
> >
> > I can comment as the primary apache arrow liaison for the Arrow.jl
> > repository and original code donator.
> >
> > I apologize for the "surprise", but I commented a few times in various
> > places and put a snippet in the README
> > <https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository>
> > about
> > the approach I wanted to take w/ the Julia implementation in terms of
> > keeping the JuliaData/Arrow.jl repository as a "dev branch" of sorts of the
> > apache/arrow code, upstreaming changes periodically. There's even a script
> > <https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl>
> > I wrote to mostly automate this upstreaming. I realize now that I didn't
> > consider the "Arrow PMC" position on this kind of setup or seek to affirm
> > that it would be ok to approach things like this.
> >
> > The reality is that Julia users are very engrained to expect Julia packages
> > to live in a single stand-alone github repo, where issues can be opened,
> > and pull requests are welcome. It was hard and still is hard to imagine
> > "turning that off", since I believe we would lose a lot of valuable bug
> > reports and first-time contributions. This isn't necessarily any fault of
> > how the bug report/contribution process is handled for the arrow project
> > overall, though I'm also aware that there's a desire to make it easier
> > <https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E>
> > and
> > it currently requires more and different effort than Julia users are used
> > to. I think it's more from how open, welcoming, and how strong the culture
> > is in Julia around encouraging community contributions and the tight
> > integration with github and its open-source project management tools.
> >
>
> Well, we are on track to having 1000 different people contribute to
> the project and have over 12,000 issues, so I don't think there is
> evidence that we are failing to attract new contributors or that
> feature requests / bugs aren't being reported. The way that we work is
> _different_, so adapting to the Apache process will require change.
>
> > Additionally, I was and still am concerned about the overall release
> > process of the apache/arrow project. I know there have been efforts there
> > as well to make it easier for individual languages to release on their own
> > cadence, but just anecdotally, the JuliaData/Arrow.jl has had/needed/wanted
> > 10 patch and minor releases since the original code donation, whereas the
> > apache/arrow project has had one (3.0.0). This leads to some of the
> > concerns I have with restricting development to just the apache/arrow
> > repository: how exactly does the release process work for individual
> > languages who may desire independent releases apart from the quarterly
> > overall project releases? I think from the Rust thread I remember that you
> > just need a group of language contributors to all agree, but what if I'm
> > the only "active" Julia contributor? It's also unclear what the
> > expectations are for actual development: with the original code donation
> > PRs, I know Neal "reviewed" the PRs, but perhaps missed the details around
> > how I proposed development continue going forward. Is it required to have a
> > certain number of reviews before merging? On the Julia side, I can try to
> > encourage/push for those who have contributed to the JuliaData/Arrow.jl
> > repository to help review PRs to apache/arrow, but I also can't guarantee
> > we would always have someone to review. It just feels pretty awkward if I
> > keep needing to ping non-Julia people to "review" a PR to merge it. Perhaps
> > this is just a problem of the overall Julia implementation "smallness" in
> > terms of contributors, but I'm not sure on the best answer here.
> >
>
> Several things here:
>
> * If you want to do separate Julia releases, you are free to do that,
> but you have to follow the process (voting on the mailing list,
> publishing GPG-signed source artifacts)
> * If you had been working "in the community" since November, you would
> probably already be a committer, so there is a bootstrapping here that
> has failed to take place. In the meantime, we are more than happy to
> help you "earn your wings" (as a committer) as quickly as possible.
> But from my perspective, I see a code donation and two other commits,
> which isn't enough to make a case for committership.
>
> > So in short, I'm not sure on the best path forward. I think strictly
> > restricting development to the apache/arrow physical repository would
> > actively hurt the progress of the Julia implementation, whereas it *has*
> > been progressing with increasing momentum since first released. There are
> > posts on the Julia discourse forum, in the Julia slack and zulip
> > communities, and quite a few issues/PRs being opened at the
> > JuliaData/Arrow.jl repository. There have been several calls for arrow
> > flight support, with a member from Julia Computing actually close to
> > releasing a gRPC client
> > <https://github.com/JuliaComputing/gRPCClient.jl> specifically
> > to help with flight support. But in terms of actual committers, it's been
> > primarily just myself, with a few minor contributions by others.
> >
> > I guess the big question that comes to mind is what are the hard
> > requirements to be considered an "official implementation"? Does the code
> > *have* to live in the same physical repo? Or if it passed the series of
> > archery integration tests, would that be enough? I apologize for my
> > naivete/inexperience on all things "apache", but I imagine that's a big
> > part of it: having official development/releases through the apache/arrow
> > community, though again I'm not exactly sure on the formal processes here?
> > I would like to keep Julia as an official implementation, but I'm also
> > mostly carrying the maintainership alone at the moment and want to be
> > realistic with the future of the project.
> >
>
> The critical matter is whether the development/maintenance work is
> conducted by the "Arrow community" in accordance with the Apache Way,
> which is to say individuals collaborating with each other on Apache
> channels (for communication and development) and avoiding the bad
> patterns you see sometimes in other communities (e.g. inconsistent
> openness).
>
> It's fine — really, no pressure — if you want to be independent and do
> things your own way, you just have to be clear that you are
> independent and not operating as part of the Apache Arrow community.
> You can't have it both ways, though. No hard feelings whatever you
> decide, but the current "dump code over the wall occasionally"
> approach but work on independent channels is not compatible. Building
> healthy open source communities is hard, but this way has been shown
> to work well, which is why I've spent the last 6 years working hard to
> bring people together to build this project and ecosystem!
>
> If you want to maintain a test harness here to verify an independent
> Julia implementation, that's fine, too. I'm disappointed that things
> failed to bootstrap after the code donation, so I want to see if we
> can course correct quickly or if not decide to go our separate ways.
>
> Thanks,
> Wes
>
> > I'm open to discussion and ideas on the best way forward.
> >
> > -Jacob
> >
> > On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi folks,
> > >
> > > I was very surprised today to learn that the Julia Arrow
> > > implementation has continued operating more or less like an
> > > independent open source project since the code donation last November:
> > >
> > > https://github.com/JuliaData/Arrow.jl/commits/main
> > >
> > > There may have been a misunderstanding about what was expected to
> > > occur after the code donation, but it's problematic for a bunch of
> > > reasons (IP lineage / governance / community development) to have work
> > > happening on the implementation "outside the community".
> > >
> > > In any case, what is done is done, so the Arrow PMC's position on this
> > > would be roughly to regard the work as a hard fork of what's in Apache
> > > Arrow, which given its development activity is more or less inactive
> > > [1]. (I had actually thought the project was simply inactive after the
> > > code donation)
> > >
> > > The critical question now is, is there interest from Julia developers
> > > in working "in the community", which is to say:
> > >
> > > * Having development discussions on ASF channels (mailing list,
> > > GitHub, JIRA), planning and communicating in the open
> > > * Doing all development in ASF GitHub repositories
> > >
> > > The answer to the question may be "no" (which is okay), but if that's
> > > the case, I don't think we should be giving the impression that we
> > > have an official Julia implementation that is developed and maintained
> > > by the community (and so my argument would be unfortunately to drop
> > > the donated code from the project).
> > >
> > > If the answer is "yes", there needs to be a hard commitment to move
> > > development to Apache channels and not look back. We would also need
> > > to figure out what to do to document and synchronize the new IP that's
> > > been created since the code donation.
> > >
> > > Thanks,
> > > Wes
> > >
> > > [1]: https://github.com/apache/arrow/commits/master/julia/Arrow
> > >