Hi Wes! Thanks for your reply, it's all much clearer now. I guess it is just a question of getting used to it :-)
Remi

On Tue, Apr 14, 2020 at 22:54, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Remi,
>
> It's no problem, it's a common question we get. Some developers
> believe as a matter of principle that large projects should be broken
> up into many smaller repositories.
>
> Arrow is different from many open source projects. Maintaining
> protocol-level interoperability (although note that Rust does not yet
> participate in the integration tests) has been a great deal of effort,
> and the community has felt that trying to coordinate changes that
> impact interoperability is substantially simpler in a monorepo
> arrangement on GitHub. That way we always know with relative certainty
> whether any pull request may break interoperability between one
> component and another. It's very easy to get into a situation where
> you have a mess of cross-repository (or even circular) build and
> runtime dependencies -- the monorepo makes all of this pain go away.
> If you have a change that affects multiple repositories, CI tools
> don't make it easy to test those PRs together; generally you'll just
> see that a PR on one repo is breaking against the master of the other
> repository.
>
> In some cases, components may not have integrations with other
> languages, but that may not always be the case in the future. We have
> just developed the C interface, for example, which would enable
> DataFusion to be built as a shared library and imported in Python (if
> someone wanted to do that).
>
> Another dimension is that all of the PLs and components have benefited
> greatly from the community's investment in CI and packaging
> infrastructure.
>
> I also believe that the project's common PR queue helps create a sense
> of community awareness and solidarity among project contributors.
> If Rust were working off in their own corner of GitHub, I think it
> would be easy for people who are not working on Rust to ignore them.
> I think the net result of the way that we currently operate is that
> we're producing higher quality software and have a healthier community
> than we would otherwise with a more fragmented approach.
>
> Lastly, the shared release cycle creates social pressure to get
> patches finished and merged. Anecdotally this seems to be effective.
>
> On the governance questions, see the roles section on
> https://www.apache.org/foundation/how-it-works.html#roles
>
> If a part of apache/arrow truly believed that they were being hindered
> by being a part of the monorepo, we could create a new repository under
> apache/ on GitHub for the part that wants to split into a standalone
> GitHub repository. That wouldn't change the governance of that code.
>
> - Wes
>
> On Tue, Apr 14, 2020 at 1:26 PM Rémi Dettai <rdet...@gmail.com> wrote:
> >
> > This is a follow up on https://issues.apache.org/jira/browse/ARROW-8451.
> >
> > First, thanks for your answer!
> >
> > It's true that I was also surprised to see all implementations of Arrow
> > mixed up in a single repository!
> >
> > I was really considering the separation of the repositories as a means to
> > separate concerns. I am not 100% sure I understand how it would fragment
> > the community, but I think I get the point, even though I still believe
> > that it comes at the cost of extra complexity.
> >
> > As for the legal protection, I did not take that aspect into
> > consideration, and I find it very interesting! What is the PMC exactly,
> > and why would DataFusion be more exposed in a separate repository?