Hi Wes!

Thanks for your reply, all much clearer now. I guess it is just a question
of getting used to it :-)

Remi

On Tue, Apr 14, 2020 at 10:54 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Remi,
>
> It's no problem, it's a common question we get. Some developers
> believe as a matter of principle that large projects should be broken
> up into many smaller repositories.
>
> Arrow is different from many open source projects. Maintaining
> protocol-level interoperability (although note that Rust does not yet
> participate in the integration tests) has been a great deal of effort,
> and the community has felt that trying to coordinate changes that
> impact interoperability is substantially simpler in a monorepo
> arrangement on GitHub. That way we always know with relative certainty
> whether any pull request may break interoperability between one
> component and another. It's very easy to get into a situation where
> you have a mess of cross-repository (or even circular) build and
> runtime dependencies -- the monorepo makes all of this pain go away.
> If you have a change that affects multiple repositories, CI tools
> don't make it easy to test those PRs together; generally you'll just
> see that a PR on one repo is breaking against the master of the other
> repository.
>
> In some cases, components may not have integrations with other
> languages but that may not always be the case in the future. We have
> just developed the C interface, for example, which would enable
> DataFusion to be built as a shared library and imported in Python (if
> someone wanted to do that).
>
> Another dimension is that all of the PLs and components have benefited
> greatly from the community's investment in CI and packaging
> infrastructure.
>
> I also believe that the project's common PR queue helps create a sense
> of community awareness and solidarity among project contributors.
> If Rust were working off in their own corner of GitHub, I think it
> would be easy for people who are not working on Rust to ignore them. I
> think the net result of the way that we currently operate is that
> we're producing higher quality software and have a healthier community
> than we would otherwise with a more fragmented approach.
>
> Lastly, the shared release cycle creates social pressure to get
> patches finished and merged. Anecdotally this seems to be effective.
>
> On the governance questions, see the roles section on
> https://www.apache.org/foundation/how-it-works.html#roles
>
> If a part of apache/arrow truly believed that they were being hindered
> by being part of the monorepo, we could create a new repository under
> apache/ on GitHub for the part that wants to split into a standalone
> GitHub repository. That wouldn't change the governance of that code.
>
> - Wes
>
> On Tue, Apr 14, 2020 at 1:26 PM Rémi Dettai <rdet...@gmail.com> wrote:
> >
> > This is a follow up on https://issues.apache.org/jira/browse/ARROW-8451.
> >
> > First thanks for your answer!
> >
> > It's true that I was also surprised to see all implementations of Arrow
> > mixed up in a single repository!
> >
> > I was really considering the separation of the repositories as a means to
> > separate concerns. I am not 100% sure I understand how it would fragment
> > the community but I think I get the point, even though I still believe that
> > it comes at the cost of extra complexity.
> >
> > As for the legal protection, I did not take that aspect into consideration,
> > and I find it very interesting! What is the PMC exactly and why would
> > DataFusion be more exposed in a separate repository?
>
