Renjie, you're amazing. I think you summarized this better than I could, so thank you for that.
I'd like to pull in a user's feedback from Slack:

> FWIW, I’m personally a fan of separate repos for the client libraries.
> It keeps things a bit more isolated (in a good way) and explorable
> (rather than overwhelming). GitHub search is a bit easier to use. And I
> think it generally lowers the bar to contributing. Independent versioning
> and GitHub releases are a big win too, I think.
> Right now, I don’t actually know where to find PyIceberg release notes.
> Would love to see release notes in the GitHub releases for them.

IMO, the most important measure of success for either option is how smooth it makes the contributor experience. A monorepo has the advantage of one place to look, the ability to model changes across core and clients in a single PR, and shared resources. At first, I considered the build to be a problem only for the Iceberg committers who manage it, but ultimately a monorepo sets us up for longer builds and for running unnecessary infrastructure for unrelated tasks. There are definitely ways to detect which parts of the code have changed and which checks should run, but it will not always be clear or simple to know whether we tested too much or not enough. For that reason, I am also in the multi-repo camp (for clients).

Despite having to manage a separate repo for each client, I generally consider the work on each client to be independent of the work happening in the main repo. In this view, it's possibly better that the work be independent and seen on its own. The biggest win IMO is the intentional separation of testing and deployment infrastructure. This will make for a better experience when folks are contributing, testing, and looking for release notes.

But I also really don't care as long as we do the same things across clients.
;)

Bits

On Thu, Aug 10, 2023 at 2:38 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi, all:
>
> In yesterday’s community sync we talked about the location of the
> different language clients, and I think we all agree that there should be
> consistent behavior for these clients, but the decision has not been made
> yet. I want to continue the discussion here on the pros and cons of the
> two sides: mono repo (all in one big repo) or multiple small repos (one
> for each language client).
>
> To make things clear, we currently have four language libraries under
> development:
>
> 1. Java: in main repo (https://github.com/apache/iceberg)
> 2. Python: in main repo (https://github.com/apache/iceberg)
> 3. Go: in main repo (https://github.com/apache/iceberg)
> 4. Rust: in standalone repo (https://github.com/apache/iceberg-rust/)
>
> Currently I mainly contribute to the Rust client, and I can share my
> thoughts on why I voted for a standalone repo:
>
> 1. Easier project setup. Iceberg is a complex project with several
>    components, and is mainly written in Java. As someone not quite
>    familiar with this project structure, I find it easier to start a new
>    project than to fit into an existing one.
> 2. Faster CI workflow. In the early days of the Rust client’s
>    development, we only need to touch Rust-related code. If we all live
>    in one mono repo, it will trigger unnecessary CI runs for other
>    components.
>
> I admit that these reasons may not hold for long-term maintenance, but
> they are good for fast-paced development in the early days.
>
> After reviewing some discussions on the web, I have a summary of the pros
> and cons of the two sides:
>
> Mono Repo
>
> Pros
>
> - *Visibility and transparency*. It would be easier to follow the
>   progress of all clients, and PRs can get more reviews and attention.
> - *Easier sharing of resources*. It would be easier to share resources
>   for integration tests.
>
> Cons
>
> - *Increases complexity of project structure*. The project structure
>   would be more complex when coupling different languages and toolchain
>   setups.
> - *Longer build/CI time*. Unnecessary CI checks may be triggered for
>   small PRs in different languages.
>
> Multi Repo
>
> Pros
>
> - *Simplifies project structure*. Different languages may have different
>   toolchains and project setups; one repo per language makes the project
>   structure easier to understand and follow.
> - *Independent versioning and releases*. Different languages may have
>   different versioning and release processes. This is also possible in a
>   monorepo, but I guess it would be easier with standalone repos.
> - *Improved build/CI time*. No unnecessary CI checks will be triggered.
>
> Cons
>
> - *Difficult to track the overall progress*. Multiple repos make it
>   harder to track what’s happening in different teams.
> - *Difficult to share common resources*. It may be more difficult to
>   share resources and do integration tests across different languages.
>
> You are welcome to share your ideas and thoughts in this discussion!
>
> References
>
> 1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences