Hi all,

In yesterday’s community sync we talked about where the different language clients should live. I think we all agree that these clients should be handled consistently, but no decision has been made yet. I want to continue the discussion here on the pros and cons of the two options: a mono repo (everything in one big repo) or multiple small repos (one for each language client).

To make things clear, we currently have four language libraries under development:

1. Java: in main repo (https://github.com/apache/iceberg)
2. Python: in main repo (https://github.com/apache/iceberg)
3. Go: in main repo (https://github.com/apache/iceberg)
4. Rust: in standalone repo (https://github.com/apache/iceberg-rust/)

Currently I mainly contribute to the Rust client, so I can share why I voted for a standalone repo:

1. Easier project setup. Iceberg is a complex project with several components, written mainly in Java. As someone not very familiar with its project structure, I find it easier to start a new project than to fit into an existing one.
2. Faster CI workflow. In the early days of the Rust client’s development, we only need to touch Rust-related code. If everything lived in one mono repo, our changes would trigger unnecessary CI runs for other components.

I admit that these reasons may not hold for long-term maintenance, but they are good for fast-paced development in the early days.

After reviewing some discussions on the web, I have summarized the pros and cons of both sides:

Mono Repo

Pros
* Visibility and transparency. It would be easier to follow the progress of all clients, and PRs can get more reviews and attention.
* Easier sharing of resources. It would be easier to share resources for integration tests.

Cons
* Increased complexity of project structure. The project structure would become more complex when coupling different languages and toolchain setups.
* Longer build/CI time. Unnecessary CI checks may be triggered for small PRs in different languages.

Multi Repo

Pros
* Simpler project structure. Different languages may have different toolchains and project setups, so one repo per language makes the project structure easier to understand and follow.
* Independent versioning and releases. Different languages may have different versioning and release processes. This is also possible in a mono repo, but I guess it would be easier with standalone repos.
* Improved build/CI time. No unnecessary CI checks will be triggered.

Cons
* Difficult to track overall progress. Multiple repos make it harder to track what’s happening in different teams.
* Difficult to share common resources. It may be more difficult to share resources and run integration tests across different languages.

Please share your ideas and thoughts in this discussion!

References

1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences