I wasn't at the discussion on Wednesday, but it sounds like there is support for moving to separate repos. Does anyone strongly object?
I also agree with Steven on not renaming to iceberg-java. That's the repo where we keep the spec, and Java is the reference implementation. Plus we don't want to break a ton of links.

Ryan

On Thu, Aug 10, 2023 at 1:05 PM Steven Wu <stevenz...@gmail.com> wrote:

> I am also on the side of separate repos for different languages.
> Otherwise, the main repo can grow too big. The iceberg.apache.org website can
> provide proper links to repos for different languages.
>
> I would be -1 on renaming apache/iceberg to apache/iceberg-java, as it can
> break external links to the main/original GitHub repo. The tradeoff may not
> be worth it.
>
> On Thu, Aug 10, 2023 at 8:16 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hi everyone,
>>
>> Today I took a stab at the generation of wheels in Python (here's the PR
>> <https://github.com/apache/iceberg/pull/8287> if anyone is interested),
>> and when testing this it would also kick off many unrelated CI jobs. This
>> is just for two languages, and I'm not convinced that it will scale to many
>> languages. Also, having a different release cycle for each of the languages
>> will clutter up the tags, releases, etc. I'm convinced that
>> separate repositories are more scalable in the future; we just have to make
>> sure that they can be found easily (rename apache/iceberg to
>> apache/iceberg-java?).
>>
>> Cheers, Fokko
>>
>> On Thu, Aug 10, 2023 at 14:18, Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>
>>> Hi all,
>>>
>>> First off, thanks Brian for starting the conversation and thanks Renjie
>>> for the write-up.
>>>
>>> I'm also in the multi-repo camp because of the already mentioned
>>> benefits.
>>>
>>> One point I would like to add is that the potential drawback of having
>>> less visibility with multi-repos can be mitigated to some extent. I think
>>> that if the different repos are clearly and visibly presented on the
>>> Iceberg website, people should be able to find the desired implementation.
>>>
>>> Best wishes,
>>>
>>> Jan
>>>
>>> On 10.08.23 13:43, Brian Olsen wrote:
>>>
>>> Renjie, you're amazing.
>>>
>>> I think you summarized this better than I could, so thank you for that.
>>>
>>> I'd like to pull in a user's feedback on Slack:
>>>
>>>> FWIW, I’m personally a fan of separate repos for the client libraries.
>>>> It keeps things a bit more isolated (in a good way) and explorable
>>>> (rather than overwhelming). GitHub search is a bit easier to use. And I
>>>> think it generally lowers the bar to contributing. Independent versioning
>>>> and GitHub releases are a big win too, I think.
>>>>
>>>> Right now, I don’t actually know where to find PyIceberg release notes.
>>>> Would love to see release notes in the GitHub releases for them.
>>>
>>> IMO, the most important measurement of success for choosing either of
>>> these options is about making the contributor experience as smooth as
>>> possible.
>>>
>>> Monorepo has the advantage of one place to look, all changes across
>>> core/clients can be modeled in a single PR, and sharing resources. At
>>> first, I considered managing the build to only be a problem for Iceberg
>>> committers managing the build, but ultimately this is setting us up for a
>>> longer build and running unnecessary infrastructure for unrelated tasks.
>>> There are definitely ways that we can verify what parts of the code have
>>> been changed and which code should be run, but it will not always be clear
>>> or simple to know if we tested too much or not enough.
>>>
>>> For that, I am also in the multi-repo camp (for clients). I think
>>> despite having to manage different repos for each client, I generally
>>> consider the work of each client to be independent of the work happening in
>>> the main repo. In this view, it's possibly better that the work be
>>> independent and seen on its own. The biggest win IMO is the intentional
>>> separation of testing and deployment infrastructure.
>>> This will make for a better experience when folks are contributing,
>>> testing, and looking for release notes.
>>>
>>> But I also really don't care as long as we do the same things across
>>> clients. ;)
>>>
>>> Bits
>>>
>>> On Thu, Aug 10, 2023 at 2:38 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>>
>>>> Hi, all:
>>>>
>>>> In yesterday’s community sync we talked about the location of different
>>>> language clients, and I think we all agree that there should be consistent
>>>> behavior for these clients, but the decision has not been made yet. I want
>>>> to continue the discussion here on the pros and cons of the two sides:
>>>> mono repo (all in one big repo) or multiple small repos (one for each
>>>> language client).
>>>>
>>>> To make things clear, currently we have four language libraries under
>>>> development:
>>>>
>>>> 1. Java: in main repo (https://github.com/apache/iceberg)
>>>> 2. Python: in main repo (https://github.com/apache/iceberg)
>>>> 3. Go: in main repo (https://github.com/apache/iceberg)
>>>> 4. Rust: in standalone repo (https://github.com/apache/iceberg-rust/)
>>>>
>>>> Currently I mainly contribute to the Rust client, and I can share my
>>>> thoughts on why I voted for a standalone repo:
>>>>
>>>> 1. Easier project setup. Iceberg is a complex project with several
>>>>    components, and is mainly written in Java. As someone not quite
>>>>    familiar with this project structure, I find it easier to start a new
>>>>    project rather than fitting into an existing one.
>>>> 2. Faster CI workflow. In the early days of the Rust client’s development,
>>>>    we only need to touch Rust-related code. If we all live in one mono
>>>>    repo, it will trigger unnecessary CI runs for other components.
>>>>
>>>> I admit that these reasons may not hold for long-term maintenance, but
>>>> they are good for fast-paced development in the early days.
>>>>
>>>> After reviewing some discussions on the web, I have a summary of the
>>>> pros and cons of the two sides:
>>>>
>>>> Mono Repo
>>>>
>>>> Pros
>>>>
>>>> - *Visibility and transparency*. It would be easier to follow the
>>>>   progress of all clients, and PRs can get more reviews and attention.
>>>> - *Easier sharing of resources*. It would be easier to share
>>>>   resources for integration tests.
>>>>
>>>> Cons
>>>>
>>>> - *Increases complexity of project structure*. The project
>>>>   structure would be more complex when coupling different languages and
>>>>   toolchain setups.
>>>> - *Longer build/CI time*. Unnecessary CI checks may be triggered
>>>>   for small PRs in different languages.
>>>>
>>>> Multi Repo
>>>>
>>>> Pros
>>>>
>>>> - *Simplifies project structure*. Different languages may have their own
>>>>   toolchains and project setup; one repo per language makes the project
>>>>   structure easier to understand and follow.
>>>> - *Independent versioning and releases*. Different languages may
>>>>   have different versioning and release processes. It’s also possible in
>>>>   a monorepo, but I guess it would be easier with standalone repos.
>>>> - *Improved build/CI time*. No unnecessary CI checks will be
>>>>   triggered.
>>>>
>>>> Cons
>>>>
>>>> - *Difficult to track the overall progress*. Multiple repos make it
>>>>   harder to track what’s happening in different teams.
>>>> - *Difficult to share common resources*. It may be more difficult to
>>>>   share resources and do integration tests across different languages.
>>>>
>>>> You are welcome to share your ideas and thoughts in this discussion!
>>>>
>>>> References
>>>>
>>>> 1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences

--
Ryan Blue
Tabular