Thanks for your effort.

On Fri, Aug 11, 2023 at 14:17 Fokko Driesprong <fo...@apache.org> wrote:
> I should have mentioned that GitHub automatically redirects when you
> rename a repository. But you're all right: the impact is possibly
> bigger than we can envision, and it is probably not worth it.
>
> I took the liberty of creating iceberg-python
> <https://github.com/apache/iceberg-python> and iceberg-go
> <https://github.com/apache/iceberg-go>. For Python, I'd love to do a
> release this month. Right after that (hopefully once most PRs are in)
> would be a good moment to split the Python part out of the Java
> repository.
>
> Kind regards,
> Fokko
>
> On Fri, Aug 11, 2023 at 04:08 Renjie Liu <liurenjie2...@gmail.com> wrote:
>
>> Thanks everyone for the nice discussion.
>>
>> +1 for multi-repo while keeping the core spec and the Java implementation
>> in apache/iceberg. Java is currently still the most widely adopted and
>> most sophisticated implementation. We only need to help people find the
>> other implementations by providing links on the web page.
>>
>> *From: *Ryan Blue <b...@tabular.io>
>> *Date: *Friday, August 11, 2023 at 05:23
>> *To: *dev@iceberg.apache.org <dev@iceberg.apache.org>
>> *Subject: *Re: Discussion about the location of language clients
>>
>> I wasn't at the discussion on Wednesday, but it sounds like there is
>> support for moving to separate repos. Does anyone strongly object?
>>
>> I also agree with Steven on not renaming to iceberg-java. That's the repo
>> where we keep the spec, and Java is the reference implementation. Plus, we
>> don't want to break a ton of links.
>>
>> Ryan
>>
>> On Thu, Aug 10, 2023 at 1:05 PM Steven Wu <stevenz...@gmail.com> wrote:
>>
>> I am also on the side of separate repos for different languages.
>> Otherwise, the main repo can grow too big. The iceberg.apache.org website
>> can provide proper links to the repos for different languages.
>>
>> I would be -1 on renaming apache/iceberg to apache/iceberg-java, as it
>> can break external links to the main/original GitHub repo.
>> The tradeoff may not be worth it.
>>
>> On Thu, Aug 10, 2023 at 8:16 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>> Hi everyone,
>>
>> Today I took a stab at generating the Python wheels (here's the PR
>> <https://github.com/apache/iceberg/pull/8287> if anyone is interested),
>> and when testing it, it also kicked off many unrelated CI jobs. This
>> is just for two languages, and I'm not convinced that it will scale to many
>> languages. Also, having a different release cycle for each of the languages
>> will clutter up the tags, releases, etc. I'm convinced that
>> separate repositories are more scalable in the future; we just have to make
>> sure that they can be found easily (rename apache/iceberg to
>> apache/iceberg-java?).
>>
>> Cheers, Fokko
>>
>> On Thu, Aug 10, 2023 at 14:18 Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>
>> Hi all,
>>
>> First off, thanks Brian for starting the conversation and thanks Renjie
>> for the write-up.
>>
>> I'm also in the multi-repo camp because of the benefits already mentioned.
>>
>> One point I would like to add is that the potential drawback of
>> less visibility with multi-repos can be mitigated to some extent. I think
>> that if the different repos are clearly and visibly presented on the
>> Iceberg website, people should be able to find the implementation they need.
>>
>> Best wishes,
>>
>> Jan
>>
>> On 10.08.23 13:43, Brian Olsen wrote:
>>
>> Renjie, you're amazing.
>>
>> I think you summarized this better than I could, so thank you for that.
>>
>> I'd like to pull in a user's feedback from Slack:
>>
>> FWIW, I'm personally a fan of separate repos for the client libraries.
>> It keeps things a bit more isolated (in a good way) and explorable
>> (rather than overwhelming). GitHub search is a bit easier to use. And I
>> think it generally lowers the bar to contributing. Independent versioning
>> and GitHub releases are a big win too, I think.
>>
>> Right now, I don't actually know where to find PyIceberg release notes.
>> Would love to see release notes in the GitHub releases for them.
>>
>> IMO, the most important measure of success for choosing either of
>> these options is making the contributor experience as smooth as
>> possible.
>>
>> A monorepo has the advantages of one place to look, the ability to model
>> all changes across core and clients in a single PR, and shared resources.
>> At first, I considered managing the build to be a problem only for the
>> Iceberg committers who maintain it, but ultimately this sets us up for
>> longer builds and running unnecessary infrastructure for unrelated tasks.
>> There are definitely ways to detect which parts of the code have
>> changed and which jobs should run, but it will not always be clear
>> or simple to know whether we tested too much or not enough.
>>
>> For that reason, I am also in the multi-repo camp (for clients). Despite
>> having to manage a separate repo for each client, I generally consider the
>> work on each client to be independent of the work happening in the main
>> repo. In this view, it's possibly better that the work be independent and
>> seen on its own. The biggest win IMO is the intentional separation of
>> testing and deployment infrastructure. This will make for a better
>> experience when folks are contributing, testing, and looking for release
>> notes.
>>
>> But I also really don't care as long as we do the same things across
>> clients. ;)
>>
>> Bits
>>
>> On Thu, Aug 10, 2023 at 2:38 AM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>
>> Hi, all:
>>
>> In yesterday's community sync we talked about the location of the different
>> language clients. I think we all agree that these clients should behave
>> consistently, but no decision has been made yet.
>> I want to continue the discussion here on the pros and cons of the two
>> options: a monorepo (everything in one big repo) or multiple small repos
>> (one for each language client).
>>
>> To make things clear, we currently have four language libraries under
>> development:
>>
>> 1. Java: in the main repo (https://github.com/apache/iceberg)
>> 2. Python: in the main repo (https://github.com/apache/iceberg)
>> 3. Go: in the main repo (https://github.com/apache/iceberg)
>> 4. Rust: in a standalone repo (https://github.com/apache/iceberg-rust/)
>>
>> I mainly contribute to the Rust client, so I can share my thoughts on
>> why I voted for a standalone repo:
>>
>> 1. Easier project setup. Iceberg is a complex project with several
>> components, mainly written in Java. As someone not very familiar with
>> its project structure, I found it easier to start a new project than to
>> fit into an existing one.
>> 2. Faster CI workflow. In the early days of the Rust client's development,
>> we only needed to touch Rust-related code. If everything lived in one
>> monorepo, those changes would trigger unnecessary CI runs for other
>> components.
>>
>> I admit that these reasons may not hold up for long-term maintenance, but
>> they are good for fast-paced development in the early days.
>>
>> After reviewing some discussions on the web, here is my summary of the
>> pros and cons of the two sides:
>>
>> Mono Repo
>>
>> Pros
>>
>> - *Visibility and transparency*. It would be easier to follow the
>> progress of all clients, and PRs could get more reviews and attention.
>> - *Easier sharing of resources*. It would be easier to share
>> resources for integration tests.
>>
>> Cons
>>
>> - *Increased complexity of project structure*. The project structure
>> would be more complex when coupling different languages and toolchain
>> setups.
>> - *Longer build/CI time*. Unnecessary CI checks may be triggered for
>> small PRs in different languages.
>>
>> Multi Repo
>>
>> Pros
>>
>> - *Simplified project structure*. Different languages may have different
>> toolchains and project setups; one repo per language makes the project
>> structure easier to understand and follow.
>> - *Independent versioning and releases*. Different languages may have
>> different versioning and release processes. This is also possible in a
>> monorepo, but I guess it would be easier with standalone repos.
>> - *Improved build/CI time*. No unnecessary CI checks will be
>> triggered.
>>
>> Cons
>>
>> - *Difficult to track the overall progress*. Multiple repos make it
>> harder to track what's happening in different teams.
>> - *Difficult to share common resources*. It may be more difficult to
>> share resources and run integration tests across different languages.
>>
>> You're welcome to share your ideas and thoughts in this discussion!
>>
>> References
>>
>> 1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences
>>
>> --
>> Ryan Blue
>> Tabular
>
--
Renjie Liu
Software Engineer, MVAD