I should have mentioned that GitHub automatically redirects when you rename a repository. But you're all right: the impact is possibly bigger than we can envision, and it is probably not worth it.
I took the liberty of creating iceberg-python <https://github.com/apache/iceberg-python> and iceberg-go <https://github.com/apache/iceberg-go>. For Python, I'd love to do a release this month. I think right after that (hopefully most PRs are in by then) would be a good moment to split the Python part out of the Java repository.

Kind regards,
Fokko

On Fri, Aug 11, 2023 at 04:08, Renjie Liu <liurenjie2...@gmail.com> wrote:

> Thanks everyone for the nice discussion.
>
> +1 for multi-repo while keeping the core spec and Java implementation in apache/iceberg. Currently Java is still the most widely adopted and sophisticated implementation. We only need to help people find the other implementations by providing links on the web page.
>
> *From: *Ryan Blue <b...@tabular.io>
> *Date: *Friday, August 11, 2023 at 05:23
> *To: *dev@iceberg.apache.org <dev@iceberg.apache.org>
> *Subject: *Re: Discussion about the location of language clients
>
> I wasn't at the discussion on Wednesday, but it sounds like there is support for moving to separate repos. Does anyone strongly object?
>
> I also agree with Steven on not renaming to iceberg-java. That's the repo where we keep the spec, and Java is the reference implementation. Plus we don't want to break a ton of links.
>
> Ryan
>
> On Thu, Aug 10, 2023 at 1:05 PM Steven Wu <stevenz...@gmail.com> wrote:
>
> I am also on the side of separate repos for different languages. Otherwise, the main repo can grow too big. The iceberg.apache.org website can provide proper links to the repos for different languages.
>
> I would be -1 on renaming apache/iceberg to apache/iceberg-java, as it can break external links to the main/original GitHub repo. The tradeoff may not be worth it.
>
> On Thu, Aug 10, 2023 at 8:16 AM Fokko Driesprong <fo...@apache.org> wrote:
>
> Hi everyone,
>
> Today I took a stab at generating wheels for Python (here's the PR <https://github.com/apache/iceberg/pull/8287> if anyone is interested), and testing it also kicked off many unrelated CI jobs. This is with just two languages, and I'm not convinced that it will scale to many languages. Also, having a different release cycle for each language will clutter up the tags, releases, etc. I'm convinced that separate repositories are more scalable in the future; we just have to make sure that they can be found easily (rename apache/iceberg to apache/iceberg-java?).
>
> Cheers, Fokko
>
> On Thu, Aug 10, 2023 at 14:18, Jan Kaul <jank...@mailbox.org.invalid> wrote:
>
> Hi all,
>
> First off, thanks Brian for starting the conversation and thanks Renjie for the write-up.
>
> I'm also in the multi-repo camp because of the already mentioned benefits.
>
> One point I would like to add is that the potential drawback of having less visibility with multiple repos can be mitigated to some extent. I think that if the different repos are clearly and visibly presented on the Iceberg website, people should be able to find the desired implementation.
>
> Best wishes,
>
> Jan
>
> On 10.08.23 13:43, Brian Olsen wrote:
>
> Renjie, you're amazing.
>
> I think you summarized this better than I could, so thank you for that.
>
> I'd like to pull in a user's feedback from Slack:
>
> FWIW, I’m personally a fan of separate repos for the client libraries. It keeps things a bit more isolated (in a good way) and explorable (rather than overwhelming). GitHub search is a bit easier to use. And I think it generally lowers the bar to contributing. Independent versioning and GitHub releases are a big win too, I think.
>
> Right now, I don’t actually know where to find PyIceberg release notes.
> Would love to see release notes in the GitHub releases for them.
>
> IMO, the most important measure of success for choosing either of these options is making the contributor experience as smooth as possible.
>
> A monorepo has the advantages of one place to look, modeling all changes across core/clients in a single PR, and sharing resources. At first, I considered managing the build to be a problem only for the Iceberg committers who manage it, but ultimately this sets us up for a longer build and for running unnecessary infrastructure for unrelated tasks. There are definitely ways to verify which parts of the code have changed and which code should be run, but it will not always be clear or simple to know whether we tested too much or not enough.
>
> For that reason, I am also in the multi-repo camp (for clients). Despite having to manage a different repo for each client, I generally consider the work on each client to be independent of the work happening in the main repo. In this view, it's possibly better that the work be independent and seen on its own. The biggest win IMO is the intentional separation of testing and deployment infrastructure. This will make for a better experience when folks are contributing, testing, and looking for release notes.
>
> But I also really don't care as long as we do the same things across clients. ;)
>
> Bits
>
> On Thu, Aug 10, 2023 at 2:38 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>
> Hi, all:
>
> In yesterday’s community sync we talked about the location of the different language clients, and I think we all agree that there should be consistent behavior for these clients, but the decision has not been made yet.
> I want to continue the discussion here on the pros and cons of the two sides: a monorepo (everything in one big repo) or multiple small repos (one for each language client).
>
> To make things clear, currently we have four language libraries under development:
>
> 1. Java: in main repo (https://github.com/apache/iceberg)
> 2. Python: in main repo (https://github.com/apache/iceberg)
> 3. Go: in main repo (https://github.com/apache/iceberg)
> 4. Rust: in standalone repo (https://github.com/apache/iceberg-rust/)
>
> Currently I mainly contribute to the Rust client, and I can share my thoughts on why I voted for a standalone repo:
>
> 1. Easier project setup. Iceberg is a complex project with several components, mainly written in Java. As someone not quite familiar with the project structure, I find it easier to start a new project than to fit into an existing one.
> 2. Faster CI workflow. In the early days of the Rust client’s development, we only need to touch Rust-related code. If we all lived in one monorepo, it would trigger unnecessary CI runs for other components.
>
> I admit that these reasons may not hold for long-term maintenance, but they are good for fast-paced development in the early days.
>
> After reviewing some discussions on the web, I have a summary of the pros and cons of the two sides:
>
> Mono Repo
>
> Pros
>
> - *Visibility and transparency*. It would be easier to follow the progress of all clients, and PRs can get more reviews and attention.
> - *Easier sharing of resources*. It would be easier to share resources for integration tests.
>
> Cons
>
> - *Increases complexity of project structure*. The project structure would be more complex when coupling different languages and toolchain setups.
> - *Longer build/CI time. *Unnecessary CI checks may be triggered for small PRs in different languages.
>
> Multi Repo
>
> Pros
>
> - *Simplifies project structure*.
>   Different languages may have their own toolchains and project setups; one repo per language makes the project structure easier to understand and follow.
> - *Independent versioning and releases*. Different languages may have different versioning and release processes. That is also possible in a monorepo, but I guess it would be easier with standalone repos.
> - *Improved build/CI time*. No unnecessary CI checks will be triggered.
>
> Cons
>
> - *Difficult to track the overall progress. *Multiple repos make it harder to track what’s happening in different teams.
> - *Difficult to share common resources.* It may be more difficult to share resources and run integration tests across different languages.
>
> Welcome to share your ideas and thoughts in this discussion!
>
> References
>
> 1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences
>
> --
> Ryan Blue
> Tabular