Thanks for your effort.

On Fri, Aug 11, 2023 at 14:17 Fokko Driesprong <fo...@apache.org> wrote:

> I should have mentioned that Github does automatic redirection when you do
> a rename of a repository. But you're all right, the impact is possibly
> bigger than we can envision and it is probably not worth it.
>
> I took the liberty of creating iceberg-python
> <https://github.com/apache/iceberg-python> and iceberg-go
> <https://github.com/apache/iceberg-go>. For Python, I'd love to do a
> release this month. Right after that, once most PRs are hopefully in,
> would be a good moment to split out the Python part from the Java
> repository.
>
> Kind regards,
> Fokko
>
>
> On Fri, Aug 11, 2023 at 04:08, Renjie Liu <liurenjie2...@gmail.com> wrote:
>
>> Thanks, everyone, for the nice discussion.
>>
>>
>>
>> +1 for multi-repo while keeping the core spec and Java implementation in
>> apache/iceberg. Java is currently still the most widely adopted and
>> sophisticated implementation. We only need to help people find the other
>> implementations by providing links on the website.
>>
>>
>>
>>
>>
>> *From: *Ryan Blue <b...@tabular.io>
>> *Date: *Friday, August 11, 2023 at 05:23
>> *To: *dev@iceberg.apache.org <dev@iceberg.apache.org>
>> *Subject: *Re: Discussion about the location of language clients
>>
>> I wasn't at the discussion on Wednesday, but it sounds like there is
>> support for moving to separate repos. Does anyone strongly object?
>>
>>
>>
>> I also agree with Steven on not renaming to iceberg-java. That's the repo
>> where we keep the spec and Java is the reference implementation. Plus we
>> don't want to break a ton of links.
>>
>>
>>
>> Ryan
>>
>>
>>
>> On Thu, Aug 10, 2023 at 1:05 PM Steven Wu <stevenz...@gmail.com> wrote:
>>
>> I am also on the side of separate repos for different languages.
>> Otherwise, the main repo can grow too big. The iceberg.apache.org website
>> can provide proper links to the repos for different languages.
>>
>>
>>
>> I would be -1 on renaming apache/iceberg to apache/iceberg-java, as it
>> can break external links to the main/original GitHub repo. The tradeoff may
>> not be worth it.
>>
>>
>>
>> On Thu, Aug 10, 2023 at 8:16 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>> Hi everyone,
>>
>>
>>
>> Today I took a stab at the generation of wheels in Python (here's the PR
>> <https://github.com/apache/iceberg/pull/8287> if anyone is interested),
>> and when testing this it would also kick off many unrelated CI jobs. This
>> is just for two languages, and I'm not convinced that it will scale to many
>> languages. Also, having a different release cycle for each of the languages
>> will clutter up the tags, releases, etc. I'm convinced that
>> separate repositories are more scalable in the future; we just have to make
>> sure that they can be found easily (rename apache/iceberg to
>> apache/iceberg-java?).
>>
>>
>>
>> Cheers, Fokko
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 10, 2023 at 14:18, Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>
>> Hi all,
>>
>> First off, thanks Brian for starting the conversation and thanks Renjie
>> for the write-up.
>>
>> I'm also in the multi-repo camp because of the already mentioned
>> benefits.
>>
>> One point I would like to add is that the potential drawback of having
>> less visibility with multi-repos can be mitigated to some extent. I think
>> that if the different repos are clearly and visibly presented on the
>> iceberg website people should be able to find the desired implementation.
>>
>> Best wishes,
>>
>> Jan
>>
>> On 10.08.23 13:43, Brian Olsen wrote:
>>
>> Renjie, you're amazing.
>>
>> I think you summarized this better than I could, so thank you for that.
>>
>> I'd like to pull in a user's feedback from Slack:
>>
>> FWIW, I’m personally a fan of separate repos for the client libraries.
>> It keeps things a bit more isolated (in a good way) and explorable
>> (rather than overwhelming). GitHub search is a bit easier to use. And I
>> think it generally lowers the bar to contributing. Independent versioning,
>> and GitHub releases are a big win too, I think.
>>
>>
>>
>> Right now, I don’t actually know where to find PyIceberg release notes.
>> Would love to see release notes in the GitHub releases for them.
>>
>>
>>
>>
>>
>> IMO, the most important measurement of success for choosing either of
>> these options is about making the contributor experience as smooth as
>> possible.
>>
>> A monorepo has the advantage of one place to look, the ability to model
>> all changes across core/clients in a single PR, and shared resources. At
>> first, I considered managing the build to be a problem only for Iceberg
>> committers, but ultimately this is setting us up for a longer build and
>> running unnecessary infrastructure for unrelated tasks. There are
>> definitely ways that we can verify what parts of the code have been
>> changed and which code should be run, but it will not always be clear
>> or simple to know if we tested too much or not enough.
>>
>> For that reason, I am also in the multi-repo camp (for clients). I think despite
>> having to manage different repos for each client, I generally consider the
>> work of each client to be independent of the work happening in the main
>> repo. In this view, it's possibly better that the work be independent and
>> seen on its own. The biggest win IMO is the intentional separation of
>> testing and deployment infrastructure. This will make for a better
>> experience when folks are contributing, testing, and looking for release
>> notes.
>>
>> But I also really don't care as long as we do the same things across
>> clients. ;)
>>
>> Bits
>>
>>
>>
>>
>>
>> On Thu, Aug 10, 2023 at 2:38 AM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>
>> Hi, all:
>>
>>
>>
>> In yesterday’s community sync we talked about the location of the different
>> language clients, and I think we all agree that there should be consistent
>> behavior for these clients, but the decision has not been made yet. I want
>> to continue the discussion here on the pros and cons of the two options:
>> a mono repo (all in one big repo) or multiple small repos (one for each
>> language client).
>>
>>
>>
>> To make things clear, currently we have four language libraries under
>> development:
>>
>>
>>
>>    1. Java: in main repo (https://github.com/apache/iceberg)
>>    2. Python: in main repo (https://github.com/apache/iceberg)
>>    3. Go: in main repo (https://github.com/apache/iceberg)
>>    4. Rust: in standalone repo (https://github.com/apache/iceberg-rust/)
>>
>>
>>
>> Currently I mainly contribute to the Rust client, so I can share my thoughts
>> on why I voted for a standalone repo:
>>
>>
>>
>>    1. Easier project setup. Iceberg is a complex project with several
>>    components, and is mainly written in Java. As someone not very familiar
>>    with this project structure, I find it easier to start a new project
>>    than to fit into an existing one.
>>    2. Faster CI workflow. In the early days of the Rust client’s
>>    development, we only need to touch Rust-related code. If we all lived in
>>    one mono repo, that would trigger unnecessary CI runs for other components.
>>
>>
>>
>> I admit that these reasons may not hold for long-term maintenance, but
>> they are good for fast-paced development in the early days.
>>
>>
>>
>> After reviewing some discussions on the web, I have put together a summary
>> of the pros and cons of the two sides:
>>
>>
>>
>> Mono Repo
>>
>>
>>
>> Pros
>>
>>    - *Visibility and transparency*. It would be easier to follow the
>>    progress of all clients, and PRs can get more reviews and attention.
>>    - *Easier sharing of resources*. It would be easier to share
>>    resources for integration tests.
>>
>> Cons
>>
>>    - *Increases complexity of project structure*. The project structure
>>    would be more complex when coupling different languages and toolchain
>>    setups.
>>    - *Longer build/CI time*. Unnecessary CI checks may be triggered for
>>    small PRs in different languages.
>>
>>
>>
>> Multi Repo
>>
>>
>>
>> Pros
>>
>>    - *Simplifies project structure*. Different languages may have different
>>    toolchains and project setups; one repo per language makes the project
>>    structure easier to understand and follow.
>>    - *Independent versioning and releases*. Different languages may have
>>    different versioning and release processes. This is also possible in a
>>    monorepo, but I guess it would be easier with standalone repos.
>>    - *Improved build/CI time*. No unnecessary CI checks will be
>>    triggered.
>>
>> Cons
>>
>>    - *Difficult to track the overall progress*. Multiple repos make it
>>    harder to track what’s happening in different teams.
>>    - *Difficult to share common resources*. It may be more difficult to
>>    share resources and run integration tests across different languages.
>>
>>
>>
>>
>>
>> Everyone is welcome to share ideas and thoughts in this discussion!
>>
>>
>>
>> References
>>
>>
>>
>>    1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences
>>
>>
>>
>>
>> --
>>
>> Ryan Blue
>>
>> Tabular
>>
> --
Renjie Liu
Software Engineer, MVAD
