I wasn't at the discussion on Wednesday, but it sounds like there is
support for moving to separate repos. Does anyone strongly object?

I also agree with Steven on not renaming to iceberg-java. That's the repo
where we keep the spec and Java is the reference implementation. Plus we
don't want to break a ton of links.

Ryan

On Thu, Aug 10, 2023 at 1:05 PM Steven Wu <stevenz...@gmail.com> wrote:

> I am also on the side of separate repos for different languages.
> otherwise, the main repo can grow too big. iceberg.apache.org website can
> provide proper links to repos for different languages.
>
> I would be -1 on renaming apache/iceberg to apache/iceberg-java, as it can
> break external links to the main/original github repo. the tradeoff may not
> be worth it.
>
> On Thu, Aug 10, 2023 at 8:16 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hi everyone,
>>
>> Today I took a stab at the generation of wheels in Python (here's the PR
>> <https://github.com/apache/iceberg/pull/8287> if anyone is interested),
>> and when testing this it would also kick off many unrelated CI jobs. This
>> is just for two languages, and I'm not convinced that it will scale to many
>> languages. Also, having a different release cycle for each of the languages
>> will clutter up the tags, releases, etc. I'm convinced that
>> separate repositories are more scalable in the future; we just have to make
>> sure that they can be found easily (rename apache/iceberg to
>> apache/iceberg-java?).
>>
>> Cheers, Fokko
>>
>>
>>
>> On Thu, Aug 10, 2023 at 2:18 PM Jan Kaul <jank...@mailbox.org.invalid>
>> wrote:
>>
>>> Hi all,
>>>
>>> first off, thanks Brian for starting the conversation and thanks Renjie
>>> for the write up.
>>>
>>> I'm also in the multi-repo camp because of the already mentioned
>>> benefits.
>>>
>>> One point I would like to add is that the potential drawback of having
>>> less visibility with multi-repos can be mitigated to some extent. I think
>>> that if the different repos are clearly and visibly presented on the
>>> iceberg website people should be able to find the desired implementation.
>>>
>>> Best wishes,
>>>
>>> Jan
>>> On 10.08.23 13:43, Brian Olsen wrote:
>>>
>>> Renjie, you're amazing.
>>>
>>> I think you summarized this better than I could, so thank you for that.
>>>
>>> I'd like to pull in a user's feedback on Slack
>>>
>>> FWIW, I’m personally a fan of separate repos for the client libraries.
>>>> It keeps things a bit more isolated (in a good way) and explorable
>>>> (rather than overwhelming). GitHub search is a bit easier to use. And I
>>>> think it generally lowers the bar to contributing. Independent versioning,
>>>> and GitHub releases are a big win too, I think.
>>>>
>>>
>>> Right now, I don’t actually know where to find PyIceberg release notes.
>>>> Would love to see release notes in the GitHub releases for them.
>>>
>>>
>>>
>>> IMO, the most important measure of success for choosing either of
>>> these options is making the contributor experience as smooth as
>>> possible.
>>>
>>> Monorepo has the advantage of one place to look, modeling all changes
>>> across core/clients in a single PR, and sharing resources. At
>>> first, I considered managing the build to only be a problem for Iceberg
>>> committers managing the build, but ultimately this is setting us up for a
>>> longer build and running unnecessary infrastructure for unrelated tasks.
>>> There are definitely ways that we can verify what parts of the code have
>>> been changed and which code should be run, but it will not always be clear
>>> or simple to know if we tested too much or not enough.
>>>
>>> For that reason, I am also in the multi-repo camp (for clients). Despite
>>> having to manage different repos for each client, I generally
>>> consider the work of each client to be independent of the work happening in
>>> the main repo. In this view, it's possibly better that the work be
>>> independent and seen on its own. The biggest win IMO is the intentional
>>> separation of testing and deployment infrastructure. This will make for a
>>> better experience when folks are contributing, testing, and looking for
>>> release notes.
>>>
>>> But I also really don't care as long as we do the same things across
>>> clients. ;)
>>>
>>> Bits
>>>
>>>
>>> On Thu, Aug 10, 2023 at 2:38 AM Renjie Liu <liurenjie2...@gmail.com>
>>> wrote:
>>>
>>>> Hi, all:
>>>>
>>>>
>>>>
>>>> In yesterday’s community sync we talked about the location of different
>>>> language clients, and I think we all agree that there should be consistent
>>>> behavior for these clients, but the decision has not been made yet. I want
>>>> to continue the discussion here on the pros and cons of different sides:
>>>> mono repo (all in one big repo) or multiple small repos (one for each
>>>> language client).
>>>>
>>>>
>>>>
>>>> To make things clear, currently we have four language libraries under
>>>> development:
>>>>
>>>>
>>>>
>>>>    1. Java: in main repo (https://github.com/apache/iceberg)
>>>>    2. Python: in main repo (https://github.com/apache/iceberg)
>>>>    3. Go: in main repo (https://github.com/apache/iceberg)
>>>>    4. Rust: in standalone repo (https://github.com/apache/iceberg-rust/)
>>>>
>>>>
>>>>
>>>> Currently I mainly contribute to the Rust client, and I can share my
>>>> thoughts on why I voted for a standalone repo:
>>>>
>>>>
>>>>
>>>>    1. Easier project setup. Iceberg is a complex project with several
>>>>    components, and is mainly written in Java. As someone not quite familiar
>>>>    with this project structure, I find it easier to start a new one rather
>>>>    than fitting into an existing one.
>>>>    2. Faster CI workflow. In the early days of the Rust client’s
>>>>    development, we only need to touch Rust-related code. If we all live in
>>>>    one mono repo, it will trigger unnecessary CI runs for other components.
>>>>
>>>>
>>>>
>>>> I admit that these reasons may not hold for long-term maintenance, but
>>>> they are good for fast-paced development in the early days.
>>>>
>>>>
>>>>
>>>> After reviewing some discussions on the web, here is a summary of the
>>>> pros and cons of the two sides:
>>>>
>>>>
>>>>
>>>> Mono Repo
>>>>
>>>>
>>>>
>>>> Pros
>>>>
>>>>    - *Visibility and transparency*. It would be easier to follow the
>>>>    progress of all clients, and PRs can get more reviews and attention.
>>>>    - *Easier sharing of resources*. It would be easier to share
>>>>    resources for integration tests.
>>>>
>>>> Cons
>>>>
>>>>    - *Increases complexity of project structure*. The project
>>>>    structure would be more complex when coupling different languages and
>>>>    toolchain setup.
>>>>    - *Longer build/CI time*. Unnecessary CI checks may be triggered
>>>>    for small PRs in different languages.
>>>>
>>>>
>>>>
>>>> Multi Repo
>>>>
>>>>
>>>>
>>>> Pros
>>>>
>>>>    - *Simplifies project structure*. Different languages may have different
>>>>    toolchains and project setups; one repo per language makes the project
>>>>    structure easier to understand and follow.
>>>>    - *Independent versioning and releases*. Different languages may
>>>>    have different versioning and release processes. This is also possible
>>>>    in a monorepo, but I guess it would be easier with standalone repos.
>>>>    - *Improved build/ci time*. No unnecessary ci checks will be
>>>>    triggered.
>>>>
>>>> Cons
>>>>
>>>>    - *Difficult to track the overall progress*. Multiple repos make it
>>>>    harder to track what’s happening in different teams.
>>>>    - *Difficult to share common resources*. It may be more difficult to
>>>>    share resources and do integration tests across different languages.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Feel free to share your ideas and thoughts in this discussion!
>>>>
>>>>
>>>>
>>>> References
>>>>
>>>>
>>>>
>>>>    1. https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences
>>>>
>>>>

-- 
Ryan Blue
Tabular
