Hey everyone,

It has been a week since PyIceberg migrated to its own repository. Should
we move forward by removing the Python codebase from the main repository?
Ajantha already raised a pull request
<https://github.com/apache/iceberg/pull/8695> to do this (thank you for
that 🙌).

Kind regards,
Fokko

On Mon, Oct 2, 2023 at 16:16, Fokko Driesprong <fo...@apache.org> wrote:

> Hey everyone,
>
> Update from my side. I've moved all the issues
> <https://github.com/apache/iceberg-python/issues> and my PRs
> <https://github.com/apache/iceberg-python/pulls>. Not all issues needed
> to be migrated since a lot of them were already fixed. I've closed the
> remaining PRs that were still open; those were either abandoned, failing
> CI, or had changes pending. Authors are kindly requested to re-open them
> against the iceberg-python repository.
>
> Ajantha already created a PR <https://github.com/apache/iceberg/pull/8695>
> (thanks for that!) to remove Python from the iceberg repo.
>
> Kind regards, Fokko
>
>
> On Sat, Sep 30, 2023 at 21:06, Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey everyone,
>>
>> Pucheng: I wonder how do we deal with all the issues filed for python
>>> module but still open in iceberg repo?
>>
>>
>> That's a good point. I think we should migrate them. I checked and it is
>> only 3 pages
>> <https://github.com/apache/iceberg/issues?q=is%3Aissue+is%3Aopen+python>.
>> Likely a few more if we query on other keywords. I think migrating them by
>> hand is feasible. It also gives us a chance to clean them up (all the
>> issues on the last page I linked above are not relevant anymore, and can be
>> closed).
>>
>> Brian: The one thing we will lose is pull requests, but I assume there
>>> are very few.
>>
>>
>> I've checked those as well, and as Brian already mentioned, there are just
>> a few
>> <https://github.com/apache/iceberg/pulls?q=is%3Apr+is%3Aopen+label%3Apython>.
>> There is never a perfect moment since there are always PRs open that will
>> break, but just after the release I think is the best worst moment :) The
>> PRs that are open are trivial to move to the new repo as well.
>>
>> Hussein: I checked the discussion thread, and one of the motivations for
>>> this separation was to avoid triggering unrelated CI jobs after each
>>> change. However, I wonder if it isn't (and will not be) necessary to check
>>> the compatibility between the main repository and the client after each
>>> change. Otherwise, we will need to trigger the CI across the different
>>> repositories using the GHA API, not necessarily to block the PR, but just
>>> to give quick feedback and notification that something needs to be changed
>>> on the client side.
>>
>>
>> Checking between dev versions is not something we do today, and PyIceberg
>> lives isolated in the main repository. We might want to do some integration
>> tests at some point, but I'm not sure if we should start testing dev
>> versions against each other. The main issue with triggering the CI is
>> keeping the ignore list
>> <https://github.com/apache/iceberg/blob/master/.github/workflows/flink-ci.yml#L20-L51>
>> of each GitHub Action from growing out of control. An example
>> <https://github.com/apache/iceberg/pull/8546#issuecomment-1712958280> is
>> a case where the Python GitHub Actions file was not properly excluded.
>>
>> I would much rather rely on some reference tests that Jean-Baptiste
>> mentioned at the Java Iceberg 1.4.0 release, and that we're also working on
>> at Tabular (disclaimer: I'm working for Tabular). Python is inspired by
>> Java, and we've recently uncovered some issues
>> <https://github.com/apache/iceberg/pull/8673> (thanks Jan Finis!) with
>> respect to adhering to the spec, so I think a strict approach to validate
>> the implementations would be preferred.
>>
>> That said, in PyIceberg we use Spark (which uses the Java library) to run
>> integration tests. This is based on the released versions which works very
>> well. Not sure if we should create matrices between
>> Python/Go/Rust/Iceberg/Athena/Snowflake/... (you're seeing where this is
>> going) :) But these are just my thoughts today and might change in the
>> future.
>>
>> Thanks everyone, I'll go ahead and merge the PR that includes the history.
>>
>> Cheers, Fokko
>>
>> Ps. The repo might look a bit funky, but that's because I've created the
>> pr-branch before the main branch. I didn't know that the branch that was
>> created first would be promoted to the default branch. I'm working with
>> Apache Infra <https://issues.apache.org/jira/browse/INFRA-25029> to get it
>> fixed.
>>
>> On Sat, Sep 30, 2023 at 20:29, Daniel Weeks <dwe...@apache.org> wrote:
>>
>>> +1 to relocate with history.
>>>
>>> On Sat, Sep 30, 2023, 10:24 AM Brian Olsen <bitsondata...@gmail.com>
>>> wrote:
>>>
>>>> This shouldn’t be too hard and can likely be a nightly build that
>>>> occurs with each client repository.
>>>>
>>>> We’re already planning on doing the documentation using git submodule
>>>> to pull all the documentation under a single build in the central repo. We
>>>> can likely go the other direction to run client-core integration tests. I
>>>> prefer these go on the client end to avoid too much CI running on the core
>>>> repo. We also have to consider that whatever we choose to do with the
>>>> Python client will also apply to Go, Rust, and any future client. Happy to
>>>> hear alternatives though!
>>>>
>>>> WDYT Fokko?
>>>>
>>>>
>>>>
>>>> On Sat, Sep 30, 2023 at 7:12 AM Hussein Awala <huss...@awala.fr> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> I checked the discussion thread, and one of the motivations for this
>>>>> separation was to avoid triggering unrelated CI jobs after each change.
>>>>> However, I wonder if it isn't (and will not be) necessary to check the
>>>>> compatibility between the main repository and the client after each
>>>>> change. Otherwise, we will need to trigger the CI across the different
>>>>> repositories using the GHA API, not necessarily to block the PR, but just
>>>>> to give quick feedback and notification that something needs to be changed
>>>>> on the client side.
>>>>>
>>>>> On Fri, Sep 29, 2023 at 9:39 PM Brian Olsen <bitsondata...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Great work Fokko!
>>>>>>
>>>>>> Pucheng,
>>>>>>
>>>>>> We still want to maintain all of the issues in the Python repository.
>>>>>> The one thing we will lose is pull requests, but I assume there are very
>>>>>> few.
>>>>>>
>>>>>> On Fri, Sep 29, 2023 at 10:34 AM Pucheng Yang
>>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>>
>>>>>>> Thanks for doing this. I wonder how do we deal with all the issues
>>>>>>> filed for python module but still open in iceberg repo?
>>>>>>>
>>>>>>> On Fri, Sep 29, 2023 at 7:55 AM Eduard Tudenhoefner <
>>>>>>> edu...@tabular.io> wrote:
>>>>>>>
>>>>>>>> +1 on moving to a separate repo and maintaining git history
>>>>>>>>
>>>>>>>> On Fri, Sep 29, 2023 at 3:30 PM Jean-Baptiste Onofré <
>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Awesome, it looks even better ;)
>>>>>>>>>
>>>>>>>>> Thanks !
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Fri, Sep 29, 2023 at 2:31 PM Fokko Driesprong <fo...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Hey Ajantha,
>>>>>>>>> >
>>>>>>>>> > That's a great suggestion. I've followed the steps and created a
>>>>>>>>> > new PR here: https://github.com/apache/iceberg-python/pull/3
>>>>>>>>> >
>>>>>>>>> > The subdirectory-filter command moves a subdirectory to the root
>>>>>>>>> > directory. Because of this, I still had to add some files afterward
>>>>>>>>> > (.github/*, .gitignore, etc.); these are in a separate commit. Please
>>>>>>>>> > take a look.
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > Fokko
>>>>>>>>> >
>>>>>>>>> > On Fri, Sep 29, 2023 at 13:39, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> I think we are gonna lose the history of commits if we merge
>>>>>>>>> >> the above PR.
>>>>>>>>> >>
>>>>>>>>> >> There are ways to move the subfolder into a new repo by
>>>>>>>>> >> retaining commit history.
>>>>>>>>> >> For example:
>>>>>>>>> >> - https://medium.com/@ayushya/move-directory-from-one-repository-to-another-preserving-git-history-d210fa049d4b
>>>>>>>>> >> - https://gist.github.com/trongthanh/2779392
>>>>>>>>> >>
>>>>>>>>> >> Please give it a try.
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> Ajantha
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Sep 29, 2023 at 4:55 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hey everyone 👋
>>>>>>>>> >>>
>>>>>>>>> >>> A while ago we discussed that Rust and Go are going into a
>>>>>>>>> >>> separate repository:
>>>>>>>>> >>> https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>>>>>>>>> >>>
>>>>>>>>> >>> Since we just did the PyIceberg 0.5.0 release, I think it is a
>>>>>>>>> >>> good moment to migrate PyIceberg to iceberg-python as well:
>>>>>>>>> >>> https://github.com/apache/iceberg-python/pull/2 I went over the
>>>>>>>>> >>> PRs that are ready to merge and got them in. If there is anything
>>>>>>>>> >>> missing, please let me know.
>>>>>>>>> >>>
>>>>>>>>> >>> I would suggest merging the PR and leaving the source code in
>>>>>>>>> >>> the main repository for another week or so to make sure that we
>>>>>>>>> >>> didn't miss anything.
>>>>>>>>> >>>
>>>>>>>>> >>> Since PyIceberg now also hosts the docs on the GitHub Pages of
>>>>>>>>> >>> the Iceberg repository, moving PyIceberg will also free up the
>>>>>>>>> >>> GitHub Pages for the migration of the docs back into the main
>>>>>>>>> >>> repository.
>>>>>>>>> >>>
>>>>>>>>> >>> Let me know if there are any concerns.
>>>>>>>>> >>>
>>>>>>>>> >>> Kind regards,
>>>>>>>>> >>> Fokko Driesprong
>>>>>>>>>
>>>>>>>>
