Hey everyone,

Update from my side: I've moved all the issues
<https://github.com/apache/iceberg-python/issues> and my PRs
<https://github.com/apache/iceberg-python/pulls> over. Not all issues needed
to be migrated, since many of them were already fixed. I've closed the
remaining Python PRs that were still open; those were either abandoned,
failing CI, or had pending changes. When closing them, I kindly asked the
authors to re-open them against the iceberg-python repository.

Ajantha already created a PR <https://github.com/apache/iceberg/pull/8695>
(thanks for that!) to remove Python from the iceberg repo.

Kind regards, Fokko


On Sat, 30 Sep 2023 at 21:06, Fokko Driesprong <fo...@apache.org> wrote:

> Hey everyone,
>
> Pucheng: I wonder how we should deal with all the issues filed for the
>> Python module that are still open in the iceberg repo?
>
>
> That's a good point. I think we should migrate them. I checked, and it is
> only three pages of results
> <https://github.com/apache/iceberg/issues?q=is%3Aissue+is%3Aopen+python>.
> There are likely a few more if we query other keywords. I think migrating
> them by hand is feasible. It also gives us a chance to clean them up (all
> the issues on the last page I linked above are no longer relevant and can
> be closed).
>
> Brian: The one thing we will lose is pull requests, but I assume there are
>> very few.
>
>
> I've checked those as well, and as Brian already mentioned, there are just
> a few
> <https://github.com/apache/iceberg/pulls?q=is%3Apr+is%3Aopen+label%3Apython>.
> There is never a perfect moment, since there are always open PRs that will
> break, but I think right after the release is the best worst moment :) The
> PRs that are open are trivial to move to the new repo as well.
>
> Hussein: I checked the discussion thread, and one of the motivations for
>> this separation was to avoid triggering unrelated CI jobs after each
>> change. However, I wonder whether it is (or will become) necessary to check
>> compatibility between the main repository and the client after each
>> change. If so, we will need to trigger CI across the different
>> repositories using the GHA API, not necessarily to block the PR, but just
>> to give quick feedback and a notification that something needs to be
>> changed on the client side.
>
>
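For reference, a minimal sketch of the cross-repository trigger Hussein describes, assuming GitHub's `repository_dispatch` REST endpoint (`POST /repos/{owner}/{repo}/dispatches`). The event type `iceberg-core-changed`, the payload fields, and the token handling are hypothetical; the receiving workflow in iceberg-python would need an `on: repository_dispatch` trigger listing that event type, and the token needs access to the target repository, since the default workflow token is scoped to the repository it runs in.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(
    owner: str, repo: str, token: str, event_type: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST to /repos/{owner}/{repo}/dispatches."""
    # `event_type` and `client_payload` are the two body fields the endpoint accepts.
    body = json.dumps({"event_type": event_type, "client_payload": payload})
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/dispatches",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


def send_dispatch(request: urllib.request.Request) -> int:
    """Send the request; GitHub replies 204 No Content on success."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

A core workflow would call `send_dispatch(build_dispatch_request(...))` after its own build. Since nothing waits on the client's result, the core PR is only notified, never blocked, which matches the non-blocking intent described above.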
> We don't check compatibility between dev versions today, and PyIceberg
> already lives in isolation within the main repository. We might want to add
> some integration tests at some point, but I'm not sure we should start
> testing dev versions against each other. The main challenge with triggering
> CI is keeping the ignore list
> <https://github.com/apache/iceberg/blob/master/.github/workflows/flink-ci.yml#L20-L51>
> of each GitHub Actions workflow from exploding. See
> <https://github.com/apache/iceberg/pull/8546#issuecomment-1712958280> for
> an example where the Python workflow file was not properly excluded.
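To make the ignore-list problem concrete, here is a rough approximation of how a `paths-ignore` filter decides whether a workflow runs: GitHub skips a workflow only when every changed file matches one of its ignore patterns. The sketch uses `fnmatch`, whose `*` also matches `/`, so it is looser than GitHub's own glob syntax; the pattern list and module names are hypothetical, not copied from flink-ci.yml.

```python
from fnmatch import fnmatch


def workflow_runs(changed_files: list[str], paths_ignore: list[str]) -> bool:
    """Approximate `paths-ignore`: skip only if *every* changed file is ignored."""
    # Note: fnmatch's `*` crosses `/`, unlike GitHub's glob syntax.
    ignored = [any(fnmatch(path, pat) for pat in paths_ignore) for path in changed_files]
    return not all(ignored)


# Hypothetical, abbreviated ignore list for a Flink workflow.
FLINK_PATHS_IGNORE = [
    "python/**",
    "spark/**",
    "docs/**",
]


def ignore_entries_needed(modules: list[str]) -> int:
    """If each module's workflow ignores every *other* module, entries total n * (n - 1)."""
    return len(modules) * (len(modules) - 1)
```

With this list, a change under `python/` skips the Flink job, but a change to `.github/workflows/python-ci.yml` still triggers it, the same kind of miss as in the linked PR comment. Under the simplifying assumption above, adding one module means editing every existing workflow, so the total number of entries grows roughly quadratically.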
>
> I would much rather rely on the reference tests that Jean-Baptiste
> mentioned around the Java Iceberg 1.4.0 release, which we're also working
> on at Tabular (disclaimer: I work for Tabular). PyIceberg is inspired by
> the Java implementation, and we've recently uncovered some issues
> <https://github.com/apache/iceberg/pull/8673> (thanks Jan Finis!) with
> respect to adhering to the spec, so I think a strict approach to
> validating the implementations is preferable.
>
> That said, in PyIceberg we use Spark (which uses the Java library) to run
> integration tests. This is based on the released versions, which works very
> well. I'm not sure we should create matrices between
> Python/Go/Rust/Iceberg/Athena/Snowflake/... (you're seeing where this is
> going) :) But these are just my thoughts today and might change in the
> future.
>
> Thanks everyone, I'll go ahead and merge the PR that includes the history.
>
> Cheers, Fokko
>
> P.S. The repo might look a bit funky, but that's because I created the
> PR branch before the main branch. I didn't know that the branch created
> first would be promoted to the default branch. I'm working with Apache
> Infra <https://issues.apache.org/jira/browse/INFRA-25029> to get it fixed.
>
> On Sat, 30 Sep 2023 at 20:29, Daniel Weeks <dwe...@apache.org> wrote:
>
>> +1 to relocate with history.
>>
>> On Sat, Sep 30, 2023, 10:24 AM Brian Olsen <bitsondata...@gmail.com>
>> wrote:
>>
>>> This shouldn’t be too hard, and it can likely be a nightly build that
>>> runs in each client repository.
>>>
>>> We’re already planning to build the documentation by using git submodules
>>> to pull all of it into a single build in the central repo. We can likely
>>> go the other direction to run client-core integration tests. I'd prefer
>>> these to run on the client side to avoid too much CI running on the core
>>> repo. We also have to consider that whatever we choose for the Python
>>> client will also apply to Go, Rust, and any future client. Happy to hear
>>> alternatives though!
>>>
>>> WDYT Fokko?
>>>
>>>
>>>
>>> On Sat, Sep 30, 2023 at 7:12 AM Hussein Awala <huss...@awala.fr> wrote:
>>>
>>>> +1
>>>>
>>>> I checked the discussion thread, and one of the motivations for this
>>>> separation was to avoid triggering unrelated CI jobs after each change.
>>>> However, I wonder whether it is (or will become) necessary to check
>>>> compatibility between the main repository and the client after each change.
>>>> If so, we will need to trigger CI across the different repositories
>>>> using the GHA API, not necessarily to block the PR, but just to give quick
>>>> feedback and a notification that something needs to be changed on the client
>>>> side.
>>>>
>>>> On Fri, Sep 29, 2023 at 9:39 PM Brian Olsen <bitsondata...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Great work Fokko!
>>>>>
>>>>> Pucheng,
>>>>>
>>>>> We still want to maintain all of the issues in the Python repository.
>>>>> The one thing we will lose is pull requests, but I assume there are very
>>>>> few.
>>>>>
>>>>> On Fri, Sep 29, 2023 at 10:34 AM Pucheng Yang
>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>
>>>>>> Thanks for doing this. I wonder how we should deal with all the issues
>>>>>> filed for the Python module that are still open in the iceberg repo?
>>>>>>
>>>>>> On Fri, Sep 29, 2023 at 7:55 AM Eduard Tudenhoefner <
>>>>>> edu...@tabular.io> wrote:
>>>>>>
>>>>>>> +1 on moving to a separate repo and maintaining git history
>>>>>>>
>>>>>>> On Fri, Sep 29, 2023 at 3:30 PM Jean-Baptiste Onofré <
>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Awesome, it looks even better ;)
>>>>>>>>
>>>>>>>> Thanks !
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On Fri, Sep 29, 2023 at 2:31 PM Fokko Driesprong <fo...@apache.org>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hey Ajantha,
>>>>>>>> >
>>>>>>>> > That's a great suggestion. I've followed the steps and created a
>>>>>>>> new PR here: https://github.com/apache/iceberg-python/pull/3
>>>>>>>> >
>>>>>>>> > The subdirectory-filter command turns a subdirectory into the root
>>>>>>>> directory, so I had to add some files afterward (.github/*,
>>>>>>>> .gitignore, etc.); these are in a separate commit. Please take a look.
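For anyone repeating this for another client, here is a minimal sketch of the history-preserving extraction discussed in this thread, driven from Python. It assumes `git filter-branch --subdirectory-filter`; git's own documentation now recommends the separate `git filter-repo` tool, which offers an equivalent `--subdirectory-filter` option. The working directory, remote name, and branch name are hypothetical, and the command runner is injectable so the steps can be inspected without touching a real repository.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner executes one command in a working directory (None = current dir).
Runner = Callable[[Sequence[str], Optional[str]], None]


def run_checked(cmd: Sequence[str], cwd: Optional[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.run(list(cmd), cwd=cwd, check=True)


def extract_subdirectory(
    source_url: str,
    subdir: str,
    workdir: str,
    target_url: str,
    branch: str,
    runner: Runner = run_checked,
) -> None:
    """Rewrite a fresh clone so `subdir` becomes the root, keeping its history."""
    steps: list[tuple[list[str], Optional[str]]] = [
        # Work on a throwaway clone; filter-branch rewrites history in place.
        (["git", "clone", source_url, workdir], None),
        # Keep only history touching `subdir` and make it the new project root.
        (["git", "filter-branch", "--subdirectory-filter", subdir, "--", "--all"], workdir),
        # Push the rewritten history to a branch on the new repository.
        (["git", "remote", "add", "target", target_url], workdir),
        (["git", "push", "target", f"HEAD:refs/heads/{branch}"], workdir),
    ]
    for cmd, cwd in steps:
        runner(cmd, cwd)
```

As noted above, the filtered tree contains only the former subdirectory, so repository-level files such as `.github/*` and `.gitignore` still have to be added in a follow-up commit.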
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> > Fokko
>>>>>>>> >
>>>>>>>> > On Fri, 29 Sep 2023 at 13:39, Ajantha Bhat <
>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> I think we are going to lose the commit history if we merge the
>>>>>>>> above PR.
>>>>>>>> >>
>>>>>>>> >> There are ways to move the subfolder into a new repo while
>>>>>>>> retaining the commit history.
>>>>>>>> >> For example:
>>>>>>>> >> - https://medium.com/@ayushya/move-directory-from-one-repository-to-another-preserving-git-history-d210fa049d4b
>>>>>>>> >> - https://gist.github.com/trongthanh/2779392
>>>>>>>> >>
>>>>>>>> >> Please give it a try.
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> Ajantha
>>>>>>>> >>
>>>>>>>> >> On Fri, Sep 29, 2023 at 4:55 PM Fokko Driesprong <
>>>>>>>> fo...@apache.org> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hey everyone 👋
>>>>>>>> >>>
>>>>>>>> >>> A while ago we discussed moving Rust and Go into separate
>>>>>>>> repositories:
>>>>>>>> https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>>>>>>>> >>>
>>>>>>>> >>> Since we just did the PyIceberg 0.5.0 release, I think it is a
>>>>>>>> good moment to migrate PyIceberg to iceberg-python as well:
>>>>>>>> https://github.com/apache/iceberg-python/pull/2
>>>>>>>> I went over the PRs that were ready to merge and got them in. If
>>>>>>>> anything is missing, please let me know.
>>>>>>>> >>>
>>>>>>>> >>> I would suggest merging the PR and leaving the source code in
>>>>>>>> the main repository for another week or so to make sure that we didn't
>>>>>>>> miss anything.
>>>>>>>> >>>
>>>>>>>> >>> Since PyIceberg currently hosts its docs on the GitHub Pages of
>>>>>>>> the Iceberg repository, moving PyIceberg will also free up GitHub Pages
>>>>>>>> for migrating the docs back into the main repository.
>>>>>>>> >>>
>>>>>>>> >>> Let me know if there are any concerns.
>>>>>>>> >>>
>>>>>>>> >>> Kind regards,
>>>>>>>> >>> Fokko Driesprong
>>>>>>>>
>>>>>>>
