Hey everyone,

It has been a week since PyIceberg migrated to its own repository. Should we move forward by removing the Python codebase from the main repository? Ajantha already raised a pull request <https://github.com/apache/iceberg/pull/8695> to do this (thank you for that 🙌).

Kind regards,
Fokko

On Mon, Oct 2, 2023 at 16:16 Fokko Driesprong <fo...@apache.org> wrote:

> Hey everyone,
>
> Update from my side. I've moved all the issues
> <https://github.com/apache/iceberg-python/issues> and my PRs
> <https://github.com/apache/iceberg-python/pulls>. Not all issues needed
> to be migrated since a lot of them were already fixed. I've closed the
> remaining PRs that were still open; those were either abandoned, failed on
> CI, or had changes pending. Of course, with the kind request to re-open
> them against the iceberg-python repository.
>
> Ajantha already created a PR <https://github.com/apache/iceberg/pull/8695>
> (thanks for that!) to remove Python from the iceberg repo.
>
> Kind regards, Fokko
>
>
> On Sat, Sep 30, 2023 at 21:06 Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey everyone,
>>
>> Pucheng: I wonder how do we deal with all the issues filed for python
>>> module but still open in iceberg repo?
>>
>>
>> That's a good point. I think we should migrate them. I checked and it is
>> only 3 pages
>> <https://github.com/apache/iceberg/issues?q=is%3Aissue+is%3Aopen+python>.
>> Likely a few more if we query on other keywords. I think migrating them by
>> hand is feasible. It also gives us a chance to clean them up (all the
>> issues on the last page I linked above are not relevant anymore, and can be
>> closed).
>>
>> Brian: The one thing we will lose is pull requests, but I assume there
>>> are very few.
>>
>>
>> I've checked those as well, and as Brian already mentioned, there are just
>> a few
>> <https://github.com/apache/iceberg/pulls?q=is%3Apr+is%3Aopen+label%3Apython>.
>> There is never a perfect moment since there are always PRs open that will
>> break, but just after the release I think is the best worst moment :) The
>> PRs that are open are trivial to move to the new repo as well.
>>
>> Hussein: I checked the discussion thread, and one of the motivations for
>>> this separation was to avoid triggering unrelated CI jobs after each
>>> change. However, I wonder if it isn't (and will not be) necessary to check
>>> the compatibility between the main repository and the client after each
>>> change. Otherwise, we will need to trigger the CI across the different
>>> repositories using the GHA API, not necessarily to block the PR, but just
>>> to give quick feedback and notification that something needs to be changed
>>> on the client side.
>>
>>
>> Checking between dev versions is not something we do today, and PyIceberg
>> lives isolated in the main repository. We might want to do some integration
>> tests at some point, but I'm not sure if we should start testing dev
>> versions against each other. The main issue with triggering the CI is to
>> not exponentially explode the ignore list
>> <https://github.com/apache/iceberg/blob/master/.github/workflows/flink-ci.yml#L20-L51>
>> of a GitHub Action. An example here
>> <https://github.com/apache/iceberg/pull/8546#issuecomment-1712958280> is
>> where the Python GA file was not properly excluded.
>>
>> I would much rather rely on some reference tests that Jean-Baptiste
>> mentioned at the Java Iceberg 1.4.0 release, and that we're also working on
>> at Tabular (disclaimer: I'm working for Tabular). Python is inspired by
>> Java, and we've recently uncovered some issues
>> <https://github.com/apache/iceberg/pull/8673> (thanks Jan Finis!) with
>> respect to adhering to the spec, so I think a strict approach to validate
>> the implementations would be preferred.
>>
>> That said, in PyIceberg we use Spark (which uses the Java library) to run
>> integration tests. This is based on the released versions, which works very
>> well. Not sure if we should create matrices between
>> Python/Go/Rust/Iceberg/Athena/Snowflake/...
>> (you're seeing where this is going) :) But these are just my thoughts
>> today and might change in the future.
>>
>> Thanks everyone, I'll go ahead and merge the PR that includes the history.
>>
>> Cheers, Fokko
>>
>> PS. The repo might look a bit funky, but that's because I created the
>> pr-branch before the main branch. I didn't know that the branch that was
>> created first would be promoted to the default branch. I'm working with
>> Apache Infra <https://issues.apache.org/jira/browse/INFRA-25029> to get it
>> fixed.
>>
>> On Sat, Sep 30, 2023 at 20:29 Daniel Weeks <dwe...@apache.org> wrote:
>>
>>> +1 to relocate with history.
>>>
>>> On Sat, Sep 30, 2023, 10:24 AM Brian Olsen <bitsondata...@gmail.com>
>>> wrote:
>>>
>>>> This shouldn’t be too hard and can likely be a nightly build that
>>>> occurs with each client repository.
>>>>
>>>> We’re already planning on doing the documentation using git submodule
>>>> to pull all the documentation under a single build in the central repo. We
>>>> can likely go the other direction to run client-core integration tests. I
>>>> prefer these go on the client end to avoid too much CI running on the core
>>>> repo. We also have to consider that whatever we choose to do with the
>>>> Python client will also apply to Go, Rust, and any future client. Happy to
>>>> hear alternatives though!
>>>>
>>>> WDYT Fokko?
>>>>
>>>>
>>>>
>>>> On Sat, Sep 30, 2023 at 7:12 AM Hussein Awala <huss...@awala.fr> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> I checked the discussion thread, and one of the motivations for this
>>>>> separation was to avoid triggering unrelated CI jobs after each change.
>>>>> However, I wonder if it isn't (and will not be) necessary to check the
>>>>> compatibility between the main repository and the client after each
>>>>> change.
>>>>> Otherwise, we will need to trigger the CI across the different
>>>>> repositories using the GHA API, not necessarily to block the PR, but
>>>>> just to give quick feedback and notification that something needs to be
>>>>> changed on the client side.
>>>>>
>>>>> On Fri, Sep 29, 2023 at 9:39 PM Brian Olsen <bitsondata...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Great work Fokko!
>>>>>>
>>>>>> Pucheng,
>>>>>>
>>>>>> We still want to maintain all of the issues in the Python repository.
>>>>>> The one thing we will lose is pull requests, but I assume there are very
>>>>>> few.
>>>>>>
>>>>>> On Fri, Sep 29, 2023 at 10:34 AM Pucheng Yang
>>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>>
>>>>>>> Thanks for doing this. I wonder how do we deal with all the issues
>>>>>>> filed for python module but still open in iceberg repo?
>>>>>>>
>>>>>>> On Fri, Sep 29, 2023 at 7:55 AM Eduard Tudenhoefner <
>>>>>>> edu...@tabular.io> wrote:
>>>>>>>
>>>>>>>> +1 on moving to a separate repo and maintaining git history
>>>>>>>>
>>>>>>>> On Fri, Sep 29, 2023 at 3:30 PM Jean-Baptiste Onofré <
>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Awesome, it looks even better ;)
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Fri, Sep 29, 2023 at 2:31 PM Fokko Driesprong <fo...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Hey Ajantha,
>>>>>>>>> >
>>>>>>>>> > That's a great suggestion. I've followed the steps and created a
>>>>>>>>> > new PR here: https://github.com/apache/iceberg-python/pull/3
>>>>>>>>> >
>>>>>>>>> > The subdirectory-filter command moves a subdirectory to the root
>>>>>>>>> > directory, so I still had to add some files afterward (.github/*,
>>>>>>>>> > .gitignore, etc.); these are in a separate commit. Please take a look.
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > Fokko
>>>>>>>>> >
>>>>>>>>> > On Fri, Sep 29, 2023 at 13:39 Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> I think we are gonna lose the history of commits if we merge
>>>>>>>>> >> the above PR.
>>>>>>>>> >>
>>>>>>>>> >> There are ways to move the subfolder into a new repo while
>>>>>>>>> >> retaining commit history.
>>>>>>>>> >> For example:
>>>>>>>>> >> - https://medium.com/@ayushya/move-directory-from-one-repository-to-another-preserving-git-history-d210fa049d4b
>>>>>>>>> >> - https://gist.github.com/trongthanh/2779392
>>>>>>>>> >>
>>>>>>>>> >> Please give it a try.
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> Ajantha
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Sep 29, 2023 at 4:55 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hey everyone 👋
>>>>>>>>> >>>
>>>>>>>>> >>> A while ago we discussed that Rust and Go are going into a
>>>>>>>>> >>> separate repository:
>>>>>>>>> >>> https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>>>>>>>>> >>>
>>>>>>>>> >>> Since we just did the PyIceberg 0.5.0 release, I think it is a
>>>>>>>>> >>> good moment to migrate PyIceberg to iceberg-python as well:
>>>>>>>>> >>> https://github.com/apache/iceberg-python/pull/2 I went over the
>>>>>>>>> >>> PRs that are ready to merge and got them in. If there is anything
>>>>>>>>> >>> missing, please let me know.
>>>>>>>>> >>>
>>>>>>>>> >>> I would suggest merging the PR and leaving the source code in
>>>>>>>>> >>> the main repository for another week or so to make sure that we
>>>>>>>>> >>> didn't miss anything.
>>>>>>>>> >>>
>>>>>>>>> >>> Since PyIceberg now also hosts the docs on the GitHub Pages of
>>>>>>>>> >>> the Iceberg repository, moving PyIceberg will also free up the
>>>>>>>>> >>> GitHub Pages for the migration of the docs back into the main
>>>>>>>>> >>> repository.
>>>>>>>>> >>>
>>>>>>>>> >>> Let me know if there are any concerns.
>>>>>>>>> >>>
>>>>>>>>> >>> Kind regards,
>>>>>>>>> >>> Fokko Driesprong