+1 for removing it. Now that we're working in iceberg-python, it is just going to get stale and confusing.
On Sun, Oct 8, 2023 at 12:07 PM Fokko Driesprong <fo...@apache.org> wrote:

> Hey everyone,
>
> It has been a week since PyIceberg migrated to its own repository. Should we move forward by removing the Python codebase from the main repository? Ajantha already raised a pull request <https://github.com/apache/iceberg/pull/8695> to do this (thank you for that 🙌).
>
> Kind regards,
> Fokko
>
> On Mon, Oct 2, 2023 at 16:16, Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey everyone,
>>
>> Update from my side. I've moved all the issues <https://github.com/apache/iceberg-python/issues> and my PRs <https://github.com/apache/iceberg-python/pulls>. Not all issues needed to be migrated since a lot of them were already fixed. I've closed the remaining PRs that were still open; those were either abandoned, failing on CI, or had changes pending. Of course, with the kind request to re-open them against the iceberg-python repository.
>>
>> Ajantha already created a PR <https://github.com/apache/iceberg/pull/8695> (thanks for that!) to remove Python from the iceberg repo.
>>
>> Kind regards, Fokko
>>
>> On Sat, Sep 30, 2023 at 21:06, Fokko Driesprong <fo...@apache.org> wrote:
>>
>>> Hey everyone,
>>>
>>> Pucheng:
>>>> I wonder how do we deal with all the issues filed for python module but still open in iceberg repo?
>>>
>>> That's a good point. I think we should migrate them. I checked and it is only 3 pages <https://github.com/apache/iceberg/issues?q=is%3Aissue+is%3Aopen+python>. Likely a few more if we query on other keywords. I think migrating them by hand is feasible. It also gives us a chance to clean them up (all the issues on the last page I linked above are not relevant anymore, and can be closed).
>>>
>>> Brian:
>>>> The one thing we will lose is pull requests, but I assume there are very few.
>>>
>>> I've checked those as well, and as Brian already mentioned, there are just a few <https://github.com/apache/iceberg/pulls?q=is%3Apr+is%3Aopen+label%3Apython>. There is never a perfect moment since there are always PRs open that will break, but just after the release I think is the best worst moment :) The PRs that are open are trivial to move to the new repo as well.
>>>
>>> Hussein:
>>>> I checked the discussion thread, and one of the motivations for this separation was to avoid triggering unrelated CI jobs after each change. However, I wonder if it isn't (and will not be) necessary to check the compatibility between the main repository and the client after each change. Otherwise, we will need to trigger the CI across the different repositories using the GHA API, not necessarily to block the PR, but just to give quick feedback and notification that something needs to be changed on the client side.
>>>
>>> Checking between dev versions is not something we do today, and PyIceberg lives isolated in the main repository. We might want to do some integration tests at some point, but I'm not sure if we should start testing dev versions against each other. The main issue with triggering the CI is to not exponentially explode the ignore list <https://github.com/apache/iceberg/blob/master/.github/workflows/flink-ci.yml#L20-L51> of a Github action. An example here <https://github.com/apache/iceberg/pull/8546#issuecomment-1712958280> is where the Python GA file was not properly excluded.
>>>
>>> I would much rather rely on some reference tests that Jean-Baptiste mentioned at the Java Iceberg 1.4.0 release, and that we're also working on at Tabular (disclaimer: I'm working for Tabular). Python is inspired by Java, and we've recently uncovered some issues <https://github.com/apache/iceberg/pull/8673> (thanks Jan Finis!) with respect to adhering to the spec, so I think a strict approach to validate the implementations would be preferred.
>>>
>>> That said, in PyIceberg we use Spark (which uses the Java library) to run integration tests. This is based on the released versions, which works very well. Not sure if we should create matrices between Python/Go/Rust/Iceberg/Athena/Snowflake/... (you're seeing where this is going) :) But these are just my thoughts today and might change in the future.
>>>
>>> Thanks everyone, I'll go ahead and merge the PR that includes the history.
>>>
>>> Cheers, Fokko
>>>
>>> PS. The repo might look a bit funky, but that's because I've created the pr-branch before the main branch. I didn't know that the branch that was created first would be promoted to the default branch. I'm working with Apache Infra <https://issues.apache.org/jira/browse/INFRA-25029> to get it fixed.
>>>
>>> On Sat, Sep 30, 2023 at 20:29, Daniel Weeks <dwe...@apache.org> wrote:
>>>
>>>> +1 to relocate with history.
>>>>
>>>> On Sat, Sep 30, 2023, 10:24 AM Brian Olsen <bitsondata...@gmail.com> wrote:
>>>>
>>>>> This shouldn’t be too hard and can likely be a nightly build that occurs with each client repository.
>>>>>
>>>>> We’re already planning on doing the documentation using git submodule to pull all the documentation under a single build in the central repo. We can likely go the other direction to run client-core integration tests. I prefer these go on the client end to avoid too much CI running on the core repo. We also have to consider that whatever we choose to do with the Python client will also apply to Go, Rust, and any future client. Happy to hear alternatives though!
>>>>>
>>>>> WDYT Fokko?
>>>>>
>>>>> On Sat, Sep 30, 2023 at 7:12 AM Hussein Awala <huss...@awala.fr> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> I checked the discussion thread, and one of the motivations for this separation was to avoid triggering unrelated CI jobs after each change. However, I wonder if it isn't (and will not be) necessary to check the compatibility between the main repository and the client after each change. Otherwise, we will need to trigger the CI across the different repositories using the GHA API, not necessarily to block the PR, but just to give quick feedback and notification that something needs to be changed on the client side.
>>>>>>
>>>>>> On Fri, Sep 29, 2023 at 9:39 PM Brian Olsen <bitsondata...@gmail.com> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Great work Fokko!
>>>>>>>
>>>>>>> Pucheng,
>>>>>>>
>>>>>>> We still want to maintain all of the issues in the Python repository. The one thing we will lose is pull requests, but I assume there are very few.
>>>>>>>
>>>>>>> On Fri, Sep 29, 2023 at 10:34 AM Pucheng Yang <py...@pinterest.com.invalid> wrote:
>>>>>>>
>>>>>>>> Thanks for doing this. I wonder how do we deal with all the issues filed for python module but still open in iceberg repo?
>>>>>>>>
>>>>>>>> On Fri, Sep 29, 2023 at 7:55 AM Eduard Tudenhoefner <edu...@tabular.io> wrote:
>>>>>>>>
>>>>>>>>> +1 on moving to a separate repo and maintaining git history
>>>>>>>>>
>>>>>>>>> On Fri, Sep 29, 2023 at 3:30 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Awesome, it looks even better ;)
>>>>>>>>>>
>>>>>>>>>> Thanks !
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 29, 2023 at 2:31 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hey Ajantha,
>>>>>>>>>> >
>>>>>>>>>> > That's a great suggestion. I've followed the steps and created a new PR here: https://github.com/apache/iceberg-python/pull/3
>>>>>>>>>> >
>>>>>>>>>> > The subdirectory-filter command moves a subdirectory to the root directory. This way I still had to add some files afterward (.github/*, .gitignore, etc.); these are in a separate commit. Please take a look.
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> >
>>>>>>>>>> > Fokko
>>>>>>>>>> >
>>>>>>>>>> > On Fri, Sep 29, 2023 at 13:39, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> I think we are gonna lose the history of commits if we merge the above PR.
>>>>>>>>>> >>
>>>>>>>>>> >> There are ways to move the subfolder into a new repo by retaining commit history.
>>>>>>>>>> >> For example:
>>>>>>>>>> >> - https://medium.com/@ayushya/move-directory-from-one-repository-to-another-preserving-git-history-d210fa049d4b
>>>>>>>>>> >> - https://gist.github.com/trongthanh/2779392
>>>>>>>>>> >>
>>>>>>>>>> >> Please give it a try.
>>>>>>>>>> >>
>>>>>>>>>> >> Thanks,
>>>>>>>>>> >> Ajantha
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Sep 29, 2023 at 4:55 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Hey everyone 👋
>>>>>>>>>> >>>
>>>>>>>>>> >>> A while ago we discussed that Rust and Go are going into a separate repository: https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>>>>>>>>>> >>>
>>>>>>>>>> >>> Since we just did the PyIceberg 0.5.0 release, I think it is a good moment to migrate PyIceberg to iceberg-python as well: https://github.com/apache/iceberg-python/pull/2 I went over the PRs that are ready to merge and got them in. If there is anything missing, please let me know.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I would suggest merging the PR and leaving the source code in the main repository for another week or so to make sure that we didn't miss anything.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Since PyIceberg now also hosts the docs on the Github pages of the Iceberg repository, moving PyIceberg will also free up the Github pages for the migration of the docs back into the main repository.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Let me know if there are any concerns.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Kind regards,
>>>>>>>>>> >>> Fokko Driesprong

--
Ryan Blue
Tabular
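
For readers following along: the history-preserving move discussed in the thread (the "subdirectory-filter command" that moves a subdirectory to the root directory) could look roughly like the sketch below. This is only an illustration under assumptions. The thread does not say which exact tool or flags were used, so the sketch assumes `git filter-branch` with its `--subdirectory-filter` and `--prune-empty` options, and it assumes the subdirectory is named `python`. The helper name `subdirectory_filter_command` is hypothetical. The function only builds the command and does not run it, since rewriting history is destructive and should be done on a fresh clone.

```python
import shlex
from typing import List


def subdirectory_filter_command(subdir: str, prune_empty: bool = True) -> List[str]:
    """Build a git command that rewrites history so `subdir` becomes the repo root.

    Hypothetical helper for illustration; it assumes `git filter-branch`,
    which the thread does not explicitly confirm was the tool used.
    """
    if not subdir or subdir.startswith("/"):
        raise ValueError("subdir must be a non-empty path relative to the repo root")
    cmd = ["git", "filter-branch"]
    if prune_empty:
        # Drop commits that become empty because they never touched `subdir`.
        cmd.append("--prune-empty")
    # Keep only the history of `subdir` and make it the new project root,
    # applied to all refs.
    cmd += ["--subdirectory-filter", subdir, "--", "--all"]
    return cmd


if __name__ == "__main__":
    # "python" is an assumed directory name, not taken from the thread.
    print(shlex.join(subdirectory_filter_command("python")))
```

As noted in the thread, files that lived outside the subdirectory (.github/*, .gitignore, etc.) are not carried over by this kind of rewrite and have to be re-added in a separate commit.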