Hey Kevin,

I agree with your reasoning. Having reproducible builds would be nice, but it doesn't seem very popular in Pythonland. For example, cibuildwheel <https://github.com/pypa/cibuildwheel>, which we use, doesn't mention reproducibility at all. The steps that you suggest make sense to me. I've replied on #1387 <https://github.com/apache/iceberg-python/pull/1387> to bring it in line with the process that you're suggesting.
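To make the reproducibility point concrete, here is a small Python sketch. It illustrates one common cause and is not how PyIceberg's wheels are actually built. A wheel is a zip archive, and each zip entry records a modification timestamp, so two archives with identical contents but different timestamps get different SHA-512 checksums. Pinning the timestamp, which is the idea behind the `SOURCE_DATE_EPOCH` convention from reproducible-builds.org, makes them match. The entry name `pyiceberg/__init__.py`, its contents, and the dates are hypothetical stand-ins.

```python
import hashlib
import io
import zipfile

# (year, month, day, hour, minute, second), the layout zipfile uses for timestamps.
Timestamp = tuple[int, int, int, int, int, int]


def build_archive(timestamp: Timestamp) -> bytes:
    """Build a tiny zip archive standing in for a wheel.

    The single entry and its contents are hypothetical; only the entry's
    recorded timestamp changes between calls.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        entry = zipfile.ZipInfo("pyiceberg/__init__.py", date_time=timestamp)
        archive.writestr(entry, b'__version__ = "0.0.0"\n')
    return buffer.getvalue()


def sha512_hex(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()


# Same contents, "built" at two different times: the checksums differ.
first = build_archive((2024, 12, 3, 9, 15, 0))
second = build_archive((2024, 12, 4, 5, 52, 0))
print(sha512_hex(first) == sha512_hex(second))  # False

# Pin the timestamp (the idea behind SOURCE_DATE_EPOCH): the checksums match.
pinned = (2024, 12, 1, 0, 0, 0)
print(sha512_hex(build_archive(pinned)) == sha512_hex(build_archive(pinned)))  # True
```

Timestamps are only one source of non-determinism; entry ordering and compiled extensions can also vary between builds, so pinning timestamps alone does not guarantee a reproducible wheel.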
For the nightly builds, I would suggest using http://test.pypi.org/; I've already claimed pyiceberg <https://test.pypi.org/project/pyiceberg/> there. To install from it, you need to pass the index explicitly: `pip install -i https://test.pypi.org/simple/ pyiceberg`. To add the token to the GitHub Actions secrets, you can file an INFRA ticket <https://issues.apache.org/>.

Thanks for working on this; once this is in place, releasing will be a breeze!

Kind regards,
Fokko

On Wed, 4 Dec 2024 at 05:51, Kevin Liu <kevinjq...@apache.org> wrote:
> Hi Fokko,
>
> Thank you for pointing me to the ASF Release Policy; it was very informative!
>
> Based on the policies, we cannot automate signing in GitHub Actions because Python wheels are not considered reproducible [1]. The wheels include information about the current date and time, which prevents validation with checksums. While there might be ways to create reproducible wheels [2], I don't think it's necessary at this time.
>
> The policy around trusted hardware is interesting. It means the release manager must download the artifacts and verify them locally before publishing. From the policy:
>
>     "hardware owned and controlled by the committer. That means hardware the committer has physical possession and control of and exclusively full administrative/superuser access to."
>
> Given the above, the steps for "Create a Release Candidate" should be as follows:
> * Create and push a tag for the Release Candidate (e.g., `0.8.1rc1`)
> * The tag push triggers a GitHub Actions workflow that builds artifacts for both SVN and PyPI
> * The Release Manager downloads the SVN and PyPI artifacts locally
> * The Release Manager generates SHA-512 checksums and GPG signatures locally for the SVN artifacts and uploads them to SVN
> * The Release Manager uploads the PyPI artifacts to PyPI
>
> The GitHub Action will no longer create checksum or signature files. Instead, the Release Manager will need to set up the required SVN and GPG infrastructure locally.
>
> I will change the release process accordingly in PR #1391 <https://github.com/apache/iceberg-python/pull/1391> and update the "how to release" documentation as well.
>
> Regarding the Nightly Builds, we can use a GitHub Action to build and push artifacts to PyPI. The "build" step is already in place; we just need a PyPI API key to push the artifacts. I can generate a new PyPI token using my own PyPI account, or we could request one from ASF Infra.
>
> Thanks again for your help!
>
> Best,
> Kevin Liu
>
> [1] https://reproducible-builds.org/
> [2] https://github.com/wimglenn/setuptools-reproducible
>
> On Tue, Dec 3, 2024 at 1:31 AM Fokko Driesprong <fo...@apache.org> wrote:
>> Hey Kevin,
>>
>> First of all, thanks for working on the releases; that's always much appreciated.
>>
>> Regarding the changes to the release process, I'm all for automating as much as possible, but I have some concerns. I also think it is important to treat the nightly builds and the release process separately.
>>
>> Releases
>>
>> Concerning the releases, I think the official ASF release policy <https://www.apache.org/legal/release-policy.html#artifacts> is a very good read. While reading up on this topic, I noticed that automated signing of the release <https://infra.apache.org/release-signing.html#automated-release-signing> is allowed, but it comes with some prerequisites, such as reproducible builds. This is not the case for PyIceberg today, i.e., if you build the wheels twice, they have different checksums. Also, a manual validation step is still part of the process, where all artifacts are produced on trusted hardware <https://www.apache.org/legal/release-policy.html#owned-controlled-hardware> before publication.
>>
>> I would lean much more towards the way Iceberg-Go has solved this <https://github.com/apache/iceberg-go/blob/main/dev/release/release_rc.sh>. It creates a tag locally and pushes it to the repository; the tag triggers a GitHub Actions workflow that generates the required artifacts and the convenience artifacts. Those are downloaded to the local machine, where the signature and checksum are added, and then pushed back to GitHub Actions and SVN.
>>
>> Nightly Builds
>>
>> As also stated in the release policy, we can provide nightly builds for the development community, but as a project, we should direct outsiders toward the official releases. Since a nightly build is not an official release <https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/>, it doesn't have to go through the whole process, including signatures and checksums, as long as it is sufficiently hidden from end users and intended for developers only. We could, for example, push a nightly build to test.pypi.org <https://test.pypi.org/project/pyiceberg/>.
>>
>> Thanks again for working on this. Hope this helps, and let me know what you think!
>>
>> Kind regards,
>> Fokko
>>
>> On Mon, 2 Dec 2024 at 20:38, Kevin Liu <kevinjq...@apache.org> wrote:
>>> Hi everyone,
>>>
>>> As the release manager for PyIceberg 0.8.0 and the upcoming 0.8.1 release, I've taken some time to reflect on ways we could improve the release process. I drew inspiration from the iceberg-go release process and documented my notes here <https://github.com/apache/iceberg-python/issues/1306>. I've also updated the release instructions here <https://py.iceberg.apache.org/how-to-release/>.
>>>
>>> Currently, the release process is manual and prone to errors. My goal is to automate it as much as possible, ideally turning it into a single-click process.
>>>
>>> I'd like to gather your thoughts on two key ideas:
>>>
>>> 1. Automating the release process to reduce manual steps and errors.
>>> 2. Introducing nightly builds to PyPI once automation is in place (issue #872 <https://github.com/apache/iceberg-python/issues/872>).
>>>
>>> The PyIceberg release process can be summarized in these steps:
>>>
>>> - Create a Release Candidate (RC)
>>> - Vote on the devlist
>>> - Promote the RC to a Final Release
>>>
>>> I believe the "Create a Release Candidate" step can benefit the most from automation. Here's a breakdown of the current steps:
>>>
>>> - Create a tag for the Release Candidate (e.g., `0.8.1rc1`).
>>> - Generate artifacts (currently done using GitHub Actions).
>>> - Generate SHA-512 checksums and GPG signatures, then upload the artifacts to SVN.
>>> - Upload the artifacts to PyPI.
>>>
>>> To automate these steps via GitHub Actions, we'd need to address the following:
>>>
>>> - *GPG Signing*: The GitHub Actions workflow requires a `GPG_PRIVATE_KEY` secret. I've tested this with my own key, but it would be better to create a new key (possibly owned by ASF) for signing files.
>>> - *SVN Uploads*: Uploading artifacts to SVN requires credentials. I haven't tested this step yet, but we should aim to use credentials provided by ASF Infra instead of personal ones.
>>> - *PyPI Uploads*: Similarly, uploading to PyPI requires an API token, which should ideally be provided by ASF Infra.
>>>
>>> I've begun automating the artifact generation process (PR #1391 <https://github.com/apache/iceberg-python/pull/1391>). However, the release manager still needs to manually download the artifacts and upload them to both SVN and PyPI.
>>>
>>> Once the "Create a Release Candidate" step is automated, we can create a GitHub Action to manually build and upload a nightly version to PyPI.
>>>
>>> *Is this the direction we want to take for the release process? If so, what's the best way to coordinate with ASF Infra to create the necessary credentials?*
>>>
>>> I'd love to hear your thoughts and any additional suggestions.
>>>
>>> Best,
>>> Kevin Liu
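The local step the thread converges on, where the Release Manager generates SHA-512 checksums for the downloaded artifacts before uploading them to SVN, could look roughly like the Python sketch below. It is a minimal sketch under stated assumptions: the `.sha512` file-name suffix and the two-column `<digest>  <file name>` layout (the format GNU `sha512sum` writes and `sha512sum -c` checks) are assumptions, not necessarily what the PyIceberg release scripts use, and `write_checksum` and `verify_checksum` are hypothetical helpers. GPG signing is left out; a detached signature would be produced separately, e.g. with `gpg --armor --detach-sign <artifact>`.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Stream the file so large artifacts never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(artifact: Path) -> Path:
    """Write <artifact>.sha512 next to the artifact and return its path.

    Assumption: the checksum file uses the "<digest>  <file name>" layout
    that GNU `sha512sum` writes, so `sha512sum -c` can check it.
    """
    checksum_file = artifact.with_name(artifact.name + ".sha512")
    checksum_file.write_text(f"{sha512_of(artifact)}  {artifact.name}\n")
    return checksum_file


def verify_checksum(checksum_file: Path) -> bool:
    """Recompute the digest of the artifact named in the checksum file."""
    expected, name = checksum_file.read_text().split(maxsplit=1)
    artifact = checksum_file.with_name(name.strip())
    return sha512_of(artifact) == expected
```

For example, `write_checksum(Path("dist/pyiceberg-0.8.1.tar.gz"))` (a hypothetical path) would write `dist/pyiceberg-0.8.1.tar.gz.sha512` next to the artifact, and `verify_checksum` on that file returns `True` until the artifact's bytes change. Because the check reruns on whatever machine holds the files, it also fits the voting step, where reviewers validate the published checksums.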