Hey Kevin,

I agree with your reasoning. Having reproducible builds would be nice, but
it doesn't seem very popular in Pythonland. For example, cibuildwheel
<https://github.com/pypa/cibuildwheel>, which we use, doesn't mention
anything about reproducibility. The steps you suggest make sense to me.
I've replied on #1387 <https://github.com/apache/iceberg-python/pull/1387>
to bring it in line with the process that you're suggesting.
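
To make the checksum point concrete: a wheel is a zip archive, and each zip entry stores a modification timestamp. Two builds with identical file contents can therefore still hash differently. The minimal Python sketch below uses only the standard library to show this. It also formats a checksum line the way `sha512sum` prints it. The archive contents, timestamps, and file name are made up for illustration. This is not the project's release tooling.

```python
import hashlib
import io
import zipfile

# A timestamp in the form zipfile expects: (year, month, day, hour, minute, second).
Timestamp = tuple[int, int, int, int, int, int]


def sha512_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-512 digest of the given bytes."""
    return hashlib.sha512(data).hexdigest()


def build_archive(date_time: Timestamp) -> bytes:
    """Build a tiny zip archive (a wheel is a zip file) holding one fixed file.

    Only the modification timestamp stored in the entry header depends on
    the argument; the file name and contents are always the same.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("pyiceberg/__init__.py", date_time=date_time)
        zf.writestr(info, "__version__ = '0.8.1'\n")
    return buf.getvalue()


def checksum_line(data: bytes, filename: str) -> str:
    """Format a line the way `sha512sum` prints it: '<hex digest>  <filename>'."""
    return f"{sha512_hex(data)}  {filename}\n"


if __name__ == "__main__":
    first = build_archive((2024, 12, 4, 5, 51, 0))
    second = build_archive((2024, 12, 4, 5, 52, 0))
    # Identical contents, different embedded timestamps -> different digests.
    print(sha512_hex(first) == sha512_hex(second))  # prints False
    # Hypothetical artifact name, for illustration only.
    print(checksum_line(first, "example-artifact.whl"), end="")
```

In the actual process, the Release Manager would hash the downloaded artifacts on trusted hardware. Running `sha512sum` or `shasum -a 512` produces the same line format.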

For the nightly builds, I would suggest using http://test.pypi.org/; I've
already claimed pyiceberg <https://test.pypi.org/project/pyiceberg/> there.
To install from it, you need to pass the index explicitly:
`pip install -i https://test.pypi.org/simple/ pyiceberg`. To add the token
to the GitHub Actions secrets, you can file an INFRA ticket
<https://issues.apache.org/>.
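
One practical detail for nightly uploads is that PyPI and test.pypi.org reject a file name that has already been uploaded. Every nightly build therefore needs its own version number. The minimal sketch below assumes date-stamped PEP 440 development releases. The base version `0.9.0` and the `nightly_version` helper are hypothetical, not an agreed scheme.

```python
from datetime import date, datetime, timezone


def nightly_version(base_version: str, build_date: date) -> str:
    """Return a date-stamped PEP 440 development release, e.g. '0.9.0.dev20241204'.

    Hypothetical helper: the base version and date-stamp scheme are assumptions.
    """
    return f"{base_version}.dev{build_date:%Y%m%d}"


if __name__ == "__main__":
    # Use the UTC date so builds on differently configured runners agree.
    today = datetime.now(timezone.utc).date()
    print(nightly_version("0.9.0", today))
```

With a date stamp, a second upload on the same day would collide, so this only fits one build per day. Also, pip treats `.devN` versions as pre-releases, so installing one may additionally need `--pre`.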

Thanks for working on this; once this is in place, releasing will be a
breeze!

Kind regards,
Fokko


On Wed, Dec 4, 2024 at 05:51, Kevin Liu <kevinjq...@apache.org> wrote:

> Hi Fokko,
>
> Thank you for pointing me to the ASF Release Policy, it was very
> informative!
>
> Based on the policies, we cannot automate signing in GitHub Actions
> because Python wheels are not considered reproducible [1]. The wheels
> embed the date and time of the build, so building the same commit twice
> yields different checksums, which prevents validating the artifacts by
> checksum. While there might be ways to create reproducible wheels [2], I
> don’t think it’s necessary at this time.
>
> The policy around trusted hardware is interesting. It means the release
> manager must download the artifacts and verify them locally before
> publishing. From the policy:
> > hardware owned and controlled by the committer. That means hardware the
> > committer has physical possession and control of and exclusively full
> > administrative/superuser access to.
>
> Given the above, the steps for "Create a Release Candidate" should be as
> follows:
> * Create and push a tag for the Release Candidate (e.g., `0.8.1rc1`)
> * The tag push triggers a GitHub Action workflow that builds artifacts for
> both SVN and PyPI
> * The Release Manager downloads the SVN and PyPI artifacts locally
> * The Release Manager generates SHA-512 checksums and GPG signatures
> locally for the SVN artifacts and uploads them to SVN
> * The Release Manager uploads the PyPI artifacts to PyPI
>
> The GitHub Action will no longer create checksum or signature files.
> Instead, the Release Manager will need to set up the required SVN and GPG
> infrastructure locally.
>
> I will change the release process accordingly in PR #1391
> <https://github.com/apache/iceberg-python/pull/1391> and update the "how
> to release" documentation as well.
>
> Regarding the Nightly Builds, we can use a GitHub Action to build and push
> artifacts to PyPI. The "build" step is already in place; we just need a
> PyPI API key to push the artifacts. I can generate a new PyPI token using
> my own PyPI account, or we could request one from ASF Infra.
>
> Thanks again for your help!
>
> Best,
> Kevin Liu
>
> [1] https://reproducible-builds.org/
> [2] https://github.com/wimglenn/setuptools-reproducible
>
> On Tue, Dec 3, 2024 at 1:31 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey Kevin,
>>
>> First of all, thanks for working on the releases, that's always much
>> appreciated.
>>
>> Regarding the changes to the release process, I'm all for automating as
>> much as possible, but I have some concerns. I also think it is important to
>> treat nightly builds and the release process separately.
>>
>> Releases
>>
>> Concerning the releases, I think the official ASF release policy
>> <https://www.apache.org/legal/release-policy.html#artifacts> is a very
>> good read. While reading up on this topic, I noticed that automated
>> signing of the release is allowed
>> <https://infra.apache.org/release-signing.html#automated-release-signing>,
>> but this comes with some prerequisites, such as having reproducible builds.
>> This is not the case for PyIceberg today, i.e., if you build the wheels
>> twice, they have different checksums. Also, a manual validation step is
>> still part of the process, where all artifacts are produced on trusted
>> hardware
>> <https://www.apache.org/legal/release-policy.html#owned-controlled-hardware>
>> before publication.
>>
>> I would lean much more towards the way Iceberg-Go has solved this
>> <https://github.com/apache/iceberg-go/blob/main/dev/release/release_rc.sh>.
>> It creates a tag locally and pushes it to the repository; the tag triggers
>> a GitHub Actions workflow that generates the required artifacts and the
>> convenience artifacts. Those are downloaded to the local machine, where
>> signatures and checksums are added, and are then pushed back to GitHub and
>> SVN.
>>
>> Nightly Builds
>>
>> As also stated in the release policy, we can provide nightly builds for
>> the development community, but as a project, we should direct outsiders
>> toward the official releases. Since a nightly build is not an official
>> release
>> <https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/>,
>> it doesn't have to go through the whole process, including signatures and
>> checksums, as long as it is sufficiently hidden from end users and
>> intended for developers only. We could, for example, push a nightly build
>> to test.pypi.org <https://test.pypi.org/project/pyiceberg/>.
>>
>> Thanks again for working on this. Hope this helps, and let me know what
>> you think!
>>
>> Kind regards,
>> Fokko
>>
>>
>> On Mon, Dec 2, 2024 at 20:38, Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> As the release manager for PyIceberg 0.8.0 and the upcoming 0.8.1
>>> release, I’ve taken some time to reflect on ways we could improve the
>>> release process. I drew inspiration from the iceberg-go release process and
>>> documented my notes here
>>> <https://github.com/apache/iceberg-python/issues/1306>. I’ve also
>>> updated the release instructions here
>>> <https://py.iceberg.apache.org/how-to-release/>.
>>>
>>> Currently, the release process is manual and prone to errors. My goal is
>>> to automate it as much as possible, ideally transforming it into a
>>> single-click process.
>>>
>>> I’d like to gather your thoughts on two key ideas:
>>>
>>>    1. Automating the release process to reduce manual steps and errors.
>>>    2. Introducing nightly builds to PyPI once automation is in place (issue
>>>    #872 <https://github.com/apache/iceberg-python/issues/872>).
>>>
>>> The PyIceberg release process can be summarized in these steps:
>>>
>>>    - Create a Release Candidate (RC)
>>>    - Vote on the devlist
>>>    - Promote the RC to a Final Release
>>>
>>> I believe the "Create a Release Candidate" step can benefit the
>>> most from automation. Here’s a breakdown of the current steps:
>>>
>>>    - Create a tag for the Release Candidate (e.g., `0.8.1rc1`).
>>>    - Generate artifacts (currently done using GitHub Actions).
>>>    - Generate SHA-512 checksums and GPG signatures, then upload the
>>>    artifacts to SVN.
>>>    - Upload the artifacts to PyPI.
>>>
>>> To automate these steps via GitHub Actions, we’d need to address the
>>> following:
>>>
>>>    - *GPG Signing*: GitHub Actions requires a `GPG_PRIVATE_KEY` secret.
>>>    I’ve tested this with my own key, but it would be better to create a new
>>>    key (possibly owned by ASF) for signing files.
>>>    - *SVN Uploads*: Uploading artifacts to SVN requires credentials. I
>>>    haven’t tested this step yet, but we should aim to use credentials
>>>    provided by ASF Infra instead of personal ones.
>>>    - *PyPI Uploads*: Similarly, uploading to PyPI requires an API
>>>    token, which should ideally be provided by ASF Infra.
>>>
>>> I’ve begun automating the artifact generation process (PR #1391
>>> <https://github.com/apache/iceberg-python/pull/1391>). However, the
>>> release manager currently still needs to manually download and upload
>>> artifacts to both SVN and PyPI.
>>>
>>> Once the "Create a Release Candidate" step is automated, we can create
>>> a GitHub Action to manually build and upload a nightly version to PyPI.
>>>
>>>
>>> *Is this the direction we want to take for the release process? If so,
>>> what’s the best way to coordinate with ASF Infra to create the necessary
>>> credentials?*
>>>
>>> I’d love to hear your thoughts and any additional suggestions.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>>
