Hi all,

Just in case it's useful for someone else, in Apache Pulsar, there's a GitHub 
Actions-based CI workflow that creates a Docker image and runs integration 
tests and system tests with it. In Pulsar, we have an extremely large Docker 
image for system tests; it's over 1.7GiB when compressed with zstd. Building 
this image takes over 20 minutes, so we want to share the image within a single 
build workflow. GitHub Artifacts are the recommended way to share files between 
jobs in a single workflow, as explained in the GitHub Actions documentation:
https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow

To share the Docker image within a single build workflow, we use GitHub 
Artifacts upload/download with a custom CLI tool that uses the GitHub-provided 
JavaScript libraries for interacting with the GitHub Artifacts backend API. The 
benefit of the CLI tool for GitHub Actions Artifacts is that it can upload from 
stdin and download to stdout. Sharing the Docker images in the GitHub Actions 
workflow is simply done with the CLI tool and standard "docker load" and 
"docker save" commands.

These are the shell script functions that Apache Pulsar uses: 
https://github.com/apache/pulsar/blob/1344167328c31ea39054ec2a6019f003fb8bab50/build/pulsar_ci_tool.sh#L82-L101

In Pulsar CI, the command for saving the image is:
docker save ${image} | zstd | pv -ft -i 5 | pv -Wbaf -i 5 \
  | timeout 20m gh-actions-artifact-client.js upload \
      --retentionDays=$ARTIFACT_RETENTION_DAYS "${artifactname}"

For restoring, the command used is:
timeout 20m gh-actions-artifact-client.js download "${artifactname}" \
  | pv -batf -i 5 | unzstd | docker load
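
As a rough sketch, the two pipelines above could be wrapped in shell functions 
like the following. The function names, the argument checks, the "pipefail" 
setting and the default of 1 retention day are illustrative assumptions, not 
the actual code in pulsar_ci_tool.sh; the pv progress meters are left out for 
brevity.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the functions from pulsar_ci_tool.sh.
# Assumes docker, zstd/unzstd, timeout and gh-actions-artifact-client.js
# are available on the PATH of the runner.

# Make a pipeline fail if any stage fails, not only the last one.
set -o pipefail

# Hypothetical helper: stream "docker save" output, zstd-compressed,
# into a GitHub Actions artifact without writing a temporary file.
save_image_to_artifact() {
  local image="$1" artifactname="$2"
  if [ -z "$image" ] || [ -z "$artifactname" ]; then
    echo "usage: save_image_to_artifact <image> <artifact-name>" >&2
    return 2
  fi
  docker save "$image" | zstd \
    | timeout 20m gh-actions-artifact-client.js upload \
        --retentionDays="${ARTIFACT_RETENTION_DAYS:-1}" "$artifactname"
}

# Hypothetical helper: stream the artifact back, decompress it and
# feed it directly into "docker load".
load_image_from_artifact() {
  local artifactname="$1"
  if [ -z "$artifactname" ]; then
    echo "usage: load_image_from_artifact <artifact-name>" >&2
    return 2
  fi
  timeout 20m gh-actions-artifact-client.js download "$artifactname" \
    | unzstd | docker load
}
```

With "pipefail" set, a failure in "docker save" or in the upload client makes 
the whole function return non-zero, so the CI step fails instead of silently 
uploading a truncated image.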

The throughput is very impressive. Transfer speed can exceed 180MiB/s when 
uploading the Docker image, and downloads are commonly over 100MiB/s in 
apache/pulsar builds. Note that these figures include the execution of 
"docker save" (on upload) and "docker load" (on download), since the pipeline 
operates directly on stdin and stdout.
Examples:
upload: 
https://github.com/apache/pulsar/actions/runs/11454093832/job/31880154863#step:15:26
download: 
https://github.com/apache/pulsar/actions/runs/11454093832/job/31880164467#step:9:20

Since GitHub Artifacts doesn't provide an official CLI tool, I have written a 
GitHub Action for that purpose. It's available at 
https://github.com/lhotari/gh-actions-artifact-client.
When you use the action, it will install the CLI tool available as 
"gh-actions-artifact-client.js" in the PATH of the runner so that it's 
available in subsequent build steps. In Apache Pulsar, we fork external actions 
to our own repository, so we use the version forked to 
https://github.com/apache/pulsar-test-infra.

In Pulsar, we have been using this solution successfully for several years. I 
recently upgraded the action to support the GitHub Actions Artifacts API v4, as 
earlier API versions will be removed after November 30th.

I hope this helps other projects that face CI challenges similar to Pulsar's. 
Please let me know if you need any help in using a similar solution for your 
Apache project's CI.

-Lari
