Love it. I've already created an issue in Airflow's CI/CD, based on your
experiences, that should help us simplify our workflows:
https://github.com/apache/airflow/issues/43268 - I'll report back when we
turn it into code.

Thanks for sharing!

J.


On Tue, Oct 22, 2024 at 4:02 PM Nathan Hartman <hartman.nat...@gmail.com>
wrote:

> On Tue, Oct 22, 2024 at 7:08 AM Lari Hotari <lhot...@apache.org> wrote:
> >
> > Hi all,
> >
> > Just in case it's useful for someone else, in Apache Pulsar, there's a
> GitHub Actions-based CI workflow that creates a Docker image and runs
> integration tests and system tests with it. In Pulsar, we have an extremely
> large Docker image for system tests; it's over 1.7GiB when compressed with
> zstd. Building this image takes over 20 minutes, so we want to share the
> image within a single build workflow. GitHub Artifacts are the recommended
> way to share files between jobs in a single workflow, as explained in the
> GitHub Actions documentation:
> https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow
> .
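> >
> > For comparison, the standard approach described in that documentation
> > looks roughly like this in a workflow (a minimal sketch; the artifact
> > name and path below are placeholders):
> >
> >     # in the job that produces the files
> >     - uses: actions/upload-artifact@v4
> >       with:
> >         name: my-artifact
> >         path: build/output/
> >
> >     # in a later job of the same workflow
> >     - uses: actions/download-artifact@v4
> >       with:
> >         name: my-artifact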
> >
> > To share the Docker image within a single build workflow, we use GitHub
> Artifacts upload/download with a custom CLI tool that uses the
> GitHub-provided JavaScript libraries for interacting with the GitHub
> Artifacts backend API. The benefit of the CLI tool for GitHub Actions
> Artifacts is that it can upload from stdin and download to stdout. Sharing
> the Docker images in the GitHub Actions workflow is simply done with the
> CLI tool and standard "docker load" and "docker save" commands.
> >
> > These are the shell script functions that Apache Pulsar uses:
> https://github.com/apache/pulsar/blob/1344167328c31ea39054ec2a6019f003fb8bab50/build/pulsar_ci_tool.sh#L82-L101
> >
> > In Pulsar CI, the command for saving the image is:
> > docker save ${image} | zstd | pv -ft -i 5 | pv -Wbaf -i 5 | timeout 20m gh-actions-artifact-client.js upload --retentionDays=$ARTIFACT_RETENTION_DAYS "${artifactname}"
> >
> > For restoring, the command used is:
> > timeout 20m gh-actions-artifact-client.js download "${artifactname}" | pv -batf -i 5 | unzstd | docker load
> >
> > The throughput is very impressive. Transfer speed can exceed 180MiB/s
> when uploading the Docker image, and downloads are commonly over 100MiB/s
> in apache/pulsar builds. Notably, these figures include the execution of
> "docker load" and "docker save", since the tool operates directly on
> stdin and stdout.
> > Examples:
> > upload:
> https://github.com/apache/pulsar/actions/runs/11454093832/job/31880154863#step:15:26
> > download:
> https://github.com/apache/pulsar/actions/runs/11454093832/job/31880164467#step:9:20
> >
> > Since GitHub Artifacts doesn't provide an official CLI tool, I have
> written a GitHub Action for that purpose. It's available at
> https://github.com/lhotari/gh-actions-artifact-client.
> > When you use the action, it installs the CLI tool as
> "gh-actions-artifact-client.js" on the runner's PATH so that it's
> available in subsequent build steps. In Apache Pulsar, we fork external
> actions to our own repository, so we use the version forked to
> https://github.com/apache/pulsar-test-infra.
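> >
> > To make the flow concrete, a build job and a test job in the same
> > workflow could use the action roughly like this (a minimal sketch, not
> > the exact apache/pulsar workflow; the action version tag, job names,
> > and the ${IMAGE} and ${ARTIFACT_NAME} variables are assumptions):
> >
> >     jobs:
> >       build-image:
> >         runs-on: ubuntu-latest
> >         steps:
> >           # installs gh-actions-artifact-client.js on the runner's PATH
> >           - uses: lhotari/gh-actions-artifact-client@v2
> >           - name: Save and upload the Docker image
> >             run: |
> >               docker save "${IMAGE}" | zstd | timeout 20m \
> >                 gh-actions-artifact-client.js upload --retentionDays=1 "${ARTIFACT_NAME}"
> >       integration-tests:
> >         needs: build-image
> >         runs-on: ubuntu-latest
> >         steps:
> >           - uses: lhotari/gh-actions-artifact-client@v2
> >           - name: Download and load the Docker image
> >             run: |
> >               timeout 20m gh-actions-artifact-client.js download "${ARTIFACT_NAME}" | \
> >                 unzstd | docker load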
> >
> > In Pulsar, we have been using this solution successfully for several
> years. I recently upgraded the action to support the GitHub Actions
> Artifacts API v4, as earlier API versions will be removed after November
> 30th.
> >
> > I hope this helps other projects that face CI challenges similar to
> Pulsar's. Please let me know if you need any help adopting a similar
> solution for your Apache project's CI.
> >
> > -Lari
>
>
> Hello Lari,
>
> I shared your message over in NuttX-land, where we've had to cut back
> drastically on CI recently. I'm told that Pulsar's solution isn't
> quite relevant to NuttX's situation, since our issue isn't Docker but
> rather the large number of builds... NuttX is a Real Time Operating
> System (RTOS) for embedded systems that runs on a multitude of
> processor architectures and different boards, so we have a challenge
> of how to get good test coverage when each architecture and board
> requires its own build, without exceeding our CI quota. Nevertheless,
> I shared your message as food for thought and you might see some
> questions from some of our other community members.
>
> Thanks for sharing,
> Nathan
>
