Thanks for working on this, David.

best,
Colin

On Thu, Sep 12, 2024, at 11:16, David Arthur wrote:
> A lot has been happening with the GitHub Actions build in the past few
> weeks. I thought I would share some updates.
>
> *Build Statistics*
> Now that all PR builds run the test suite (see note below), we
> can do a better comparison between GH and Jenkins.
>
> GitHub Actions
> Successful trunk builds (1):
> 1h56m 5%
> 1h58m avg
> 2h1m 95%
>
> GitHub Actions
> Successful PR builds:
> 1h14m 5%
> 1h35m avg
> 1h59m 95%
>
> Jenkins
> Successful trunk builds:
> 1h27m 5%
> 4h7m avg
> 5h36m 95%
>
> Jenkins
> Successful PR builds:
> 1h22m 5%
> 3h48m avg
> 5h35m 95%
>
> It's pretty clear that the GitHub Actions build is significantly more
> stable than Jenkins and actually faster on average despite running on
> slower hardware.
>
> 1) We are seeing timeouts occasionally on GH due to a test getting stuck.
> We have narrowed it down to one test class.
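
For anyone who wants to reproduce 5% / avg / 95% summaries like the ones above from their own list of build durations, here is a minimal sketch. The sample durations are made up, and the nearest-rank percentile method is an assumption, since the message does not say how the figures were computed.

```python
import math


def parse_duration(text):
    """Convert a string such as '1h56m' into a number of minutes."""
    hours, _, rest = text.partition("h")
    return int(hours) * 60 + int(rest.rstrip("m"))


def format_duration(minutes):
    """Format a number of minutes in the same 'XhYm' style used above."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h{mins}m"


def percentile(sorted_values, pct):
    """Nearest-rank percentile (assumed method, not confirmed by the email)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(durations):
    """Return the 5th percentile, mean, and 95th percentile of the durations."""
    values = sorted(parse_duration(d) for d in durations)
    return {
        "5%": format_duration(percentile(values, 5)),
        "avg": format_duration(sum(values) / len(values)),
        "95%": format_duration(percentile(values, 95)),
    }


# Hypothetical durations, not real Kafka build data.
sample = ["1h50m", "1h55m", "2h0m", "2h5m", "2h10m"]
print(summarize(sample))
```

A narrow gap between the 5% and 95% values is what "stable" means in the comparison above; a wide gap, as in the Jenkins numbers, indicates erratic build times.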
>
> *Enabling GitHub Actions by default*
> In https://github.com/apache/kafka/pull/17105 we turned on the full "CI"
> workflow by default for PRs. This has been running now for a few days and
> so far we are well under the quota limit for GH Action Runner usage.
>
> *Green trunk Builds*
> Most of our trunk commits have had green builds on GH Actions and Jenkins.
> This has been the result of a lot of focused effort on fixing flaky tests,
> which is great to see!
>
> On Jenkins, we are continuing to see very erratic build times, presumably
> due to resource contention. On GitHub, our trunk build times are much more
> consistent (presumably due to better isolation).
>
> *Gradle Build Cache*
> Pull requests can now take advantage of the Gradle Build Cache. The way
> this works is that trunk will write to a cache managed by GitHub Actions
> and PRs will read from it. In theory, if a PR only changes some code in
> ":streams", none of the ":core" tests will be run (and vice versa).
>
> Here is an example PR build that cut its testing time by around 1hr:
> https://ge.apache.org/s/dj2svkxx2edno/timeline
>
> In practice, we are still seeing a lot of cache misses since the cache will
> slightly lag behind trunk. Stay tuned for improvements to this...
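
As a rough illustration of the "trunk writes, PRs read" arrangement described above, here is a toy model of a cache keyed by a hash of a module's inputs. It is only a sketch: the class, the file names, and the `can_write` flag are hypothetical, and the real Gradle cache key covers more inputs than source contents.

```python
import hashlib


class BuildCache:
    """Toy model of a shared build cache keyed by a hash of task inputs."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def key(module, sources):
        """Hash the module name plus every source file name and its content."""
        digest = hashlib.sha256()
        digest.update(module.encode())
        for name in sorted(sources):
            digest.update(name.encode())
            digest.update(sources[name].encode())
        return digest.hexdigest()

    def run_tests(self, module, sources, can_write):
        """Return 'hit' if a cached result is reused, 'miss' if tests must run."""
        k = self.key(module, sources)
        if k in self._entries:
            return "hit"
        if can_write:  # in this model, only trunk builds populate the cache
            self._entries[k] = "test results"
        return "miss"


cache = BuildCache()
core = {"Broker.scala": "v1"}        # hypothetical file names and contents
streams = {"Topology.java": "v1"}

# Trunk build: runs everything and writes the results to the cache.
cache.run_tests(":core", core, can_write=True)
cache.run_tests(":streams", streams, can_write=True)

# PR that only touches :streams: it reads from the cache but never writes.
pr_streams = {"Topology.java": "v2"}
print(cache.run_tests(":core", core, can_write=False))           # hit
print(cache.run_tests(":streams", pr_streams, can_write=False))  # miss
```

The same model shows why lag causes misses: if trunk has moved to new inputs but the trunk build that would cache them has not finished, a PR based on that newer trunk computes a key that is not in the cache yet.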
>
> *Gradle Build Scans*
> We are now able to publish Gradle Build Scans for PRs from public forks.
> This is very exciting as it will allow contributors (not just committers!)
> to gain insights into their builds and have very nice looking test reports.
>
> Another improvement here is that the build scan links will be included in
> the PR "Checks". This is much easier to navigate to than finding it in the
> workflow run.
>
> *De-flaking Integration Tests*
> A new "deflake" action was added to our GH Actions. It can be used to
> repeatedly run a @ClusterTest in the CI environment. I wrote up some
> instructions in a doc on our wiki:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub"deflake"Action
>
> *Closing old PRs*
> We have finished KAFKA-15073. Our "stale" workflow will now actually close
> PRs that are inactive for more than 120 days.
>
>
> Cheers,
> David A
