A lot has been happening with the GitHub Actions build in the past few weeks. I thought I would share some updates.
*Build Statistics*

Now that all PR builds run the test suite (see note below), we can make a better comparison between GH and Jenkins:

GitHub Actions
  Successful trunk builds (1):  1h56m 5%   1h58m avg   2h1m 95%
  Successful PR builds:         1h14m 5%   1h35m avg   1h59m 95%

Jenkins
  Successful trunk builds:      1h27m 5%   4h7m avg    5h36m 95%
  Successful PR builds:         1h22m 5%   3h48m avg   5h35m 95%

It's pretty clear that the GitHub Actions build is significantly more stable than Jenkins and actually faster on average, despite running on slower hardware.

(1) We are occasionally seeing timeouts on GH due to a test getting stuck. We have narrowed it down to one test class.

*Enabling GitHub Actions by default*

In https://github.com/apache/kafka/pull/17105 we turned on the full "CI" workflow by default for PRs. This has been running for a few days now, and so far we are well under the quota limit for GH Action Runner usage.

*Green trunk Builds*

Most of our trunk commits have had green builds on GH Actions and Jenkins. This is the result of a lot of focused effort on fixing flaky tests, which is great to see!

On Jenkins, we continue to see very erratic build times, presumably due to resource contention. On GitHub, our trunk build times are much more consistent (presumably due to better isolation).

*Gradle Build Cache*

Pull requests can now take advantage of the Gradle Build Cache. The way this works is that trunk writes to a cache managed by GitHub Actions, and PRs read from it. In theory, if a PR only changes some code in ":streams", none of the ":core" tests will be run (and vice versa). Here is an example PR build that cut its testing time by around 1 hour: https://ge.apache.org/s/dj2svkxx2edno/timeline

In practice, we are still seeing a lot of cache misses, since the cache slightly lags behind trunk. Stay tuned for improvements to this...

*Gradle Build Scans*

We are now able to publish Gradle Build Scans for PRs from public forks.
This is very exciting, as it will allow contributors (not just committers!) to gain insights into their builds and get very nice-looking test reports.

Another improvement here is that the build scan links will be included in the PR "Checks". This is much easier to navigate to than finding the link in the workflow run.

*De-flaking Integration Tests*

A new "deflake" action was added to our GH Actions. It can be used to repeatedly run a @ClusterTest in the CI environment. I wrote up some instructions in a doc on our wiki: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub "deflake"Action

*Closing old PRs*

We have finished KAFKA-15073. Our "stale" workflow will now actually close PRs that have been inactive for more than 120 days.

Cheers,
David A