A lot has been happening with the GitHub Actions build in the past few
weeks. I thought I would share some updates.

*Build Statistics*
Now that we have all PR builds running the test suite (see note below), we
can do a better comparison between GH and Jenkins:

GitHub Actions
Successful trunk builds (1):
1h56m 5%
1h58m avg
2h1m 95%

GitHub Actions
Successful PR builds:
1h14m 5%
1h35m avg
1h59m 95%

Jenkins
Successful trunk builds:
1h27m 5%
4h7m avg
5h36m 95%

Jenkins
Successful PR builds:
1h22m 5%
3h48m avg
5h35m 95%

It's pretty clear that the GitHub Actions build is significantly more
stable than Jenkins and actually faster on average despite running on
slower hardware.
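
For anyone who wants to compute the same kind of summary from their own
build history, here is a minimal sketch. It assumes nearest-rank
percentiles over durations in minutes; the method actually used for the
numbers above is not stated in this email, and the sample durations below
are made up.

```python
import math
from statistics import mean


def format_minutes(total_minutes):
    """Render a duration in minutes as e.g. '1h58m'."""
    hours, minutes = divmod(round(total_minutes), 60)
    return f"{hours}h{minutes}m"


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(durations_minutes):
    """Return the 5th percentile, mean, and 95th percentile as strings."""
    ordered = sorted(durations_minutes)
    return {
        "5%": format_minutes(percentile(ordered, 5)),
        "avg": format_minutes(mean(ordered)),
        "95%": format_minutes(percentile(ordered, 95)),
    }


# Hypothetical durations in minutes, not real build data.
sample = [90, 100, 110, 120, 130]
print(summarize(sample))
```

A narrow gap between the 5% and 95% values is what "stable" means in the
comparison above.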

(1) We are occasionally seeing timeouts on GH due to a test getting stuck.
We have narrowed it down to one test class.

*Enabling GitHub Actions by default*
In https://github.com/apache/kafka/pull/17105 we turned on the full "CI"
workflow by default for PRs. This has been running now for a few days and
so far we are well under the quota limit for GH Action Runner usage.

*Green trunk Builds*
Most of our trunk commits have had green builds on GH Actions and Jenkins.
This has been the result of a lot of focused effort on fixing flaky tests,
which is great to see!

On Jenkins, we are continuing to see very erratic build times, presumably
due to resource contention. On GitHub, our trunk build times are much more
consistent (presumably due to better isolation).

*Gradle Build Cache*
Pull Requests now can take advantage of the Gradle Build Cache. The way
this works is that trunk will write to a cache managed by GitHub Actions
and PRs will read from it. In theory, if a PR only changes some code in
":streams", none of the ":core" tests will be run (and vice versa).

Here is an example PR build that cut its testing time by around 1 hour:
https://ge.apache.org/s/dj2svkxx2edno/timeline

In practice, we are still seeing a lot of cache misses since the cache will
slightly lag behind trunk. Stay tuned for improvements to this...

*Gradle Build Scans*
We are now able to publish Gradle Build Scans for PRs from public forks.
This is very exciting as it will allow contributors (not just committers!)
to gain insights into their builds and have very nice looking test reports.

Another improvement here is that the build scan links will be included in
the PR "Checks". This is much easier to navigate to than finding it in the
workflow run.

*De-flaking Integration Tests*
A new "deflake" action was added to our GH Actions. It can be used to
repeatedly run a @ClusterTest in the CI environment. I wrote up some
instructions in a doc on our wiki:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub"deflake"Action

*Closing old PRs*
We have finished KAFKA-15073. Our "stale" workflow will now actually close
PRs that are inactive for more than 120 days.
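
The closing rule itself is simple. As a sketch only (the real behavior lives
in the "stale" workflow configuration, which is not shown here), the
120-day check amounts to the following; the function and constant names are
hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the text above; the names here are hypothetical.
STALE_AFTER = timedelta(days=120)


def should_close(last_activity, now):
    """True when a PR has been inactive for more than 120 days."""
    return now - last_activity > STALE_AFTER


now = datetime(2024, 9, 30, tzinfo=timezone.utc)
print(should_close(datetime(2024, 1, 1, tzinfo=timezone.utc), now))
print(should_close(datetime(2024, 9, 1, tzinfo=timezone.utc), now))
```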


Cheers,
David A
