Thanks for the improvement, David! The "De-flaking Integration Tests" looks great! I'll use it next time when handling flaky tests.
Thanks.
Luke

On Fri, Sep 13, 2024 at 8:22 AM Chia-Ping Tsai <chia7...@gmail.com> wrote:

> Thanks to David for bringing this great improvement to Kafka CI!!!
>
> Best regards,
> Chia-Ping
>
> On Fri, Sep 13, 2024 at 2:29 AM Josep Prat <josep.p...@aiven.io.invalid> wrote:
>
> > Thanks for the great summary David!
> > GH CI looks indeed really good.
> >
> > Best,
> > ------------------
> > Josep Prat
> > Open Source Engineering Director, Aiven
> >
> > On Thu, Sep 12, 2024, 20:17 David Arthur <davidart...@apache.org> wrote:
> >
> > > A lot has been happening with the GitHub Actions build in the past few
> > > weeks. I thought I would share some updates.
> > >
> > > *Build Statistics*
> > > Now that we have all PR builds running the test suite (see note below),
> > > we can do a better comparison between GH and Jenkins.
> > >
> > > GitHub Actions
> > > Successful trunk builds (1):
> > > 1h56m 5%
> > > 1h58m avg
> > > 2h1m 95%
> > >
> > > GitHub Actions
> > > Successful PR builds:
> > > 1h14m 5%
> > > 1h35m avg
> > > 1h59m 95%
> > >
> > > Jenkins
> > > Successful trunk builds:
> > > 1h27m 5%
> > > 4h7m avg
> > > 5h36m 95%
> > >
> > > Jenkins
> > > Successful PR builds:
> > > 1h22m 5%
> > > 3h48m avg
> > > 5h35m 95%
> > >
> > > It's pretty clear that the GitHub Actions build is significantly more
> > > stable than Jenkins and actually faster on average despite running on
> > > slower hardware.
> > >
> > > 1) We are seeing timeouts occasionally on GH due to a test getting stuck.
> > > We have narrowed it down to one test class.
> > >
> > > *Enabling GitHub Actions by default*
> > > In https://github.com/apache/kafka/pull/17105 we turned on the full "CI"
> > > workflow by default for PRs.
> > > This has been running now for a few days and so far we are well under
> > > the quota limit for GH Action Runner usage.
> > >
> > > *Green trunk Builds*
> > > Most of our trunk commits have had green builds on GH Actions and Jenkins.
> > > This has been the result of a lot of focused effort on fixing flaky tests,
> > > which is great to see!
> > >
> > > On Jenkins, we are continuing to see very erratic build times, presumably
> > > due to resource contention. On GitHub, our trunk build times are much more
> > > consistent (presumably due to better isolation).
> > >
> > > *Gradle Build Cache*
> > > Pull Requests can now take advantage of the Gradle Build Cache. The way
> > > this works is that trunk will write to a cache managed by GitHub Actions
> > > and PRs will read from it. In theory, if a PR only changes some code in
> > > ":streams", none of the ":core" tests will be run (and vice versa).
> > >
> > > Here is an example PR build that cut its testing time by around 1hr:
> > > https://ge.apache.org/s/dj2svkxx2edno/timeline
> > >
> > > In practice, we are still seeing a lot of cache misses since the cache will
> > > slightly lag behind trunk. Stay tuned for improvements to this...
> > >
> > > *Gradle Build Scans*
> > > We are now able to publish Gradle Build Scans for PRs from public forks.
> > > This is very exciting as it will allow contributors (not just committers!)
> > > to gain insights into their builds and have very nice looking test reports.
> > >
> > > Another improvement here is that the build scan links will be included in
> > > the PR "Checks". This is much easier to navigate to than finding it in the
> > > workflow run.
> > >
> > > *De-flaking Integration Tests*
> > > A new "deflake" action was added to our GH Actions. It can be used to
> > > repeatedly run a @ClusterTest in the CI environment.
> > > I wrote up some instructions in a doc on our wiki:
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub"deflake"Action
> > >
> > > *Closing old PRs*
> > > We have finished KAFKA-15073. Our "stale" workflow will now actually close
> > > PRs that are inactive for more than 120 days.
> > >
> > > Cheers,
> > > David A
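
On the Gradle Build Cache point in David's update (trunk writes to the cache, PRs only read from it), here is a minimal sketch of what such a policy could look like in a `settings.gradle.kts`. This is an illustration under stated assumptions, not the Kafka build's actual configuration: it assumes the local cache directory is saved and restored by the CI system between runs, that the build runs on GitHub Actions (which sets `GITHUB_REF`), and the name `isTrunkBuild` is hypothetical.

```kotlin
// settings.gradle.kts (illustrative sketch only, not Kafka's actual settings)

// Assumption: on GitHub Actions, GITHUB_REF is "refs/heads/trunk" for pushes to trunk.
// `isTrunkBuild` is a hypothetical name introduced for this example.
val isTrunkBuild: Boolean = System.getenv("GITHUB_REF") == "refs/heads/trunk"

buildCache {
    local {
        // Every build may read previously stored task outputs.
        isEnabled = true
        // Only trunk builds store new outputs; PR builds are read-only.
        isPush = isTrunkBuild
    }
}
```

The cache only takes effect when build caching is turned on, for example with `--build-cache` on the command line or `org.gradle.caching=true` in `gradle.properties`. Because PR builds never push, a PR can only hit entries produced by an earlier trunk build, which fits the cache misses David mentions when the cache lags behind trunk.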