Thanks for the improvement, David!
The "De-flaking Integration Tests" section looks great!
I'll use it the next time I handle flaky tests.

Thanks.
Luke

On Fri, Sep 13, 2024 at 8:22 AM Chia-Ping Tsai <chia7...@gmail.com> wrote:

> Thanks to David for bringing this great improvement to Kafka CI!
>
> Best regards,
> Chia-Ping
>
> On Fri, Sep 13, 2024 at 2:29 AM Josep Prat <josep.p...@aiven.io.invalid> wrote:
>
> > Thanks for the great summary David!
> > GH CI does indeed look really good.
> >
> > Best,
> > ------------------
> > Josep Prat
> > Open Source Engineering Director, Aiven
> > josep.p...@aiven.io   |   +491715557497 | aiven.io
> > Aiven Deutschland GmbH
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > Anna Richardson, Kenneth Chen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > On Thu, Sep 12, 2024, 20:17 David Arthur <davidart...@apache.org> wrote:
> >
> > > A lot has been happening with the GitHub Actions build in the past few
> > > weeks. I thought I would share some updates.
> > >
> > > *Build Statistics*
> > > Now that we have all PR builds running the test suite (see note below), we
> > > can do a better comparison between GH and Jenkins:
> > >
> > > GitHub Actions
> > > Successful trunk builds (1):
> > > 1h56m 5%
> > > 1h58m avg
> > > 2h1m 95%
> > >
> > > GitHub Actions
> > > Successful PR builds:
> > > 1h14m 5%
> > > 1h35m avg
> > > 1h59m 95%
> > >
> > > Jenkins
> > > Successful trunk builds:
> > > 1h27m 5%
> > > 4h7m avg
> > > 5h36m 95%
> > >
> > > Jenkins
> > > Successful PR builds:
> > > 1h22m 5%
> > > 3h48m avg
> > > 5h35m 95%
> > >
> > > It's pretty clear that the GitHub Actions build is significantly more
> > > stable than Jenkins and actually faster on average despite running on
> > > slower hardware.
> > >
> > > 1) We are seeing timeouts occasionally on GH due to a test getting stuck.
> > > We have narrowed it down to one test class.
> > >
> > > *Enabling GitHub Actions by default*
> > > In https://github.com/apache/kafka/pull/17105 we turned on the full "CI"
> > > workflow by default for PRs. This has been running now for a few days and
> > > so far we are well under the quota limit for GH Action Runner usage.
> > >
> > > *Green trunk Builds*
> > > Most of our trunk commits have had green builds on GH Actions and Jenkins.
> > > This has been the result of a lot of focused effort on fixing flaky tests,
> > > which is great to see!
> > >
> > > On Jenkins, we are continuing to see very erratic build times, presumably
> > > due to resource contention. On GitHub, our trunk build times are much more
> > > consistent (presumably due to better isolation).
> > >
> > > *Gradle Build Cache*
> > > Pull Requests can now take advantage of the Gradle Build Cache. The way
> > > this works is that trunk will write to a cache managed by GitHub Actions
> > > and PRs will read from it. In theory, if a PR only changes some code in
> > > ":streams", none of the ":core" tests will be run (and vice versa).
> > >
> > > Here is an example PR build that cut its testing time by around 1 hour:
> > > https://ge.apache.org/s/dj2svkxx2edno/timeline
> > >
> > > In practice, we are still seeing a lot of cache misses since the cache will
> > > slightly lag behind trunk. Stay tuned for improvements to this...
> > >
> > > *Gradle Build Scans*
> > > We are now able to publish Gradle Build Scans for PRs from public forks.
> > > This is very exciting as it will allow contributors (not just committers!)
> > > to gain insights into their builds and have very nice-looking test reports.
> > >
> > > Another improvement here is that the build scan links will be included in
> > > the PR "Checks". This is much easier than finding them in the workflow run.
> > >
> > > *De-flaking Integration Tests*
> > > A new "deflake" action was added to our GH Actions. It can be used to
> > > repeatedly run a @ClusterTest in the CI environment. I wrote up some
> > > instructions in a doc on our wiki:
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub"deflake"Action
> > >
> > > *Closing old PRs*
> > > We have finished KAFKA-15073. Our "stale" workflow will now actually close
> > > PRs that are inactive for more than 120 days.
> > >
> > >
> > > Cheers,
> > > David A
> > >
> >
>
