Hi David,

Thanks for this great KIP.

I really appreciate the goal of this KIP, which aims to stabilize the build
and improve our confidence in CI results.
It addresses a real issue: we've become accustomed to seeing failed CI
results, and this is definitely not good for the Kafka community.

I have a question regarding this KIP:
It seems that we need to maintain the `quarantined.txt` files manually; is
that correct?
I'm thinking this could become an issue, especially with the planned
removal of ZK in 4.0, which will undoubtedly bring many changes to our
codebase.
Given that, maintaining the `quarantined.txt` files might become a pain.
It would be nice if we could maintain them programmatically.

Best Regards,
TengYao

Chia-Ping Tsai <chia7...@gmail.com> wrote on Thu, Sep 19, 2024 at 3:24 AM:

> hi David
>
> The KIP is beautiful, and I do love a rule that makes us handle those flaky
> tests seriously.
>
> Regarding the "JUnit Tags" approach, it can bring us some benefits:
>
> 1. we can retry only the tests that have the "flaky" annotation; other
> non-flaky tests should not be retryable
> 2. we don't need to worry about "quarantined.txt" containing out-of-date
> test names
> 3. we can require that the flaky annotation include a JIRA link, which
> means the PR's author must create a JIRA ticket for every new flaky test
>
> Also, we can add a Gradle task to generate the "quarantined.txt" file if
> needed.
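>
> To make the annotation idea concrete, here is a minimal sketch of what such
> a marker could look like (the @Flaky name is purely illustrative, not an
> existing Kafka class):
>
>     import java.lang.annotation.ElementType;
>     import java.lang.annotation.Retention;
>     import java.lang.annotation.RetentionPolicy;
>     import java.lang.annotation.Target;
>     import org.junit.jupiter.api.Tag;
>
>     // Hypothetical meta-annotation: tags the test as "flaky" for JUnit 5
>     // filtering and forces the author to reference the tracking JIRA ticket.
>     @Target({ElementType.TYPE, ElementType.METHOD})
>     @Retention(RetentionPolicy.RUNTIME)
>     @Tag("flaky")
>     public @interface Flaky {
>         String value(); // e.g. "KAFKA-12345"
>     }
>
> A test marked with @Flaky("KAFKA-12345") could then be included or excluded
> via JUnit 5 tag filtering (includeTags/excludeTags), and a Gradle task could
> scan for the annotation to generate "quarantined.txt" rather than anyone
> editing it by hand.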
>
> Best,
> Chia-Ping
>
> David Arthur <mum...@gmail.com> wrote on Thu, Sep 19, 2024 at 12:02 AM:
>
> > Hello, Kafka community!
> >
> > Looking at the last 7 days on GitHub, 59 out of 64 trunk builds had flaky
> > tests. Excluding timeouts (a separate issue), only 4 builds from the last
> > 7 days failed due to excess test failures. This is actually a slight
> > improvement compared with the last 28 days, but it is still obviously a
> > bad situation to be in.
> >
> > We have previously discussed a few ideas to mitigate the impact that flaky
> > tests have on our builds. For PRs, we are actually seeing a lot of
> > successful status checks due to our use of the Develocity test retry
> > feature. However, the blanket use of "testRetry" is a bad practice in my
> > opinion. It makes it far too easy for us to ignore tests that are only
> > occasionally flaky. It also applies to unit tests, which should never be
> > flaky.
> >
> > Another problem is that we are naturally introducing flaky tests as new
> > features (and their tests) are added. Similar to feature development, it
> > takes some time for tests to mature and stabilize -- tests are code, after
> > all.
> >
> > I have written down a proposal for tracking and managing our flaky tests.
> > I have written this as a KIP even though this is an internal change. I did
> > so because I would like us to discuss, debate, and solidify a plan -- and
> > ultimately vote on it. A KIP seemed like a good fit.
> >
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1090+Flaky+Test+Management
> >
> > I have back-tested this strategy (as best as I can) against our trunk
> > builds from the last month using data from Develocity (i.e.,
> > ge.apache.org). I looked at two scenarios. The first scenario simply
> > quarantined tests with higher than 1% flaky failures; no test re-runs were
> > considered. The second scenario extends the first by allowing up to 3
> > total flaky failures from non-quarantined tests (tests with less than 1%
> > total flakiness).
> >
> > Total builds: *238*
> > Flaky/Failed builds: *228*
> > Flaky builds scenario 1 (quarantine only): *40*
> > Flaky builds scenario 2 (quarantine + retry): *3*
> >
> > In other words, we can tackle the worst flaky failures with the quarantine
> > strategy as described in the KIP and handle the long tail of flaky failures
> > with the Develocity retry plugin. If we only had 3 failing trunk builds per
> > month to investigate, I'd say we were in pretty good shape :)
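> >
> > To make the back-test rule concrete, here is a rough sketch of the
> > scenario 2 classification (thresholds as above; the class and method names
> > are just illustrative, this is not code from the KIP):
> >
> >     import java.util.List;
> >
> >     class FlakyBuildCheck {
> >         // A test is quarantined if its historical flaky-failure rate
> >         // exceeds 1%.
> >         static boolean isQuarantined(double flakyFailureRate) {
> >             return flakyFailureRate > 0.01;
> >         }
> >
> >         // A build only counts against us if the tests that flaked in it
> >         // and are NOT quarantined add up to more than 3 flaky failures.
> >         static boolean buildIsFlaky(List<Double> ratesOfTestsThatFlaked) {
> >             long nonQuarantined = ratesOfTestsThatFlaked.stream()
> >                     .filter(rate -> !isQuarantined(rate))
> >                     .count();
> >             return nonQuarantined > 3;
> >         }
> >     }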
> >
> > Let me know what you think!
> >
> > Cheers,
> > David A
> >
>
