Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-30 Thread David Arthur
Jun, 1) The reports mentioned in the KIP will need to be built. As a start, I think we can use a cron-based GitHub Action that produces a markdown report. Longer term, we could look into hosting weekly reports as a static site, e.g. via GitHub Pages. 2) In my opinion, any test which…
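The cron-based report David describes could be sketched roughly as follows. This is a hypothetical sketch only: it assumes per-test flake rates have already been fetched (e.g. from Develocity), and the test names and rates below are invented.

```python
# Hypothetical sketch: turn per-test flake rates (as a cron-based GitHub
# Action might fetch them from Develocity) into a markdown report.
def render_flaky_report(flake_rates):
    """flake_rates: dict of test name -> flake rate in [0, 1]."""
    lines = ["# Weekly Flaky Test Report", "", "| Test | Flake rate |", "|---|---|"]
    # Worst offenders first, so the report doubles as a priority list.
    for name, rate in sorted(flake_rates.items(), key=lambda kv: -kv[1]):
        lines.append(f"| {name} | {rate:.1%} |")
    return "\n".join(lines)

report = render_flaky_report({"SomeIntegrationTest": 0.25, "OtherTest": 0.05})
```

The markdown output could then be committed to a branch or published via GitHub Pages by the same workflow.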

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-30 Thread Jun Rao
Hi, David, Thanks for the report. 1. Where could we see the reports (quarantined tests, etc.) mentioned in the KIP? 2. The quarantine process seems to apply only to integration tests. What's our recommendation for flaky unit tests? Jun On Thu, Sep 26, 2024 at 1:34 PM David Arthur wrote: > If t…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-26 Thread David Arthur
If there is no more feedback on this, I'll go ahead and move to a vote. -David On Sun, Sep 22, 2024 at 11:04 AM Chia-Ping Tsai wrote: > > > > On 2024-09-22 at 10:07 PM, David Arthur wrote: > > > > Q2: Yes, I think we should run the quarantined tests on all CI builds, > PRs > > and trunk. We can achieve…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-22 Thread Chia-Ping Tsai
> On 2024-09-22 at 10:07 PM, David Arthur wrote: > > Q2: Yes, I think we should run the quarantined tests on all CI builds, PRs > and trunk. We can achieve this with --rerun-tasks. This will let PR authors > gain feedback about their changes' effect on the flaky tests. We could even > create a PR-specif…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-22 Thread David Arthur
Q0: > However, the next description: “Placing a new test into the quarantine will be made at the discretion of the PR authors and/or committers. It should not be compulsory.” Thanks, this was left over from the first iteration of the design; I'll remove it. Since the process is automated for new…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-21 Thread Chia-Ping Tsai
hi David, Thanks for all the responses. I have a couple of questions about the quarantine test and PR flow. Q0: I am still a bit confused about “should we put new integration tests into quarantine manually or automatically?” To quote the KIP: “Automatically placing a new test into the quarantine will…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-21 Thread David Arthur
Q0/Q1: If we automatically quarantine integration tests that have had recent flakiness, I worry we will just ignore them. However, the alternative is to let them stay in the main suite while being flaky, which could start to fail builds. I think the question comes down to: what is the outcome if we…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-21 Thread Chia-Ping Tsai
hi David, I have some questions about the latest KIP. Q0: If both git and Gradle Develocity can be our database for querying flaky tests, maybe we don’t need the flaky tag? As you described before, that would save the overhead of adding/removing the tags. Make sense? Q1: the main test suite should include…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-21 Thread Chia-Ping Tsai
hi José and David, Maybe unit tests should be excluded from the "Flaky Test Management" and "retry": 1. a unit test can NOT have "@Flaky" (if we decide to use an annotation for flaky management) 2. CI will enable the retry only for integration tests. This can be addressed by adding a new flag "all…
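Chia-Ping's two rules could be expressed as a pair of simple predicates. This is an illustrative sketch only; the tag names ("integration", "flaky") are taken from the thread, not from a finalized KIP API.

```python
# Illustrative sketch of the two rules above, over a test's set of tags.

def is_retry_allowed(tags):
    # Rule 2: CI retries are enabled only for integration tests.
    return "integration" in tags

def is_tagging_valid(tags):
    # Rule 1: a unit test (no "integration" tag) must not carry "flaky".
    return "integration" in tags or "flaky" not in tags
```

In practice such a check could run as a build-time validation step so a "flaky" tag on a unit test fails fast.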

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-20 Thread David Arthur
José, By default, unit tests will not be eligible for the automatic new-test quarantine. Like you said, if a unit test fails, it indicates a problem. Perhaps we could auto-fail a test marked as "flaky" that is not also tagged as "integration"? Chia-Ping, > IMHO, the rule should be "unit test s…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-20 Thread Chia-Ping Tsai
> Can modules opt out of this feature? For example, the raft module doesn't have any integration tests and all of the tests are meant to be deterministic. It would be dangerous to the protocol's correctness and the consistency of the cluster metadata to allow contributors to mark tests as flaky in…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-20 Thread José Armando García Sancio
Thanks for the proposal David. Can modules opt out of this feature? For example, the raft module doesn't have any integration tests and all of the tests are meant to be deterministic. It would be dangerous to the protocol's correctness and the consistency of the cluster metadata to allow contributors…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-20 Thread David Arthur
Chia-Ping, I think something like that can work. I was also thinking about extracting the test names during trunk builds using Gradle and storing them somewhere. I think it's fair to say we can derive this data from Git and Develocity. We can probably figure out the implementation details later on.

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-19 Thread Chia-Ping Tsai
> However, this doesn't help with newly added tests that introduce flakiness. For this, we need some way to detect when a new test has been added. Still thinking through this... Maybe we can use git + Gradle Develocity to address it: 1) list the files changed recently (for example: git diff -…
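The git + Develocity idea amounts to a set difference: tests present in the branch minus tests Develocity already has execution history for. A minimal sketch of that logic (the test names are invented; in practice the first list would come from `git diff` or a Gradle scan, and the second from a Develocity query):

```python
def find_new_tests(tests_in_branch, tests_known_to_develocity):
    """Tests with no prior execution history are candidates for auto-quarantine."""
    return sorted(set(tests_in_branch) - set(tests_known_to_develocity))

new = find_new_tests(
    ["FooTest", "BarIntegrationTest", "BazTest"],
    ["FooTest", "BazTest"],
)
# new == ["BarIntegrationTest"]
```

This sidesteps tagging entirely for the new-test case, which matches Chia-Ping's earlier point about avoiding the overhead of adding/removing tags.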

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-19 Thread David Arthur
TengYao, > These two mechanisms are independent. We could manually remove a tag from a test, but at the same time, it might still be quarantined. I know the above situation might sound weird, but I just want to understand how it would work. If we remove a tag from a test, we are signaling that we…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-19 Thread Chia-Ping Tsai
hi David, The two-tiered approach is interesting and I have questions similar to TengYao's. BUT, going back to the usage of quarantine and isolation: it seems to me they are used to keep our CI from being disturbed by flaky tests, right? If so, could we query Gradle Develocity to get the flaky tests and then make only…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-19 Thread TengYao Chi
Hi David, Thanks for the explanation. I like this two-tiered approach, which gives us more flexibility to handle flaky tests. The following is my understanding of how it works; please correct me if I'm wrong: if we adopt the two-tiered approach, a test might have two states (isolated by Develo…
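As I read the two-tiered idea, a test can independently be isolated (by Develocity's flaky-test detection) and/or quarantined, which is why David notes elsewhere in the thread that the two mechanisms are independent. A toy model of those states (the field names are my reading of the discussion, not the KIP's terminology):

```python
from dataclasses import dataclass

@dataclass
class TestState:
    isolated: bool     # flagged as flaky from Develocity history
    quarantined: bool  # listed in the quarantine (e.g. quarantined.txt)

    def runs_in_main_suite(self):
        # Assumption: a test stays in the main suite only when neither
        # mechanism has pulled it out.
        return not (self.isolated or self.quarantined)
```

The "weird" combination TengYao asks about is simply `TestState(isolated=False, quarantined=True)`: the tag (or Develocity flag) is gone, but the quarantine entry still excludes it.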

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-19 Thread David Arthur
Chia/TengYao/TaiJuWu, I agree that tags are a straightforward approach. In fact, my initial idea was to use tags as the isolation mechanism. Let me try to motivate the use of a text file a bit more. Consider the "new tests" scenario where a developer has added a new integration test. If we use an…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-18 Thread 吳岱儒
Hi David, Thank you for the KIP. Could we include percentages for each flaky test in quarantined.txt? This would help us prioritize which tests to resolve first. Additionally, I would prefer to add a flaky (JUnit) tag to the source code so we can focus on these tests during development. Thanks, TaiJuWu
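TaiJuWu's suggestion could look something like the sketch below. Both the file format and the numbers are hypothetical illustrations, not anything the KIP specifies:

```python
# Hypothetical quarantined.txt format: "<fully-qualified test name> <flake %>".
SAMPLE = """\
org.apache.kafka.SomeIntegrationTest 42.0
org.apache.kafka.OtherTest 7.5
"""

def prioritized(text):
    """Parse the file and return (name, pct) pairs, worst flake rate first."""
    entries = []
    for line in text.splitlines():
        name, pct = line.rsplit(None, 1)  # split on the last whitespace
        entries.append((name, float(pct)))
    return sorted(entries, key=lambda e: -e[1])
```

Sorting by flake rate makes the file itself double as the priority list TaiJuWu is asking for.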

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-18 Thread TengYao Chi
Hi David, Thanks for this great KIP. I really appreciate the goal of this KIP, which aims to stabilize the build and improve our confidence in CI results. It addresses a real issue: we've become accustomed to seeing failed results from CI, and this is definitely not good for the Kafka community…

Re: [DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-18 Thread Chia-Ping Tsai
hi David, The KIP is beautiful and I do love a rule which makes us handle those flaky tests seriously. Regarding the "JUnit Tags", they can bring some benefits: 1. we can retry only the tests having the "flaky" annotation; other non-flaky tests should not be retryable 2. we don't need to worry that "qu…

[DISCUSS] KIP-1090 Flaky Tests 👻

2024-09-18 Thread David Arthur
Hello, Kafka community! Looking at the last 7 days of GitHub, 59 out of 64 trunk builds had flaky tests. Excluding timeouts (a separate issue), only 4 builds over the last 7 days failed due to excess test failures. This is actually a slight improvement compared with the last…