Jun,
1) The reports mentioned in the KIP will need to be built. As a start, I
think we can use a cron-based GitHub Action that produces a markdown
report. Longer term, we can maybe look into a static site generator plus
GitHub Pages for hosting weekly reports.
2) In my opinion, any test which
Hi, David,
Thanks for the report.
1. Where could we see the reports (quarantined tests, etc.) mentioned in the
KIP?
2. The quarantine process seems to only apply to integration tests. What's
our recommendation for flaky unit tests?
Jun
On Thu, Sep 26, 2024 at 1:34 PM David Arthur wrote:
> If t
If there is no more feedback on this, I'll go ahead and move to a vote.
-David
On Sun, Sep 22, 2024 at 11:04 AM Chia-Ping Tsai wrote:
>
>
> > On Sep 22, 2024, at 10:07 PM, David Arthur wrote:
> >
> > Q2: Yes, I think we should run the quarantined tests on all CI builds, PRs
> > and trunk. We can achieve
> On Sep 22, 2024, at 10:07 PM, David Arthur wrote:
>
> Q2: Yes, I think we should run the quarantined tests on all CI builds, PRs
> and trunk. We can achieve this with --rerun-tasks. This will let PR authors
> gain feedback about their changes' effect on the flaky tests. We could even
> create a PR-specif
Q0: > However, the next description: “Placing a new test into the
quarantine will be made at the discretion of the PR authors and/or
committers. It should not be compulsory.”
Thanks, this was left over from the first iteration of the design; I'll
remove this. Since the process is automated for ne
hi David
Thanks for all the responses. I have a couple of questions about the quarantine
tests and the PR flow.
Q0:
I am still a bit confused about “should we put new integration tests into
quarantine manually/automatically?” To quote the KIP: “Automatically placing a
new test into the quarantine will
Q0/Q1: If we automatically quarantine integration tests that have had
recent flakiness, I worry we will just ignore them. However, the
alternative is to let them stay in the main suite while being flaky, which
could start to fail builds.
I think the question comes down to: what is the outcome if we
hi David
I have some questions for the latest KIP.
Q0:
If both git and Gradle Develocity can be our database for querying flaky tests,
maybe we don't need the flaky tag? As you described before, that can save the
overhead of adding/removing the tags. Makes sense?
Q1:
the main test suite should include
hi José and David
Maybe unit tests should be excluded from the "Flaky Test Management" and the
"retry":
1. unit tests can NOT have "@Flaky" (if we decide to use an annotation in the
flaky management)
2. CI will enable the retry only for integration tests. This can be
addressed by adding a new flag "all
José,
By default, unit tests will not be eligible for the automatic new test
quarantine. Like you said, if a unit test fails, it indicates a problem.
Perhaps we could auto-fail a test marked as "flaky" that is not also tagged
as "integration"?
Chia-Ping,
> IMHO, the rule should be "unit test s
> Can modules opt out of this feature? For example, the raft module
doesn't have any integration tests and all of the tests are meant to
be deterministic. It would be dangerous to the protocol's correctness
and the consistency of the cluster metadata to allow contributors to
mark tests as flaky in
Thanks for the proposal David.
Can modules opt out of this feature? For example, the raft module
doesn't have any integration tests and all of the tests are meant to
be deterministic. It would be dangerous to the protocol's correctness
and the consistency of the cluster metadata to allow contribut
Chia-Ping, I think something like that can work. I was also thinking about
extracting the test names during trunk builds using Gradle and storing that
somewhere. I think it's fair to say we can derive this data from Git and
Develocity. We can probably figure out the implementation details later on
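Just to sketch the kind of thing I mean (the base ref, path filter, and class
name below are illustrative assumptions, not a decided design):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class NewTestFinder {
        public static void main(String[] args) throws Exception {
            // list files added since the merge base with trunk
            Process git = new ProcessBuilder(
                    "git", "diff", "--name-only", "--diff-filter=A", "trunk...HEAD")
                    .redirectErrorStream(true)
                    .start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(git.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // keep only newly added Java/Scala test sources
                    if (line.contains("src/test/") && line.matches(".*Test\\.(java|scala)")) {
                        System.out.println(line);
                    }
                }
            }
            git.waitFor();
        }
    }

The newly added test classes found this way could then be cross-checked against
Develocity results before deciding whether to quarantine them.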
> However, this doesn't help with the newly added tests that introduce
flakiness. For this, we need some way
to detect when a new test has been added. Still thinking through this...
Maybe we can use git + Gradle Develocity to address it.
1) list the files changed recently (for example: git diff -
TengYao,
> These two mechanisms are independent. We could manually remove a tag from a
test, but at the same time, it might still be quarantined.
I know the above situation might sound weird, but I just want to understand
how it would work.
If we remove a tag from a test, we are signaling that we
hi David
The two-tiered approach is interesting and I have questions similar to
TengYao.
BUT, going back to the usage of quarantine and isolation: it seems to me they
are used to keep our CI from being noised by flaky tests, right? If so, could we
query Gradle Develocity to get the flaky tests and then make onl
Hi David,
Thanks for the explanation.
I like this two-tiered approach, which gives us more flexibility to handle
flaky tests.
The following is my understanding of how it works; please correct me if I'm
wrong:
If we adopt the two-tiered approach, the test might have two
states. (Isolated by develo
Chia/TengYao/TaiJuWu, I agree that tags are a straightforward approach. In
fact, my initial idea was to use tags as the isolation mechanism.
Let me try to motivate the use of a text file a bit more.
Consider the "new tests" scenario where a developer has added a new
integration test. If we use an
Hi David,
Thank you for the KIP.
Could we include percentages for each flaky test in quarantined.txt? This
would help us prioritize which tests to resolve first.
Additionally, I would prefer to add a flaky (JUnit) tag to the source code
so we can focus on these tests during development.
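For illustration only (the test names and numbers below are made up), an entry
could look something like:

    org.apache.kafka.controller.QuorumControllerTest#testSnapshot    12.5%
    kafka.api.PlaintextConsumerTest#testFetchOutOfRange               3.7%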
Thanks,
Tai
Hi David,
Thanks for this great KIP.
I really appreciate the goal of this KIP, which aims to stabilize the build
and improve our confidence in CI results.
It addresses a real issue where we've become accustomed to seeing failed
results from CI, and this is definitely not good for the Kafka commun
hi David
The KIP is beautiful and I do love a rule which makes us handle those flaky
tests seriously.
Regarding the "JUnit Tags", they can bring some benefits to us.
1. we can retry only the tests having the "flaky" annotation. Other non-flaky
tests should not be retryable
2. we don't need to worry that "qu
Hello, Kafka community!
Looking at the last 7 days on GitHub, 59 out of 64 trunk builds had
flaky tests. Excluding timeouts (a separate issue), only 4 builds
out of the last 7 days have failed due to excess test failures. This is
actually a slight improvement when compared with the last