Hi David,

Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216? I looked into this option already (yes, there isn't much that we haven't considered in this space).
Ismael

On Wed, Nov 22, 2023 at 12:24 AM David Jacot <dja...@confluent.io.invalid> wrote:
> Hi all,
>
> Thanks for the good discussion and all the comments. Overall, it seems that we all agree on the bad state of our CI. That's a good first step!
>
> I have talked to a few folks this week about it and it seems that many folks (including me) are not comfortable with merging PRs at the moment because the results of our builds are so bad. I had 40+ failed tests in one of my PRs, all unrelated to my changes. It is really hard to be productive with this.
>
> Personally, I really want to move towards requiring a green build to merge to trunk because this is a clear and binary signal. I agree that we need to stabilize the builds before we can even require this, so here is my proposal.
>
> 1) We could leverage the `reports.junitXml.mergeReruns` option in Gradle. From the doc [1]:
>
> > When mergeReruns is enabled, if a test fails but is then retried and succeeds, its failures will be recorded as <flakyFailure> instead of <failure>, within one <testcase>. This is effectively the reporting produced by the surefire plugin of Apache Maven™ when enabling reruns. If your CI server understands this format, it will indicate that the test was flaky. If it does not, it will indicate that the test succeeded as it will ignore the <flakyFailure> information. If the test does not succeed (i.e. it fails for every retry), it will be indicated as having failed whether your tool understands this format or not.
> >
> > When mergeReruns is disabled (the default), each execution of a test will be listed as a separate test case.
>
> It would not resolve all the flaky tests for sure but it would at least reduce the noise. I see this as a means to get to green builds faster. I played a bit with this setting and I discovered [2]. I was hoping that [3] could help to resolve it but I need to confirm.
> [A minimal configuration sketch for this option is appended at the end of this thread.]
>
> 2) I suppose that we would still have flaky tests preventing us from getting a green build even with the setting in place. For those, I think that we need to review them one by one and decide whether we want to fix or disable them. This is a short-term effort to help us get to green builds.
>
> 3) When we get to a point where we can get green builds consistently, we could enforce it.
>
> 4) Flaky tests won't disappear with this. They are just hidden. Therefore, we also need a process to review the flaky tests and address them. Here, I think that we could leverage the dashboard shared by Ismael. One possibility would be to review it regularly and decide for each test whether it should be fixed, disabled or even removed.
>
> Please let me know what you think.
>
> Another angle that we could consider is improving the CI infrastructure as well. I think that many of those flaky tests are due to overloaded Jenkins workers. We should perhaps discuss with the infra team to see whether we could do something there.
>
> Best,
> David
>
> [1] https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> [2] https://github.com/gradle/gradle/issues/23324
> [3] https://github.com/apache/kafka/pull/14687
>
>
> On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> > Hi,
> >
> > We have a dashboard already:
> >
> > [image: image.png]
> >
> > https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> >
> > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков <nizhi...@apache.org> wrote:
> >
> >> Hello guys.
> >>
> >> I want to tell you about one more approach to dealing with flaky tests. We adopted this approach in the Apache Ignite community, so maybe it can be helpful for Kafka as well.
> >>
> >> TL;DR: The Apache Ignite community has a tool that provides test statistics and can tell whether a PR introduces new failures.
> >>
> >> Apache Ignite has many tests. The latest «Run All» contains around 75k. Most of the tests are integration-style, so the number of flaky ones is significant.
> >>
> >> We built a tool - Team City Bot [1] - that provides combined statistics on flaky tests [2].
> >>
> >> This tool can compare the results of Run All for a PR and for master. If everything is OK, one can comment on the JIRA ticket with a visa from the bot [3].
> >>
> >> A visa is a proof of PR quality for Ignite committers. And we can sort out the most flaky tests and prioritize fixes with the bot statistics [2].
> >>
> >> The TC bot is integrated with Team City only, for now. But if the Kafka community is interested, we can try to integrate it with Jenkins.
> >>
> >> [1] https://github.com/apache/ignite-teamcity-bot
> >> [2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
> >> [3] https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
> >>
> >> > On Nov 15, 2023, at 09:18, Ismael Juma <m...@ismaeljuma.com> wrote:
> >> >
> >> > To use the pain analogy, people seem to have really good painkillers and hence they somehow don't feel the pain already. ;)
> >> >
> >> > The reality is that important and high-quality tests will get fixed. Poor-quality tests (low signal-to-noise ratio) might not get fixed, and that's ok.
> >> >
> >> > I'm not opposed to marking the tests as release blockers as a starting point, but I'm saying it's fine if people triage them and decide they are not blockers. In fact, that has already happened in the past.
> >> >
> >> > Ismael
> >> >
> >> > On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax <mj...@apache.org> wrote:
> >> >
> >> >> I agree on the test gap argument. However, my worry is that if we don't "force the pain", it won't get fixed at all. -- I also know that we have tried to find a working approach for many years...
> >> >>
> >> >> My take is that if we disable a test and file a non-blocking JIRA, it's basically the same as just deleting the test altogether and never talking about it again. -- I believe this is not what we aim for, but that we aim for good test coverage and a way to get these tests fixed.
> >> >>
> >> >> Thus IMHO we need some forcing function: either keep the tests and feel the pain on every PR, or disable the test and file a blocker JIRA so the pain surfaces on a release, forcing us to do something about it.
> >> >>
> >> >> If there is no forcing function, it basically means we are willing to accept test gaps forever.
> >> >>
> >> >>
> >> >> -Matthias
> >> >>
> >> >> On 11/14/23 9:09 PM, Ismael Juma wrote:
> >> >>> Matthias,
> >> >>>
> >> >>> Flaky tests are worse than useless. I know engineers find it hard to disable them because of the supposed test gap (I find it hard too), but the truth is that the test gap is already there! No one blocks merging PRs on flaky tests, but they do get used to ignoring build failures.
> >> >>>
> >> >>> The current approach has been attempted for nearly a decade and it has never worked. I think we should try something different.
> >> >>>
> >> >>> When it comes to marking flaky tests as release blockers, I don't think this should be done as a general rule. We should instead assess on a case-by-case basis, the same way we do it for bugs.
> >> >>>
> >> >>> Ismael
> >> >>>
> >> >>> On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax <mj...@apache.org> wrote:
> >> >>>
> >> >>>> Thanks for starting this discussion David! I totally agree with "no"!
> >> >>>>
> >> >>>> I think there is no excuse whatsoever for merging PRs with compilation errors (except an honest mistake for conflicting PRs that got merged interleaved). -- Every committer must(!) check the Jenkins status before merging to avoid such an issue.
> >> >>>>
> >> >>>> Similar for actual permanently broken tests. If there is no green build, and the same test failed across multiple Jenkins runs, a committer should detect this and cannot merge a PR.
> >> >>>>
> >> >>>> Given the current state of the CI pipeline, it seems possible to get green runs, and thus I support the policy (that we actually always had) to only merge if there is at least one green build. If committers got sloppy about this, we need to call it out and put a hold on this practice.
> >> >>>>
> >> >>>> (The only exception from the above policy would be a very unstable status for which getting a green build is not possible at all, due to too many flaky tests -- for this case, I would accept merging even if there is no green build, but committers need to manually ensure that every test did pass in at least one test run. -- We had this in the past, but I don't think we are in such a bad situation right now.)
> >> >>>>
> >> >>>> About disabling tests: I was never a fan of this, because in my experience those tests are not fixed any time soon, especially because we do not consider such tickets release blockers. To me, seeing tests fail on PR builds is actually a good forcing function for people to feel the pain, and thus get motivated to make time to fix the tests.
> >> >>>>
> >> >>>> I have to admit that I was a little bit sloppy about paying attention to flaky tests recently, and I highly appreciate this effort. Also thanks to everyone who actually filed a ticket!
> >> >>>> IMHO, we should file a ticket for every flaky test, and also keep adding comments each time we see a test fail, to be able to track the frequency at which a test fails, so we can fix the most pressing ones first.
> >> >>>>
> >> >>>> To me, the best forcing function to get tests stabilized is to file tickets and consider them release blockers. Disabling tests does not really help much IMHO to tackle the problem (we can of course still disable them to get noise out of the system, but it would only introduce testing gaps for the time being and also does not help to figure out how often a test fails, so it's not a solution to the problem IMHO).
> >> >>>>
> >> >>>>
> >> >>>> -Matthias
> >> >>>>
> >> >>>> On 11/13/23 11:40 PM, Sagar wrote:
> >> >>>>> Hi Divij,
> >> >>>>>
> >> >>>>> I think this proposal overall makes sense. My only nit, sort of a suggestion, is that we also consider a label called newbie++ [1] for flaky tests if we are considering adding newbie as a label. I think some of the flaky tests need familiarity with the codebase or the test setup, so as a first-time contributor it might be difficult. newbie++ IMO covers that aspect.
> >> >>>>>
> >> >>>>> [1] https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22
> >> >>>>>
> >> >>>>> Let me know what you think.
> >> >>>>>
> >> >>>>> Thanks!
> >> >>>>> Sagar.
> >> >>>>>
> >> >>>>> On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>>> Please, do it. We can use a specific label to effectively filter those tickets.
> >> >>>>>>
> >> >>>>>> We already have a label and a way to discover flaky tests. They are tagged with the label "flaky-test" [1]. There is also a label "newbie" [2] meant for folks who are new to the Apache Kafka code base.
> >> >>>>>> My suggestion is to send a broader email to the community (since many will miss details in this thread) and call for committers to volunteer as "shepherds" for these tickets. I can send one out once we have some consensus wrt next steps in this thread.
> >> >>>>>>
> >> >>>>>> [1] https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> >> >>>>>>
> >> >>>>>> [2] https://kafka.apache.org/contributing -> Finding a project to work on
> >> >>>>>>
> >> >>>>>> Divij Vaidya
> >> >>>>>>
> >> >>>>>> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <nizhi...@apache.org> wrote:
> >> >>>>>>>
> >> >>>>>>>> To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a «shepherd" for each ticket.
> >> >>>>>>>
> >> >>>>>>> Please, do it.
> >> >>>>>>> We can use a specific label to effectively filter those tickets.
> >> >>>>>>>
> >> >>>>>>>> On Nov 13, 2023, at 15:16, Divij Vaidya <divijvaidy...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Thanks for bringing this up David.
> >> >>>>>>>>
> >> >>>>>>>> My primary concern revolves around the possibility that the currently disabled tests may remain inactive indefinitely. We currently have unresolved JIRA tickets for flaky tests that have been pending for an extended period. I am inclined to support the idea of disabling these tests temporarily and merging changes only when the build is successful, provided there is a clear plan for re-enabling them in the future.
> >> >>>>>>>>
> >> >>>>>>>> To address this issue, I propose the following measures:
> >> >>>>>>>>
> >> >>>>>>>> 1\ Foster a supportive environment for new contributors within the community, encouraging them to take on tickets associated with flaky tests. This initiative would require individuals familiar with the relevant code to offer guidance to those undertaking these tasks. Committers should prioritize reviewing and addressing these tickets within their available bandwidth. To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a "shepherd" for each ticket.
> >> >>>>>>>>
> >> >>>>>>>> 2\ Implement a policy to block minor version releases until the Release Manager (RM) is satisfied that the disabled tests do not result in gaps in our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in the specific code areas to provide assurance before giving the green light for a release.
> >> >>>>>>>>
> >> >>>>>>>> 3\ Set a community-wide goal for 2024 to achieve a stable Continuous Integration (CI) system. This goal should encompass projects such as refining our test suite to eliminate flakiness and addressing infrastructure issues if necessary. By publishing this goal, we create a shared vision for the community in 2024, fostering alignment on our objectives. This alignment will aid in prioritizing tasks for community members and guide reviewers in allocating their bandwidth effectively.
> >> >>>>>>>>
> >> >>>>>>>> --
> >> >>>>>>>> Divij Vaidya
> >> >>>>>>>>
> >> >>>>>>>> On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan <jols...@confluent.io.invalid> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> I will say that I have also seen tests that seem to be more flaky intermittently. They may be ok for some time, and then suddenly the CI is overloaded and we see issues. I have also seen the CI struggling with running out of space recently, so I wonder if we can also try to improve things on that front.
> >> >>>>>>>>>
> >> >>>>>>>>> FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
> >> >>>>>>>>> I'm happy to try to get to green builds, but everyone needs to be on board.
> >> >>>>>>>>>
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15529
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14806
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14249
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15798
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15797
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15690
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15699
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15772
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15759
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15760
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15700
> >> >>>>>>>>>
> >> >>>>>>>>> I've also seen that KRaft transactions tests often flakily find that the producer id is not allocated and time out. I can file a JIRA for that too.
> >> >>>>>>>>>
> >> >>>>>>>>> Hopefully this is a place we can start from.
> >> >>>>>>>>>
> >> >>>>>>>>> Justine
> >> >>>>>>>>>
> >> >>>>>>>>> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler <vvcep...@apache.org> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> In other words, I'm biased to think that new flakiness indicates non-deterministic bugs more often than it indicates a bad test.
> >> >>>>>>>>>>
> >> >>>>>>>>>> My experience is exactly the opposite. As someone who has tracked many of the flaky fixes, the vast majority of the time they are an issue with the test.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Ismael
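
For reference, here is a minimal sketch of the `reports.junitXml.mergeReruns` setup described in item 1) of David's proposal, written as a Gradle Kotlin DSL build script. It assumes the `org.gradle.test-retry` plugin to produce the reruns that get merged; the plugin version, JUnit version, and retry counts are illustrative assumptions, not Kafka's actual build settings.

// build.gradle.kts -- illustrative sketch only; not Kafka's actual build configuration.
plugins {
    java
    id("org.gradle.test-retry") version "1.5.7" // assumed plugin version
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation("org.junit.jupiter:junit-jupiter:5.10.1") // assumed JUnit version
}

tasks.test {
    useJUnitPlatform()

    // Retry each failed test up to two more times, but stop retrying once ten
    // distinct tests have failed, since that usually indicates a real breakage.
    retry {
        maxRetries.set(2)
        maxFailures.set(10)
        failOnPassedAfterRetry.set(false)
    }

    // Merge all executions of a retried test into a single <testcase> element.
    // A fail-then-pass sequence is reported as <flakyFailure> instead of <failure>,
    // so CI servers that understand the format mark the test as flaky rather than failed.
    reports {
        junitXml.mergeReruns.set(true)
    }
}

With something like this in place, a test that fails and then passes on retry is recorded as <flakyFailure> in the JUnit XML report instead of failing the build, which is the noise reduction David describes; a test that fails on every retry still fails the build.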