Hi David,

Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216? I looked into this option already (yes, there isn't much that we haven't considered in this space).
Ismael

On Wed, Nov 22, 2023 at 12:24 AM David Jacot <dja...@confluent.io.invalid> wrote:
> Hi all,
>
> Thanks for the good discussion and all the comments. Overall, it seems that we all agree on the bad state of our CI. That's a good first step!
>
> I have talked to a few folks this week about it and it seems that many folks (including me) are not comfortable with merging PRs at the moment because the results of our builds are so bad. I had 40+ failed tests in one of my PRs, all unrelated to my changes. It is really hard to be productive with this.
>
> Personally, I really want to move towards requiring a green build to merge to trunk because this is a clear and binary signal. I agree that we need to stabilize the builds before we can even require this, so here is my proposal.
>
> 1) We could leverage the `reports.junitXml.mergeReruns` option in Gradle. From the doc [1]:
>
> > When mergeReruns is enabled, if a test fails but is then retried and succeeds, its failures will be recorded as <flakyFailure> instead of <failure>, within one <testcase>. This is effectively the reporting produced by the surefire plugin of Apache Maven™ when enabling reruns. If your CI server understands this format, it will indicate that the test was flaky. If it does not, it will indicate that the test succeeded as it will ignore the <flakyFailure> information. If the test does not succeed (i.e. it fails for every retry), it will be indicated as having failed whether your tool understands this format or not.
> >
> > When mergeReruns is disabled (the default), each execution of a test will be listed as a separate test case.
>
> It would not resolve all the flaky tests for sure but it would at least reduce the noise. I see this as a means to get to green builds faster. I played a bit with this setting and I discovered [2]. I was hoping that [3] could help to resolve it but I need to confirm.
> [A minimal configuration sketch for this option is appended at the end of this thread.]
>
> 2) I suppose that we would still have flaky tests preventing us from getting a green build even with the setting in place. For those, I think that we need to review them one by one and decide whether we want to fix or disable them. This is a short-term effort to help us get to green builds.
>
> 3) When we get to a point where we can get green builds consistently, we could enforce it.
>
> 4) Flaky tests won't disappear with this. They are just hidden. Therefore, we also need a process to review the flaky tests and address them. Here, I think that we could leverage the dashboard shared by Ismael. One possibility would be to review it regularly and decide for each test whether it should be fixed, disabled or even removed.
>
> Please let me know what you think.
>
> Another angle that we could consider is improving the CI infrastructure as well. I think that many of those flaky tests are due to overloaded Jenkins workers. We should perhaps discuss with the infra team to see whether we could do something there.
>
> Best,
> David
>
> [1] https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> [2] https://github.com/gradle/gradle/issues/23324
> [3] https://github.com/apache/kafka/pull/14687
>
>
> On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> > Hi,
> >
> > We have a dashboard already:
> >
> > [image: image.png]
> >
> > https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> >
> > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков <nizhi...@apache.org> wrote:
> >
> >> Hello guys.
> >>
> >> I want to tell you about one more approach to dealing with flaky tests. We adopted this approach in the Apache Ignite community, so maybe it can be helpful for Kafka as well.
> >>
> >> TL;DR: The Apache Ignite community has a tool that provides test statistics and can tell whether a PR introduces new failures.
> >>
> >> Apache Ignite has many tests. The latest «Run All» contains around 75k. Most of the tests are integration-style, so the number of flaky ones is significant.
> >>
> >> We built a tool - Team City Bot [1] - that provides combined statistics on flaky tests [2].
> >>
> >> This tool can compare the results of Run All for a PR and for master. If everything is OK, one can comment on the JIRA ticket with a visa from the bot [3].
> >>
> >> A visa is a proof of PR quality for Ignite committers. And we can sort out the most flaky tests and prioritize fixes with the bot statistics [2].
> >>
> >> The TC bot is integrated with Team City only, for now. But if the Kafka community is interested, we can try to integrate it with Jenkins.
> >>
> >> [1] https://github.com/apache/ignite-teamcity-bot
> >> [2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
> >> [3] https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
> >>
> >> > On Nov 15, 2023, at 09:18, Ismael Juma <m...@ismaeljuma.com> wrote:
> >> >
> >> > To use the pain analogy, people seem to have really good painkillers and hence they somehow don't feel the pain already. ;)
> >> >
> >> > The reality is that important and high-quality tests will get fixed. Poor-quality tests (low signal-to-noise ratio) might not get fixed, and that's ok.
> >> >
> >> > I'm not opposed to marking the tests as release blockers as a starting point, but I'm saying it's fine if people triage them and decide they are not blockers. In fact, that has already happened in the past.
> >> >
> >> > Ismael
> >> >
> >> > On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax <mj...@apache.org> wrote:
> >> >
> >> >> I agree on the test gap argument. However, my worry is that if we don't "force the pain", it won't get fixed at all. -- I also know that we have tried to find a working approach for many years...
> >> >>
> >> >> My take is that if we disable a test and file a non-blocking JIRA, it's basically the same as just deleting the test altogether and never talking about it again. -- I believe this is not what we aim for, but that we aim for good test coverage and a way to get these tests fixed.
> >> >>
> >> >> Thus IMHO we need some forcing function: either keep the tests and feel the pain on every PR, or disable the test and file a blocker JIRA so the pain surfaces on a release, forcing us to do something about it.
> >> >>
> >> >> If there is no forcing function, it basically means we are willing to accept test gaps forever.
> >> >>
> >> >>
> >> >> -Matthias
> >> >>
> >> >> On 11/14/23 9:09 PM, Ismael Juma wrote:
> >> >>> Matthias,
> >> >>>
> >> >>> Flaky tests are worse than useless. I know engineers find it hard to disable them because of the supposed test gap (I find it hard too), but the truth is that the test gap is already there! No one blocks merging PRs on flaky tests, but they do get used to ignoring build failures.
> >> >>>
> >> >>> The current approach has been attempted for nearly a decade and it has never worked. I think we should try something different.
> >> >>>
> >> >>> When it comes to marking flaky tests as release blockers, I don't think this should be done as a general rule. We should instead assess on a case-by-case basis, the same way we do it for bugs.
> >> >>>
> >> >>> Ismael
> >> >>>
> >> >>> On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax <mj...@apache.org> wrote:
> >> >>>
> >> >>>> Thanks for starting this discussion David! I totally agree with "no"!
> >> >>>>
> >> >>>> I think there is no excuse whatsoever for merging PRs with compilation errors (except an honest mistake for conflicting PRs that got merged interleaved). -- Every committer must(!) check the Jenkins status before merging to avoid such an issue.
> >> >>>>
> >> >>>> Similar for actual permanently broken tests. If there is no green build, and the same test failed across multiple Jenkins runs, a committer should detect this and cannot merge a PR.
> >> >>>>
> >> >>>> Given the current state of the CI pipeline, it seems possible to get green runs, and thus I support the policy (that we actually always had) to only merge if there is at least one green build. If committers got sloppy about this, we need to call it out and put a hold on this practice.
> >> >>>>
> >> >>>> (The only exception from the above policy would be a very unstable status for which getting a green build is not possible at all, due to too many flaky tests -- for this case, I would accept merging even if there is no green build, but committers need to manually ensure that every test did pass in at least one test run. -- We had this in the past, but I don't think we are in such a bad situation right now.)
> >> >>>>
> >> >>>> About disabling tests: I was never a fan of this, because in my experience those tests are not fixed any time soon, especially because we do not consider such tickets release blockers. To me, seeing tests fail on PR builds is actually a good forcing function for people to feel the pain, and thus get motivated to make time to fix the tests.
> >> >>>>
> >> >>>> I have to admit that I was a little bit sloppy about paying attention to flaky tests recently, and I highly appreciate this effort. Also thanks to everyone who actually filed a ticket!
> >> >>>> IMHO, we should file a ticket for every flaky test, and also keep adding comments each time we see a test fail, to be able to track the frequency at which a test fails, so we can fix the most pressing ones first.
> >> >>>>
> >> >>>> To me, the best forcing function to get tests stabilized is to file tickets and consider them release blockers. Disabling tests does not really help much IMHO to tackle the problem (we can of course still disable them to get noise out of the system, but it would only introduce testing gaps for the time being and also does not help to figure out how often a test fails, so it's not a solution to the problem IMHO).
> >> >>>>
> >> >>>>
> >> >>>> -Matthias
> >> >>>>
> >> >>>> On 11/13/23 11:40 PM, Sagar wrote:
> >> >>>>> Hi Divij,
> >> >>>>>
> >> >>>>> I think this proposal overall makes sense. My only nit, sort of a suggestion, is that we also consider a label called newbie++ [1] for flaky tests if we are considering adding newbie as a label. I think some of the flaky tests need familiarity with the codebase or the test setup, so as a first-time contributor it might be difficult. newbie++ IMO covers that aspect.
> >> >>>>>
> >> >>>>> [1] https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22
> >> >>>>>
> >> >>>>> Let me know what you think.
> >> >>>>>
> >> >>>>> Thanks!
> >> >>>>> Sagar.
> >> >>>>>
> >> >>>>> On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>>> Please, do it. We can use a specific label to effectively filter those tickets.
> >> >>>>>>
> >> >>>>>> We already have a label and a way to discover flaky tests. They are tagged with the label "flaky-test" [1]. There is also a label "newbie" [2] meant for folks who are new to the Apache Kafka code base.
> >> >>>>>> My suggestion is to send a broader email to the community (since many will miss details in this thread) and call for committers to volunteer as "shepherds" for these tickets. I can send one out once we have some consensus wrt next steps in this thread.
> >> >>>>>>
> >> >>>>>> [1] https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> >> >>>>>>
> >> >>>>>> [2] https://kafka.apache.org/contributing -> Finding a project to work on
> >> >>>>>>
> >> >>>>>> Divij Vaidya
> >> >>>>>>
> >> >>>>>> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <nizhi...@apache.org> wrote:
> >> >>>>>>>
> >> >>>>>>>> To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a «shepherd" for each ticket.
> >> >>>>>>>
> >> >>>>>>> Please, do it.
> >> >>>>>>> We can use a specific label to effectively filter those tickets.
> >> >>>>>>>
> >> >>>>>>>> On Nov 13, 2023, at 15:16, Divij Vaidya <divijvaidy...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Thanks for bringing this up David.
> >> >>>>>>>>
> >> >>>>>>>> My primary concern revolves around the possibility that the currently disabled tests may remain inactive indefinitely. We currently have unresolved JIRA tickets for flaky tests that have been pending for an extended period. I am inclined to support the idea of disabling these tests temporarily and merging changes only when the build is successful, provided there is a clear plan for re-enabling them in the future.
> >> >>>>>>>>
> >> >>>>>>>> To address this issue, I propose the following measures:
> >> >>>>>>>>
> >> >>>>>>>> 1\ Foster a supportive environment for new contributors within the community, encouraging them to take on tickets associated with flaky tests. This initiative would require individuals familiar with the relevant code to offer guidance to those undertaking these tasks. Committers should prioritize reviewing and addressing these tickets within their available bandwidth. To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a "shepherd" for each ticket.
> >> >>>>>>>>
> >> >>>>>>>> 2\ Implement a policy to block minor version releases until the Release Manager (RM) is satisfied that the disabled tests do not result in gaps in our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in the specific code areas to provide assurance before giving the green light for a release.
> >> >>>>>>>>
> >> >>>>>>>> 3\ Set a community-wide goal for 2024 to achieve a stable Continuous Integration (CI) system. This goal should encompass projects such as refining our test suite to eliminate flakiness and addressing infrastructure issues if necessary. By publishing this goal, we create a shared vision for the community in 2024, fostering alignment on our objectives. This alignment will aid in prioritizing tasks for community members and guide reviewers in allocating their bandwidth effectively.
> >> >>>>>>>>
> >> >>>>>>>> --
> >> >>>>>>>> Divij Vaidya
> >> >>>>>>>>
> >> >>>>>>>> On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan <jols...@confluent.io.invalid> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> I will say that I have also seen tests that seem to be more flaky intermittently. They may be ok for some time, and then suddenly the CI is overloaded and we see issues. I have also seen the CI struggling with running out of space recently, so I wonder if we can also try to improve things on that front.
> >> >>>>>>>>>
> >> >>>>>>>>> FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
> >> >>>>>>>>> I'm happy to try to get to green builds, but everyone needs to be on board.
> >> >>>>>>>>>
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15529
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14806
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14249
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15798
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15797
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15690
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15699
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15772
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15759
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15760
> >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15700
> >> >>>>>>>>>
> >> >>>>>>>>> I've also seen that KRaft transactions tests often flakily find that the producer id is not allocated and time out. I can file a JIRA for that too.
> >> >>>>>>>>>
> >> >>>>>>>>> Hopefully this is a place we can start from.
> >> >>>>>>>>>
> >> >>>>>>>>> Justine
> >> >>>>>>>>>
> >> >>>>>>>>> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler <vvcep...@apache.org> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> In other words, I'm biased to think that new flakiness indicates non-deterministic bugs more often than it indicates a bad test.
> >> >>>>>>>>>>
> >> >>>>>>>>>> My experience is exactly the opposite. As someone who has tracked many of the flaky fixes, the vast majority of the time they are an issue with the test.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Ismael
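
For reference, here is a minimal sketch of the `reports.junitXml.mergeReruns` setup described in item 1) of David's proposal, written as a Gradle Kotlin DSL build script. It assumes the `org.gradle.test-retry` plugin to produce the reruns that get merged; the plugin version, JUnit version, and retry counts are illustrative assumptions, not Kafka's actual build settings.

// build.gradle.kts -- illustrative sketch only; not Kafka's actual build configuration.
plugins {
    java
    id("org.gradle.test-retry") version "1.5.7" // assumed plugin version
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation("org.junit.jupiter:junit-jupiter:5.10.1") // assumed JUnit version
}

tasks.test {
    useJUnitPlatform()

    // Retry each failed test up to two more times, but stop retrying once ten
    // distinct tests have failed, since that usually indicates a real breakage.
    retry {
        maxRetries.set(2)
        maxFailures.set(10)
        failOnPassedAfterRetry.set(false)
    }

    // Merge all executions of a retried test into a single <testcase> element.
    // A fail-then-pass sequence is reported as <flakyFailure> instead of <failure>,
    // so CI servers that understand the format mark the test as flaky rather than failed.
    reports {
        junitXml.mergeReruns.set(true)
    }
}

With something like this in place, a test that fails and then passes on retry is recorded as <flakyFailure> in the JUnit XML report instead of failing the build, which is the noise reduction David describes; a test that fails on every retry still fails the build.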