Hi Divij,

I think this proposal makes sense overall. My only nit, more of a suggestion, is that if we are considering adding newbie as a label for flaky tests, let's also consider a label called newbie++ [1]. Some of the flaky tests need familiarity with the codebase or the test setup, so they might be difficult for a first-time contributor. newbie++, IMO, covers that aspect.

[1] https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22

Let me know what you think.

Thanks!
Sagar.

On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:

> Please, do it.
> We can use specific labels to effectively filter those tickets.
>
> We already have a label and a way to discover flaky tests. They are tagged
> with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
> for folks who are new to Apache Kafka code base.
> My suggestion is to send a broader email to the community (since many will
> miss details in this thread) and call for action for committers to
> volunteer as "shepherds" for these tickets. I can send one out once we have
> some consensus wrt next steps in this thread.
>
> [1]
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
> [2] https://kafka.apache.org/contributing -> Finding a project to work on
>
> Divij Vaidya
>
> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <nizhi...@apache.org> wrote:
>
> > > To kickstart this effort, we can publish a list of such tickets in the
> > > community and assign one or more committers the role of a "shepherd"
> > > for each ticket.
> >
> > Please, do it.
> > We can use a specific label to effectively filter those tickets.
> >
> > > On Nov 13, 2023, at 15:16, Divij Vaidya <divijvaidy...@gmail.com>
> > > wrote:
> > >
> > > Thanks for bringing this up David.
> > >
> > > My primary concern revolves around the possibility that the currently
> > > disabled tests may remain inactive indefinitely. We currently have
> > > unresolved JIRA tickets for flaky tests that have been pending for an
> > > extended period.
> > > I am inclined to support the idea of disabling these tests
> > > temporarily and merging changes only when the build is successful,
> > > provided there is a clear plan for re-enabling them in the future.
> > >
> > > To address this issue, I propose the following measures:
> > >
> > > 1\ Foster a supportive environment for new contributors within the
> > > community, encouraging them to take on tickets associated with flaky
> > > tests. This initiative would require individuals familiar with the
> > > relevant code to offer guidance to those undertaking these tasks.
> > > Committers should prioritize reviewing and addressing these tickets
> > > within their available bandwidth. To kickstart this effort, we can
> > > publish a list of such tickets in the community and assign one or more
> > > committers the role of a "shepherd" for each ticket.
> > >
> > > 2\ Implement a policy to block minor version releases until the Release
> > > Manager (RM) is satisfied that the disabled tests do not result in gaps
> > > in our testing coverage. The RM may rely on Subject Matter Experts
> > > (SMEs) in the specific code areas to provide assurance before giving
> > > the green light for a release.
> > >
> > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > > Integration (CI) system. This goal should encompass projects such as
> > > refining our test suite to eliminate flakiness and addressing
> > > infrastructure issues if necessary. By publishing this goal, we create
> > > a shared vision for the community in 2024, fostering alignment on our
> > > objectives. This alignment will aid in prioritizing tasks for community
> > > members and guide reviewers in allocating their bandwidth effectively.
> > >
> > > --
> > > Divij Vaidya
> > >
> > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > > <jols...@confluent.io.invalid> wrote:
> > >
> > >> I will say that I have also seen tests that seem to be more flaky
> > >> intermittently. It may be ok for some time and suddenly the CI is
> > >> overloaded and we see issues.
> > >> I have also seen the CI struggling with running out of space recently,
> > >> so I wonder if we can also try to improve things on that front.
> > >>
> > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> > >> week. I'm happy to try to get to green builds, but everyone needs to be
> > >> on board.
> > >>
> > >> https://issues.apache.org/jira/browse/KAFKA-15529
> > >> https://issues.apache.org/jira/browse/KAFKA-14806
> > >> https://issues.apache.org/jira/browse/KAFKA-14249
> > >> https://issues.apache.org/jira/browse/KAFKA-15798
> > >> https://issues.apache.org/jira/browse/KAFKA-15797
> > >> https://issues.apache.org/jira/browse/KAFKA-15690
> > >> https://issues.apache.org/jira/browse/KAFKA-15699
> > >> https://issues.apache.org/jira/browse/KAFKA-15772
> > >> https://issues.apache.org/jira/browse/KAFKA-15759
> > >> https://issues.apache.org/jira/browse/KAFKA-15760
> > >> https://issues.apache.org/jira/browse/KAFKA-15700
> > >>
> > >> I've also seen that kraft transactions tests often flakily see that the
> > >> producer id is not allocated and times out.
> > >> I can file a JIRA for that too.
> > >>
> > >> Hopefully this is a place we can start from.
> > >>
> > >> Justine
> > >>
> > >> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <m...@ismaeljuma.com>
> > >> wrote:
> > >>
> > >>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler <vvcep...@apache.org>
> > >>> wrote:
> > >>>
> > >>>> In other words, I’m biased to think that new flakiness indicates
> > >>>> non-deterministic bugs more often than it indicates a bad test.
> > >>>>
> > >>>
> > >>> My experience is exactly the opposite.
> > >>> As someone who has tracked many of the flaky fixes, the vast majority
> > >>> of the time they are an issue with the test.
> > >>>
> > >>> Ismael