FYI, a memory leak that affected some of the tests was fixed recently, so hopefully stability will improve a bit. See KAFKA-14433 for details.
best,
Colin

On Thu, Nov 24, 2022, at 12:48, John Roesler wrote:
> Hi Dan,
>
> I’m not sure if there’s a consistently used tag, but I’ve gotten good
> mileage out of just searching for “flaky” or “flaky test” in Jira (a
> rough JQL sketch is at the end of this thread).
>
> If you’re thinking about filing a ticket for a specific test failure
> you’ve seen, I’ve also usually been able to find out whether there’s
> already a ticket by searching for the test class or method name.
>
> People seem to typically file tickets with “flaky” in the title and
> then the test name.
>
> Thanks again for your interest in improving the situation!
> -John
>
> On Thu, Nov 24, 2022, at 10:08, Dan S wrote:
>> Thanks for the reply, John! Is there a Jira tag or view or something that
>> can be used to find all the failing tests and maybe even try to fix them
>> (even if a fix just means extending a timeout)?
>>
>> On Thu, Nov 24, 2022, 16:03 John Roesler <vvcep...@apache.org> wrote:
>>
>>> Hi Dan,
>>>
>>> Thanks for pointing this out. Flaky tests are a perennial problem. We
>>> knock them out every now and then, but eventually more spring up.
>>>
>>> I’ve had some luck in the past filing Jira tickets for the failing tests
>>> as they pop up in my PRs. Another thing that seems to motivate people is
>>> to open a PR to disable the test in question, as you mention (see the
>>> JUnit sketch at the end of this thread). That can be a bit aggressive,
>>> though, so it wouldn’t be my first suggestion.
>>>
>>> I appreciate you bringing this up. I agree that flaky tests pose a risk
>>> to the project because they make it harder to know whether a PR breaks
>>> things or not.
>>>
>>> Thanks,
>>> John
>>>
>>> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
>>> > Hello all,
>>> >
>>> > I've had a PR that has been open for a little over a month (several
>>> > feedback cycles have happened), and I've never seen a fully passing
>>> > build (tests in completely different parts of the codebase seemed to
>>> > fail, often with timeouts). A cursory look at open PRs seems to
>>> > indicate that mine is not the only one. I was wondering if there is a
>>> > place where all the flaky tests are being tracked, and if it makes
>>> > sense to fix (or at least temporarily disable) them so that confidence
>>> > in new PRs could be increased.
>>> >
>>> > Thanks,
>>> >
>>> > Dan
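
For anyone who wants to run the search John describes, a rough JQL sketch. The project key KAFKA is real, but the "flaky" keyword is just the informal titling convention mentioned in the thread, and the exact status names can vary by Jira setup:

    project = KAFKA AND summary ~ "flaky" AND status in (Open, Reopened, "In Progress") ORDER BY updated DESC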
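
And for the two stopgaps mentioned in the thread (extending a timeout, or disabling a test while a fix is tracked), a minimal JUnit 5 sketch. The class, test names, timeout value, and the KAFKA-XXXXX ticket number are all placeholders for illustration, not code from the Kafka repo:

    import java.time.Duration;

    import org.junit.jupiter.api.Disabled;
    import org.junit.jupiter.api.Test;

    import static org.junit.jupiter.api.Assertions.assertTimeoutPreemptively;

    // Hypothetical test class, for illustration only.
    class FlakyExampleTest {

        // Stopgap 1: extend the timeout so slow CI machines stop failing spuriously.
        @Test
        void shouldCompleteWithinExtendedTimeout() {
            // 60s is an assumed, generous budget; tune it to the test's real cost.
            assertTimeoutPreemptively(Duration.ofSeconds(60), () -> {
                // ... original test body goes here ...
            });
        }

        // Stopgap 2: disable the test outright, pointing at the tracking ticket
        // (KAFKA-XXXXX is a placeholder, not a real issue number).
        @Disabled("KAFKA-XXXXX: flaky, disabled pending a fix")
        @Test
        void shouldDoTheFlakyThing() {
            // ... flaky assertions ...
        }
    }

Extending a timeout is usually the gentler first step; disabling keeps CI green but can hide real regressions, which is why John calls it aggressive.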