FYI, a memory leak that affected some of the tests was fixed recently, so hopefully stability will improve a bit. See KAFKA-14433 for details.
best,
Colin

On Thu, Nov 24, 2022, at 12:48, John Roesler wrote:
> Hi Dan,
>
> I’m not sure if there’s a consistently used tag, but I’ve gotten good
> mileage out of just searching for “flaky” or “flaky test” in Jira (a
> rough JQL sketch is at the end of this thread).
>
> If you’re thinking about filing a ticket for a specific test failure
> you’ve seen, I’ve also usually been able to find out whether there’s
> already a ticket by searching for the test class or method name.
>
> People seem to typically file tickets with “flaky” in the title and
> then the test name.
>
> Thanks again for your interest in improving the situation!
> -John
>
> On Thu, Nov 24, 2022, at 10:08, Dan S wrote:
>> Thanks for the reply, John! Is there a Jira tag or view or something that
>> can be used to find all the failing tests and maybe even try to fix them
>> (even if a fix just means extending a timeout)?
>>
>> On Thu, Nov 24, 2022, 16:03 John Roesler <vvcep...@apache.org> wrote:
>>
>>> Hi Dan,
>>>
>>> Thanks for pointing this out. Flaky tests are a perennial problem. We
>>> knock them out every now and then, but eventually more spring up.
>>>
>>> I’ve had some luck in the past filing Jira tickets for the failing tests
>>> as they pop up in my PRs. Another thing that seems to motivate people is
>>> to open a PR to disable the test in question, as you mention (see the
>>> JUnit sketch at the end of this thread). That can be a bit aggressive,
>>> though, so it wouldn’t be my first suggestion.
>>>
>>> I appreciate you bringing this up. I agree that flaky tests pose a risk
>>> to the project because they make it harder to know whether a PR breaks
>>> things or not.
>>>
>>> Thanks,
>>> John
>>>
>>> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
>>> > Hello all,
>>> >
>>> > I've had a PR that has been open for a little over a month (several
>>> > feedback cycles have happened), and I've never seen a fully passing
>>> > build (tests in completely different parts of the codebase seemed to
>>> > fail, often with timeouts). A cursory look at open PRs seems to
>>> > indicate that mine is not the only one. I was wondering if there is a
>>> > place where all the flaky tests are being tracked, and if it makes
>>> > sense to fix (or at least temporarily disable) them so that confidence
>>> > in new PRs could be increased.
>>> >
>>> > Thanks,
>>> >
>>> > Dan
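
For anyone who wants to run the search John describes, a rough JQL sketch. The project key KAFKA is real, but the "flaky" keyword is just the informal titling convention mentioned in the thread, and the exact status names can vary by Jira setup:

    project = KAFKA AND summary ~ "flaky" AND status in (Open, Reopened, "In Progress") ORDER BY updated DESC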
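
And for the two stopgaps mentioned in the thread (extending a timeout, or disabling a test while a fix is tracked), a minimal JUnit 5 sketch. The class, test names, timeout value, and the KAFKA-XXXXX ticket number are all placeholders for illustration, not code from the Kafka repo:

    import java.time.Duration;

    import org.junit.jupiter.api.Disabled;
    import org.junit.jupiter.api.Test;

    import static org.junit.jupiter.api.Assertions.assertTimeoutPreemptively;

    // Hypothetical test class, for illustration only.
    class FlakyExampleTest {

        // Stopgap 1: extend the timeout so slow CI machines stop failing spuriously.
        @Test
        void shouldCompleteWithinExtendedTimeout() {
            // 60s is an assumed, generous budget; tune it to the test's real cost.
            assertTimeoutPreemptively(Duration.ofSeconds(60), () -> {
                // ... original test body goes here ...
            });
        }

        // Stopgap 2: disable the test outright, pointing at the tracking ticket
        // (KAFKA-XXXXX is a placeholder, not a real issue number).
        @Disabled("KAFKA-XXXXX: flaky, disabled pending a fix")
        @Test
        void shouldDoTheFlakyThing() {
            // ... flaky assertions ...
        }
    }

Extending a timeout is usually the gentler first step; disabling keeps CI green but can hide real regressions, which is why John calls it aggressive.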