We currently have over 10,000 tests, but only about 147 are annotated with FlakyTest, so running them separately probably wouldn't make precheckin take much longer. My main argument for separating the FlakyTests into their own Jenkins build job is to get the main build job 100% green while we know the FlakyTest build job might "flicker".
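A rough sketch of what that split could look like in the Gradle build (assuming JUnit 4 categories; the task wiring and the FlakyTest category's package shown here are illustrative, not the actual Geode build files):

```groovy
// Hypothetical build.gradle fragment -- a sketch, not the real Geode build.
// Exclude FlakyTest from the primary test tasks...
test {
  useJUnit {
    excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
  }
}

// ...and run only the flaky ones in a separate task, each in a fresh JVM.
task flakyTest(type: Test) {
  useJUnit {
    includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
  }
  forkEvery 1  // clean JVM per test class, reducing GC-pause and test-pollution flakiness
}
```

Jenkins could then run `flakyTest` as its own build job while the main job stays on the excluded set.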
-Kirk

On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <[email protected]> wrote:
> Depending on the amount of "flaky" tests, this should not increase the
> time too much. I foresee these "flaky" tests to be few and far between.
> Over time I imagine this would be a last resort if we cannot fix the
> test or even improve the test harness to have a clean test space for
> each test.
>
> --Udo
>
> On 27/04/2016 6:42 am, Jens Deppe wrote:
>> By running the Flakes with forkEvery 1, won't it extend precheckin by
>> a fair bit? I'd prefer to see two separate builds running.
>>
>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <[email protected]> wrote:
>>> I'm in favor of running the FlakyTests together at the end of
>>> precheckin using forkEvery 1 on them too.
>>>
>>> What about running two nightly builds? One that runs all the
>>> non-flaky UnitTests, IntegrationTests and DistributedTests, plus
>>> another nightly build that runs only FlakyTests? We can run Jenkins
>>> jobs on our local machines that separate FlakyTests out into their
>>> own job too, but I'd like to see the main nightly build go to 100%
>>> green (if that's even possible without encountering many more
>>> flickering tests).
>>>
>>> -Kirk
>>>
>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <[email protected]> wrote:
>>>> +1 for separating these out and running them with forkEvery 1.
>>>>
>>>> I think they should probably still run as part of precheckin and
>>>> the nightly builds, though. We don't want this to turn into
>>>> essentially disabling and ignoring these tests.
>>>>
>>>> -Dan
>>>>
>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <[email protected]> wrote:
>>>>> Also, I don't think there's much value in continuing to use the
>>>>> "CI" label. If a test fails in Jenkins, then run the test to see
>>>>> if it fails consistently. If it doesn't, it's flaky.
>>>>> The developer looking at it should try to determine the cause of
>>>>> the failure (i.e., "it uses thread sleeps, or random ports with
>>>>> BindExceptions, or has short timeouts with probable GC pause") and
>>>>> include that info when adding the FlakyTest annotation and filing
>>>>> a Jira bug with the Flaky label. If the test fails consistently,
>>>>> then file a Jira bug without the Flaky label.
>>>>>
>>>>> -Kirk
>>>>>
>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <[email protected]> wrote:
>>>>>> There are quite a few test classes that have multiple test
>>>>>> methods which are annotated with the FlakyTest category.
>>>>>>
>>>>>> More thoughts:
>>>>>>
>>>>>> In general, I think that if any given test fails intermittently
>>>>>> then it is a FlakyTest. A good test should either pass or fail
>>>>>> consistently. After annotating a test method with FlakyTest, the
>>>>>> developer should then add the Flaky label to the corresponding
>>>>>> Jira ticket. What we then do with the Jira tickets (i.e., fix
>>>>>> them) is probably more important than deciding if a test is
>>>>>> flaky or not.
>>>>>>
>>>>>> Rather than try to come up with some flaky process for
>>>>>> determining if a given test is flaky (i.e., "does it have thread
>>>>>> sleeps?"), it would be better to have a wiki page that has
>>>>>> examples of flakiness and how to fix them ("if the test has
>>>>>> thread sleeps, then switch to using Awaitility and do this...").
>>>>>>
>>>>>> -Kirk
>>>>>>
>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <[email protected]> wrote:
>>>>>>> Thanks Kirk!
>>>>>>>
>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>> Flake factor: 136
>>>>>>>
>>>>>>> Anthony
>>>>>>>
>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <[email protected]> wrote:
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Are we also planning to automate the additional build task somehow?
>>>>>>>>
>>>>>>>> I'd also suggest creating a wiki page with some stats (like how
>>>>>>>> many FlakyTests we currently have) and the idea behind this
>>>>>>>> effort so we can keep track and see how it's evolving over time.
>>>>>>>>
>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <[email protected]> wrote:
>>>>>>>>> After completing GEODE-1233, all currently known flickering
>>>>>>>>> tests are now annotated with our FlakyTest JUnit Category.
>>>>>>>>>
>>>>>>>>> In an effort to divide our build up into multiple build
>>>>>>>>> pipelines that are sequential and dependable, we could
>>>>>>>>> consider excluding FlakyTests from the primary integrationTest
>>>>>>>>> and distributedTest tasks. An additional build task would then
>>>>>>>>> execute all of the FlakyTests separately. This would hopefully
>>>>>>>>> help us get to a point where we can depend on our primary
>>>>>>>>> testing tasks staying green 100% of the time. We would then
>>>>>>>>> prioritize fixing the FlakyTests and, one by one, removing the
>>>>>>>>> FlakyTest category from them.
>>>>>>>>>
>>>>>>>>> I would also suggest that we execute the FlakyTests with
>>>>>>>>> "forkEvery 1" to give each test a clean JVM or set of
>>>>>>>>> DistributedTest JVMs.
>>>>>>>>> That would hopefully decrease the chance of a GC pause or
>>>>>>>>> test pollution causing flickering failures.
>>>>>>>>>
>>>>>>>>> Having reviewed lots of test code and failure stacks, I
>>>>>>>>> believe that the primary causes of FlakyTests are timing
>>>>>>>>> sensitivity (thread sleeps, nothing that waits for async
>>>>>>>>> activity, or timeouts and sleeps that are insufficient on a
>>>>>>>>> busy CPU, under heavy I/O, or during a GC pause) and random
>>>>>>>>> ports via AvailablePort (instead of using zero for an
>>>>>>>>> ephemeral port).
>>>>>>>>>
>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>
>>>>>>>>> -Kirk
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~/William
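For the wiki page proposed above, the two fixes cited most often in this thread could be sketched in plain Java roughly like this (the class and method names are hypothetical; the polling helper simply hand-rolls the pattern that Awaitility's await().atMost(...).until(...) provides, so the sketch has no extra dependency):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class FlakyFixSketches {

  // Fix 1: poll for a condition with a generous timeout instead of a fixed
  // Thread.sleep(n) that may be too short on a busy CPU or during a GC pause.
  // (This hand-rolls what Awaitility's await().atMost(...).until(...) does.)
  static void awaitTrue(BooleanSupplier condition, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMillis + " ms");
      }
      try {
        Thread.sleep(50); // short poll interval, not a correctness-critical sleep
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("interrupted while waiting", e);
      }
    }
  }

  // Fix 2: bind to port 0 so the OS assigns an ephemeral port that is already
  // bound, instead of AvailablePort-style "find a free port now, bind later",
  // which can race with other tests and cause BindExceptions.
  static ServerSocket bindEphemeral() {
    try {
      return new ServerSocket(0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    awaitTrue(() -> System.currentTimeMillis() - start >= 100, 5_000);
    try (ServerSocket socket = bindEphemeral()) {
      System.out.println("bound to ephemeral port " + socket.getLocalPort());
    }
  }
}
```

The same shape applies to DistributedTests: replace each bare sleep with a bounded poll, and derive every listener port from a port-0 bind rather than a pre-scanned "available" port.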
