testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't actually excluding anything?
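[Editor's note: the exclusion being debated is usually wired up in the Gradle
test tasks along the lines of the sketch below. The FlakyTest category FQN and
the task name are assumptions for illustration, not taken from the actual
Geode build scripts.]

```groovy
// Sketch only: drop the FlakyTest JUnit category from the main test task
// and run those tests in a dedicated task with a fresh JVM per test class.
test {
    useJUnit {
        // Assumed FQN; verify against the real category class in the repo.
        excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
    }
}

task flakyTest(type: Test) {
    useJUnit {
        includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
    }
    forkEvery = 1  // new JVM for each test class, as proposed in this thread
}
```

If the category class name passed to excludeCategories is wrong, Gradle
silently excludes nothing, which would explain tests still running.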
I'm surprised testTombstones is not annotated with FlakyTest. We have at
least 3 bugs all related to this method that are still open - GEODE-1285,
GEODE-1332, GEODE-1287.

-Dan

On Mon, May 2, 2016 at 11:25 AM, Anthony Baker <[email protected]> wrote:

> I have results from 10 runs of all the tests excluding @FlakyTest. These
> are the only failures:
>
> ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED
>
> Anthony
>
> On Apr 27, 2016, at 7:22 PM, Kirk Lund <[email protected]> wrote:
>
>> We currently have over 10,000 tests but only about 147 are annotated with
>> FlakyTest. It probably wouldn't cause precheckin to take much longer. My
>> main argument for separating the FlakyTests into their own Jenkins build
>> job is to get the main build job 100% green while we know the FlakyTest
>> build job might "flicker".
>>
>> -Kirk
>>
>> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <[email protected]> wrote:
>>
>>> Depending on the amount of "flaky" tests, this should not increase the
>>> time too much. I foresee these "flaky" tests to be few and far between.
>>> Over time I imagine this would be a last resort if we cannot fix the
>>> test or even improve the test harness to have a clean test space for
>>> each test.
>>>
>>> --Udo
>>>
>>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>>>
>>>> By running the Flakes with forkEvery 1 won't it extend precheckin by a
>>>> fair bit? I'd prefer to see two separate builds running.
>>>>
>>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <[email protected]> wrote:
>>>>
>>>>> I'm in favor of running the FlakyTests together at the end of
>>>>> precheckin using forkEvery 1 on them too.
>>>>>
>>>>> What about running two nightly builds? One that runs all the non-flaky
>>>>> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
>>>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>>>> machines that separate FlakyTests out into their own job too, but I'd
>>>>> like to see the main nightly build go to 100% green (if that's even
>>>>> possible without encountering many more flickering tests).
>>>>>
>>>>> -Kirk
>>>>>
>>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <[email protected]> wrote:
>>>>>
>>>>>> +1 for separating these out and running them with forkEvery 1.
>>>>>>
>>>>>> I think they should probably still run as part of precheckin and the
>>>>>> nightly builds though. We don't want this to turn into essentially
>>>>>> disabling and ignoring these tests.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <[email protected]> wrote:
>>>>>>
>>>>>>> Also, I don't think there's much value continuing to use the "CI"
>>>>>>> label. If a test fails in Jenkins, then run the test to see if it
>>>>>>> fails consistently. If it doesn't, it's flaky.
>>>>>>> The developer looking at it should try to determine the cause of it
>>>>>>> failing (ie, "it uses thread sleeps or random ports with
>>>>>>> BindExceptions or has short timeouts with probable GC pause") and
>>>>>>> include that info when adding the FlakyTest annotation and filing a
>>>>>>> Jira bug with the Flaky label. If the test fails consistently, then
>>>>>>> file a Jira bug without the Flaky label.
>>>>>>>
>>>>>>> -Kirk
>>>>>>>
>>>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <[email protected]> wrote:
>>>>>>>
>>>>>>>> There are quite a few test classes that have multiple test methods
>>>>>>>> which are annotated with the FlakyTest category.
>>>>>>>>
>>>>>>>> More thoughts:
>>>>>>>>
>>>>>>>> In general, I think that if any given test fails intermittently
>>>>>>>> then it is a FlakyTest. A good test should either pass or fail
>>>>>>>> consistently. After annotating a test method with FlakyTest, the
>>>>>>>> developer should then add the Flaky label to the corresponding Jira
>>>>>>>> ticket. What we then do with the Jira tickets (ie, fix them) is
>>>>>>>> probably more important than deciding if a test is flaky or not.
>>>>>>>>
>>>>>>>> Rather than try to come up with some flaky process for determining
>>>>>>>> if a given test is flaky (ie, "does it have thread sleeps?"), it
>>>>>>>> would be better to have a wiki page that has examples of flakiness
>>>>>>>> and how to fix them ("if the test has thread sleeps, then switch to
>>>>>>>> using Awaitility and do this...").
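[Editor's note: as a sketch of the "switch to using Awaitility" fix Kirk
alludes to, this hand-rolled polling wait in plain JDK code illustrates the
same pattern that Awaitility's await().atMost(...).until(...) provides. The
class and method names here are illustrative, not from the Geode tests.]

```java
import java.util.function.BooleanSupplier;

// Poll for a condition with a deadline instead of a fixed Thread.sleep(),
// which is the root fix for sleep-based flakiness: the test proceeds as soon
// as the condition holds, and only fails after a generous timeout.
public class AwaitSketch {
    public static boolean awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // condition never became true within the timeout
            }
            try {
                Thread.sleep(50); // short poll interval, not a guess at total time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // A condition that becomes true after ~200 ms, standing in for async work.
        boolean ok = awaitTrue(() -> System.nanoTime() - start > 200_000_000L, 5_000);
        System.out.println(ok ? "condition met" : "timed out");
    }
}
```

The key property is that a generous timeout costs nothing when the condition
is met quickly, whereas a generous fixed sleep slows every run.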
>>>>>>>> -Kirk
>>>>>>>>
>>>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Kirk!
>>>>>>>>>
>>>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>>>> Flake factor: 136
>>>>>>>>>
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Are we also planning to automate the additional build task somehow?
>>>>>>>>>>
>>>>>>>>>> I'd also suggest creating a wiki page with some stats (like how
>>>>>>>>>> many FlakyTests we currently have) and the idea behind this
>>>>>>>>>> effort so we can keep track and see how it's evolving over time.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> After completing GEODE-1233, all currently known flickering
>>>>>>>>>>> tests are now annotated with our FlakyTest JUnit Category.
>>>>>>>>>>>
>>>>>>>>>>> In an effort to divide our build up into multiple build
>>>>>>>>>>> pipelines that are sequential and dependable, we could consider
>>>>>>>>>>> excluding FlakyTests from the primary integrationTest and
>>>>>>>>>>> distributedTest tasks. An additional build task would then
>>>>>>>>>>> execute all of the FlakyTests separately.
>>>>>>>>>>> This would hopefully help us get to a point where we can depend
>>>>>>>>>>> on our primary testing tasks staying green 100% of the time. We
>>>>>>>>>>> would then prioritize fixing the FlakyTests and, one by one,
>>>>>>>>>>> removing the FlakyTest category from them.
>>>>>>>>>>>
>>>>>>>>>>> I would also suggest that we execute the FlakyTests with
>>>>>>>>>>> "forkEvery 1" to give each test a clean JVM or set of
>>>>>>>>>>> DistributedTest JVMs. That would hopefully decrease the chance
>>>>>>>>>>> of a GC pause or test pollution causing flickering failures.
>>>>>>>>>>>
>>>>>>>>>>> Having reviewed lots of test code and failure stacks, I believe
>>>>>>>>>>> that the primary causes of FlakyTests are timing sensitivity
>>>>>>>>>>> (thread sleeps, nothing that waits for async activity, or
>>>>>>>>>>> timeouts and sleeps that are insufficient on busy CPU or I/O or
>>>>>>>>>>> during a GC pause) and random ports via AvailablePort (instead
>>>>>>>>>>> of using zero for an ephemeral port).
>>>>>>>>>>>
>>>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>>>
>>>>>>>>>>> -Kirk
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~/William
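[Editor's note: the "zero for ephemeral port" fix Kirk mentions looks like the
following in plain JDK code; a sketch for illustration, not the actual Geode
AvailablePort replacement.]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    // Bind to port 0 so the OS hands out a free port atomically. Scanning for
    // a "free" port first (AvailablePort-style) and binding it later can race
    // with another test doing the same thing, ending in a BindException.
    public static int bindAndGetPort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            // In a real test you would keep the socket open and hand this
            // port to clients; it is closed here only to keep the sketch simple.
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OS-assigned port: " + bindAndGetPort());
    }
}
```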
