Looks like those tickets were filed after GEODE-1233 was completed.

-Kirk
On Mon, May 2, 2016 at 1:42 PM, Dan Smith <[email protected]> wrote:

> testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
> actually excluding anything?
>
> I'm surprised testTombstones is not annotated as a flaky test. We have at
> least 3 bugs all related to this method that are still open: GEODE-1285,
> GEODE-1332, GEODE-1287.
>
> -Dan
>
> On Mon, May 2, 2016 at 11:25 AM, Anthony Baker <[email protected]> wrote:
>
>> I have results from 10 runs of all the tests excluding @FlakyTest. These
>> are the only failures:
>>
>> ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
>> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
>> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
>> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
>> com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
>> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
>> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
>> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED
>>
>> Anthony
>>
>>> On Apr 27, 2016, at 7:22 PM, Kirk Lund <[email protected]> wrote:
>>>
>>> We currently have over 10,000 tests but only about 147 are annotated
>>> with FlakyTest. It probably wouldn't cause precheckin to take much
>>> longer. My main argument for separating the FlakyTests into their own
>>> Jenkins build job is to get the main build job 100% green while we know
>>> the FlakyTest build job might "flicker".
>>> -Kirk
>>>
>>> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <[email protected]> wrote:
>>>
>>>> Depending on the amount of "flaky" tests, this should not increase the
>>>> time too much. I foresee these "flaky" tests to be few and far between.
>>>> Over time I imagine this would be a last resort if we cannot fix the
>>>> test or even improve the test harness to have a clean test space for
>>>> each test.
>>>>
>>>> --Udo
>>>>
>>>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>>>>
>>>>> By running the Flakes with forkEvery 1, won't it extend precheckin by
>>>>> a fair bit? I'd prefer to see two separate builds running.
>>>>>
>>>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <[email protected]> wrote:
>>>>>
>>>>>> I'm in favor of running the FlakyTests together at the end of
>>>>>> precheckin using forkEvery 1 on them too.
>>>>>>
>>>>>> What about running two nightly builds? One that runs all the
>>>>>> non-flaky UnitTests, IntegrationTests and DistributedTests, plus
>>>>>> another nightly build that runs only FlakyTests? We can run Jenkins
>>>>>> jobs on our local machines that separate FlakyTests out into their
>>>>>> own job too, but I'd like to see the main nightly build go to 100%
>>>>>> green (if that's even possible without encountering many more
>>>>>> flickering tests).
>>>>>>
>>>>>> -Kirk
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <[email protected]> wrote:
>>>>>>
>>>>>>> +1 for separating these out and running them with forkEvery 1.
>>>>>>>
>>>>>>> I think they should probably still run as part of precheckin and
>>>>>>> the nightly builds though. We don't want this to turn into
>>>>>>> essentially disabling and ignoring these tests.
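The forkEvery 1 split being discussed above could be wired up in Gradle roughly as follows. This is only a sketch: the flakyTest task name and the category's fully qualified class name are assumptions, not the project's actual build configuration.

```groovy
// Sketch: keep FlakyTest-annotated tests out of the main test task and run
// them in their own task, with a fresh JVM per test class via forkEvery 1.
// The category class name below is an assumption.
test {
    useJUnit {
        excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
    }
}

task flakyTest(type: Test) {
    useJUnit {
        includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
    }
    forkEvery 1  // clean JVM for every test class, reducing test pollution
}
```

With this split, the main `test` task can stay green while `flakyTest` is allowed to flicker in a separate build job.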
>>>>>>> -Dan
>>>>>>>
>>>>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <[email protected]> wrote:
>>>>>>>
>>>>>>>> Also, I don't think there's much value continuing to use the "CI"
>>>>>>>> label. If a test fails in Jenkins, then run the test to see if it
>>>>>>>> fails consistently. If it doesn't, it's flaky. The developer
>>>>>>>> looking at it should try to determine the cause of the failure
>>>>>>>> (ie, "it uses thread sleeps or random ports with BindExceptions or
>>>>>>>> has short timeouts with probable GC pause") and include that info
>>>>>>>> when adding the FlakyTest annotation and filing a Jira bug with
>>>>>>>> the Flaky label. If the test fails consistently, then file a Jira
>>>>>>>> bug without the Flaky label.
>>>>>>>>
>>>>>>>> -Kirk
>>>>>>>>
>>>>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> There are quite a few test classes that have multiple test
>>>>>>>>> methods which are annotated with the FlakyTest category.
>>>>>>>>>
>>>>>>>>> More thoughts:
>>>>>>>>>
>>>>>>>>> In general, I think that if any given test fails intermittently
>>>>>>>>> then it is a FlakyTest. A good test should either pass or fail
>>>>>>>>> consistently. After annotating a test method with FlakyTest, the
>>>>>>>>> developer should then add the Flaky label to the corresponding
>>>>>>>>> Jira ticket. What we then do with the Jira tickets (ie, fix them)
>>>>>>>>> is probably more important than deciding if a test is flaky or
>>>>>>>>> not.
>>>>>>>>> Rather than try to come up with some flaky process for
>>>>>>>>> determining if a given test is flaky (ie, "does it have thread
>>>>>>>>> sleeps?"), it would be better to have a wiki page that has
>>>>>>>>> examples of flakiness and how to fix them ("if the test has
>>>>>>>>> thread sleeps, then switch to using Awaitility and do this...").
>>>>>>>>>
>>>>>>>>> -Kirk
>>>>>>>>>
>>>>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Kirk!
>>>>>>>>>>
>>>>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>>>>> Flake factor: 136
>>>>>>>>>>
>>>>>>>>>> Anthony
>>>>>>>>>>
>>>>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Are we also planning to automate the additional build task
>>>>>>>>>>> somehow?
>>>>>>>>>>>
>>>>>>>>>>> I'd also suggest creating a wiki page with some stats (like how
>>>>>>>>>>> many FlakyTests we currently have) and the idea behind this
>>>>>>>>>>> effort so we can keep track and see how it's evolving over
>>>>>>>>>>> time.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> After completing GEODE-1233, all currently known flickering
>>>>>>>>>>>> tests are now annotated with our FlakyTest JUnit Category.
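The "switch to using Awaitility" advice above boils down to replacing a fixed Thread.sleep() with bounded polling: Awaitility's await().atMost(...).until(...) does essentially this with a richer API. A minimal plain-JDK sketch of the same pattern; the AwaitExample class and awaitTrue helper below are hypothetical, not Geode code:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class AwaitExample {
    // Poll a condition until it holds, failing only after a generous
    // timeout, instead of a fixed sleep that is too short on a busy
    // machine and too long everywhere else.
    public static void awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError(
                    "condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(100);  // short poll interval, not the whole wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        // Simulate an async task completing after a short delay.
        new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            done.set(true);
        }).start();
        awaitTrue(done::get, 10_000);
        System.out.println("done=" + done.get());
    }
}
```

The test stays fast when the condition is met quickly, yet tolerates a slow CI machine up to the full timeout.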
>>>>>>>>>>>> In an effort to divide our build up into multiple build
>>>>>>>>>>>> pipelines that are sequential and dependable, we could
>>>>>>>>>>>> consider excluding FlakyTests from the primary integrationTest
>>>>>>>>>>>> and distributedTest tasks. An additional build task would then
>>>>>>>>>>>> execute all of the FlakyTests separately. This would hopefully
>>>>>>>>>>>> help us get to a point where we can depend on our primary
>>>>>>>>>>>> testing tasks staying green 100% of the time. We would then
>>>>>>>>>>>> prioritize fixing the FlakyTests and, one by one, removing the
>>>>>>>>>>>> FlakyTest category from them.
>>>>>>>>>>>>
>>>>>>>>>>>> I would also suggest that we execute the FlakyTests with
>>>>>>>>>>>> "forkEvery 1" to give each test a clean JVM or set of
>>>>>>>>>>>> DistributedTest JVMs. That would hopefully decrease the chance
>>>>>>>>>>>> of a GC pause or test pollution causing flickering failures.
>>>>>>>>>>>>
>>>>>>>>>>>> Having reviewed lots of test code and failure stacks, I
>>>>>>>>>>>> believe that the primary causes of FlakyTests are timing
>>>>>>>>>>>> sensitivity (thread sleeps, nothing that waits for async
>>>>>>>>>>>> activity, or timeouts and sleeps that are insufficient on a
>>>>>>>>>>>> busy CPU, during heavy I/O, or during a GC pause) and random
>>>>>>>>>>>> ports via AvailablePort (instead of using zero for an
>>>>>>>>>>>> ephemeral port).
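On the random-ports point above: binding to port 0 lets the OS pick a free ephemeral port atomically, which avoids the pick-then-bind race (and resulting BindExceptions) that comes from choosing a "free" port up front and binding later. A small plain-java.net illustration; it does not use Geode's AvailablePort API:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortExample {
    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS to assign any free ephemeral port at bind time,
        // so no other process can steal the port between "pick" and "bind".
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();  // the port the OS actually assigned
            System.out.println("listening on ephemeral port " + port);
            // Pass `port` to the client side of the test instead of a
            // pre-picked number.
        }
    }
}
```

The only requirement is that the test reads the assigned port back and hands it to whatever needs to connect, rather than assuming a number chosen in advance.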
>>>>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>>>>
>>>>>>>>>>>> -Kirk
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ~/William
