testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't actually excluding anything?
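[Editor's note: the exclusion being debated is usually wired up in the Gradle
test tasks along the lines of the sketch below. The FlakyTest category FQN and
the task name are assumptions for illustration, not taken from the actual
Geode build scripts.]

```groovy
// Sketch only: drop the FlakyTest JUnit category from the main test task
// and run those tests in a dedicated task with a fresh JVM per test class.
test {
    useJUnit {
        // Assumed FQN; verify against the real category class in the repo.
        excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
    }
}

task flakyTest(type: Test) {
    useJUnit {
        includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
    }
    forkEvery = 1  // new JVM for each test class, as proposed in this thread
}
```

If the category class name passed to excludeCategories is wrong, Gradle
silently excludes nothing, which would explain tests still running.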
I'm surprised testTombstones is not annotated with FlakyTest. We have at
least 3 bugs all related to this method that are still open - GEODE-1285,
GEODE-1332, GEODE-1287.

-Dan

On Mon, May 2, 2016 at 11:25 AM, Anthony Baker <[email protected]> wrote:

> I have results from 10 runs of all the tests excluding @FlakyTest. These
> are the only failures:
>
> ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED
>
> Anthony
>
> On Apr 27, 2016, at 7:22 PM, Kirk Lund <[email protected]> wrote:
>
>> We currently have over 10,000 tests but only about 147 are annotated with
>> FlakyTest. It probably wouldn't cause precheckin to take much longer. My
>> main argument for separating the FlakyTests into their own Jenkins build
>> job is to get the main build job 100% green while we know the FlakyTest
>> build job might "flicker".
>>
>> -Kirk
>>
>> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <[email protected]> wrote:
>>
>>> Depending on the amount of "flaky" tests, this should not increase the
>>> time too much. I foresee these "flaky" tests to be few and far between.
>>> Over time I imagine this would be a last resort if we cannot fix the
>>> test or even improve the test harness to have a clean test space for
>>> each test.
>>>
>>> --Udo
>>>
>>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>>>
>>>> By running the Flakes with forkEvery 1 won't it extend precheckin by a
>>>> fair bit? I'd prefer to see two separate builds running.
>>>>
>>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <[email protected]> wrote:
>>>>
>>>>> I'm in favor of running the FlakyTests together at the end of
>>>>> precheckin using forkEvery 1 on them too.
>>>>>
>>>>> What about running two nightly builds? One that runs all the non-flaky
>>>>> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
>>>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>>>> machines that separate FlakyTests out into their own job too, but I'd
>>>>> like to see the main nightly build go to 100% green (if that's even
>>>>> possible without encountering many more flickering tests).
>>>>>
>>>>> -Kirk
>>>>>
>>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <[email protected]> wrote:
>>>>>
>>>>>> +1 for separating these out and running them with forkEvery 1.
>>>>>>
>>>>>> I think they should probably still run as part of precheckin and the
>>>>>> nightly builds though. We don't want this to turn into essentially
>>>>>> disabling and ignoring these tests.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <[email protected]> wrote:
>>>>>>
>>>>>>> Also, I don't think there's much value continuing to use the "CI"
>>>>>>> label. If a test fails in Jenkins, then run the test to see if it
>>>>>>> fails consistently. If it doesn't, it's flaky.
>>>>>>> The developer looking at it should try to determine the cause of it
>>>>>>> failing (ie, "it uses thread sleeps or random ports with
>>>>>>> BindExceptions or has short timeouts with probable GC pause") and
>>>>>>> include that info when adding the FlakyTest annotation and filing a
>>>>>>> Jira bug with the Flaky label. If the test fails consistently, then
>>>>>>> file a Jira bug without the Flaky label.
>>>>>>>
>>>>>>> -Kirk
>>>>>>>
>>>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <[email protected]> wrote:
>>>>>>>
>>>>>>>> There are quite a few test classes that have multiple test methods
>>>>>>>> which are annotated with the FlakyTest category.
>>>>>>>>
>>>>>>>> More thoughts:
>>>>>>>>
>>>>>>>> In general, I think that if any given test fails intermittently
>>>>>>>> then it is a FlakyTest. A good test should either pass or fail
>>>>>>>> consistently. After annotating a test method with FlakyTest, the
>>>>>>>> developer should then add the Flaky label to the corresponding Jira
>>>>>>>> ticket. What we then do with the Jira tickets (ie, fix them) is
>>>>>>>> probably more important than deciding if a test is flaky or not.
>>>>>>>>
>>>>>>>> Rather than try to come up with some flaky process for determining
>>>>>>>> if a given test is flaky (ie, "does it have thread sleeps?"), it
>>>>>>>> would be better to have a wiki page that has examples of flakiness
>>>>>>>> and how to fix them ("if the test has thread sleeps, then switch to
>>>>>>>> using Awaitility and do this...").
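[Editor's note: as a sketch of the "switch to using Awaitility" fix Kirk
alludes to, this hand-rolled polling wait in plain JDK code illustrates the
same pattern that Awaitility's await().atMost(...).until(...) provides. The
class and method names here are illustrative, not from the Geode tests.]

```java
import java.util.function.BooleanSupplier;

// Poll for a condition with a deadline instead of a fixed Thread.sleep(),
// which is the root fix for sleep-based flakiness: the test proceeds as soon
// as the condition holds, and only fails after a generous timeout.
public class AwaitSketch {
    public static boolean awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // condition never became true within the timeout
            }
            try {
                Thread.sleep(50); // short poll interval, not a guess at total time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // A condition that becomes true after ~200 ms, standing in for async work.
        boolean ok = awaitTrue(() -> System.nanoTime() - start > 200_000_000L, 5_000);
        System.out.println(ok ? "condition met" : "timed out");
    }
}
```

The key property is that a generous timeout costs nothing when the condition
is met quickly, whereas a generous fixed sleep slows every run.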
>>>>>>>> -Kirk
>>>>>>>>
>>>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Kirk!
>>>>>>>>>
>>>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>>>> Flake factor: 136
>>>>>>>>>
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Are we also planning to automate the additional build task somehow?
>>>>>>>>>>
>>>>>>>>>> I'd also suggest creating a wiki page with some stats (like how
>>>>>>>>>> many FlakyTests we currently have) and the idea behind this
>>>>>>>>>> effort so we can keep track and see how it's evolving over time.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> After completing GEODE-1233, all currently known flickering
>>>>>>>>>>> tests are now annotated with our FlakyTest JUnit Category.
>>>>>>>>>>>
>>>>>>>>>>> In an effort to divide our build up into multiple build
>>>>>>>>>>> pipelines that are sequential and dependable, we could consider
>>>>>>>>>>> excluding FlakyTests from the primary integrationTest and
>>>>>>>>>>> distributedTest tasks. An additional build task would then
>>>>>>>>>>> execute all of the FlakyTests separately.
>>>>>>>>>>> This would hopefully help us get to a point where we can depend
>>>>>>>>>>> on our primary testing tasks staying green 100% of the time. We
>>>>>>>>>>> would then prioritize fixing the FlakyTests and, one by one,
>>>>>>>>>>> removing the FlakyTest category from them.
>>>>>>>>>>>
>>>>>>>>>>> I would also suggest that we execute the FlakyTests with
>>>>>>>>>>> "forkEvery 1" to give each test a clean JVM or set of
>>>>>>>>>>> DistributedTest JVMs. That would hopefully decrease the chance
>>>>>>>>>>> of a GC pause or test pollution causing flickering failures.
>>>>>>>>>>>
>>>>>>>>>>> Having reviewed lots of test code and failure stacks, I believe
>>>>>>>>>>> that the primary causes of FlakyTests are timing sensitivity
>>>>>>>>>>> (thread sleeps, nothing that waits for async activity, or
>>>>>>>>>>> timeouts and sleeps that are insufficient on busy CPU or I/O or
>>>>>>>>>>> during a GC pause) and random ports via AvailablePort (instead
>>>>>>>>>>> of using zero for an ephemeral port).
>>>>>>>>>>>
>>>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>>>
>>>>>>>>>>> -Kirk
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~/William
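[Editor's note: the "zero for ephemeral port" fix Kirk mentions looks like the
following in plain JDK code; a sketch for illustration, not the actual Geode
AvailablePort replacement.]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    // Bind to port 0 so the OS hands out a free port atomically. Scanning for
    // a "free" port first (AvailablePort-style) and binding it later can race
    // with another test doing the same thing, ending in a BindException.
    public static int bindAndGetPort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            // In a real test you would keep the socket open and hand this
            // port to clients; it is closed here only to keep the sketch simple.
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OS-assigned port: " + bindAndGetPort());
    }
}
```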
