I have results from 10 runs of all the tests excluding @FlakyTest. These are the only failures:
ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED

Anthony

> On Apr 27, 2016, at 7:22 PM, Kirk Lund <[email protected]> wrote:
>
> We currently have over 10,000 tests but only about 147 are annotated with
> FlakyTest. It probably wouldn't cause precheckin to take much longer. My
> main argument for separating the FlakyTests into their own Jenkins build
> job is to get the main build job 100% green while we know the FlakyTest
> build job might "flicker".
>
> -Kirk
>
>
> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <[email protected]> wrote:
>
>> Depending on the amount of "flaky" tests, this should not increase the
>> time too much.
>> I foresee these "flaky" tests to be few and far between. Over time I
>> imagine this would be a last resort if we cannot fix the test or even
>> improve the test harness to have a clean test space for each test.
>>
>> --Udo
>>
>>
>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>>
>>> By running the Flakes with forkEvery 1, won't it extend precheckin by a
>>> fair bit? I'd prefer to see two separate builds running.
>>>
>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <[email protected]> wrote:
>>>
>>>> I'm in favor of running the FlakyTests together at the end of precheckin,
>>>> using forkEvery 1 on them too.
>>>>
>>>> What about running two nightly builds? One that runs all the non-flaky
>>>> UnitTests, IntegrationTests and DistributedTests, plus another nightly
>>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>>> machines that separate FlakyTests out into their own job too, but I'd
>>>> like to see the main nightly build go to 100% green (if that's even
>>>> possible without encountering many more flickering tests).
>>>>
>>>> -Kirk
>>>>
>>>>
>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <[email protected]> wrote:
>>>>
>>>>> +1 for separating these out and running them with forkEvery 1.
>>>>>
>>>>> I think they should probably still run as part of precheckin and the
>>>>> nightly builds though. We don't want this to turn into essentially
>>>>> disabling and ignoring these tests.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <[email protected]> wrote:
>>>>>
>>>>>> Also, I don't think there's much value in continuing to use the "CI"
>>>>>> label. If a test fails in Jenkins, then run the test to see if it
>>>>>> fails consistently. If it doesn't, it's flaky. The developer looking
>>>>>> at it should try to determine the cause of it failing (i.e., "it uses
>>>>>> thread sleeps or random ports with BindExceptions or has short
>>>>>> timeouts with probable GC pause") and include that info when adding
>>>>>> the FlakyTest annotation and filing a Jira bug with the Flaky label.
>>>>>> If the test fails consistently, then file a Jira bug without the
>>>>>> Flaky label.
>>>>>>
>>>>>> -Kirk
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <[email protected]> wrote:
>>>>>>
>>>>>>> There are quite a few test classes that have multiple test methods
>>>>>>> which are annotated with the FlakyTest category.
>>>>>>>
>>>>>>> More thoughts:
>>>>>>>
>>>>>>> In general, I think that if any given test fails intermittently then
>>>>>>> it is a FlakyTest. A good test should either pass or fail
>>>>>>> consistently. After annotating a test method with FlakyTest, the
>>>>>>> developer should then add the Flaky label to the corresponding Jira
>>>>>>> ticket. What we then do with the Jira tickets (i.e., fix them) is
>>>>>>> probably more important than deciding if a test is flaky or not.
>>>>>>>
>>>>>>> Rather than try to come up with some flaky process for determining
>>>>>>> if a given test is flaky (i.e., "does it have thread sleeps?"), it
>>>>>>> would be better to have a wiki page that has examples of flakiness
>>>>>>> and how to fix them ("if the test has thread sleeps, then switch to
>>>>>>> using Awaitility and do this...").
>>>>>>>
>>>>>>> -Kirk
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Kirk!
>>>>>>>>
>>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . |
>>>>>>>>   grep -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>>> Flake factor: 136
>>>>>>>>
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Are we also planning to automate the additional build task somehow?
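[Editor's note: Kirk's wiki-page example above ("if the test has thread sleeps, then switch to using Awaitility and do this...") could be sketched as follows. Awaitility's `await().atMost(...).until(...)` provides exactly this polling pattern; the plain-JDK helper below shows the same idea without the library dependency. Class and method names are hypothetical, not from the Geode code base.]

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class AwaitSketch {

    // Poll until the condition holds or the timeout elapses, instead of a
    // single fixed Thread.sleep() that may be too short on a busy CPU or
    // during a GC pause. Awaitility offers this with richer diagnostics.
    static boolean awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(50); // short poll interval, not a blind wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // The condition becomes true after ~200 ms; we proceed as soon as
        // it does, rather than sleeping for a pessimistic fixed interval.
        boolean ok = awaitTrue(() -> System.currentTimeMillis() - start >= 200, 5_000);
        System.out.println(ok); // prints "true"
    }
}
```

A flaky `Thread.sleep(5000); assertTrue(condition);` becomes `assertTrue(awaitTrue(condition, 30_000));` with a long, generous timeout that costs nothing when the condition is met early.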
>>>>>>>>>
>>>>>>>>> I'd also suggest creating a wiki page with some stats (like how
>>>>>>>>> many FlakyTests we currently have) and the idea behind this effort
>>>>>>>>> so we can keep track and see how it's evolving over time.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> After completing GEODE-1233, all currently known flickering tests
>>>>>>>>>> are now annotated with our FlakyTest JUnit Category.
>>>>>>>>>>
>>>>>>>>>> In an effort to divide our build up into multiple build pipelines
>>>>>>>>>> that are sequential and dependable, we could consider excluding
>>>>>>>>>> FlakyTests from the primary integrationTest and distributedTest
>>>>>>>>>> tasks. An additional build task would then execute all of the
>>>>>>>>>> FlakyTests separately. This would hopefully help us get to a
>>>>>>>>>> point where we can depend on our primary testing tasks staying
>>>>>>>>>> green 100% of the time. We would then prioritize fixing the
>>>>>>>>>> FlakyTests and one by one remove the FlakyTest category from them.
>>>>>>>>>>
>>>>>>>>>> I would also suggest that we execute the FlakyTests with
>>>>>>>>>> "forkEvery 1" to give each test a clean JVM or set of
>>>>>>>>>> DistributedTest JVMs. That would hopefully decrease the chance of
>>>>>>>>>> a GC pause or test pollution causing flickering failures.
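[Editor's note: as a rough sketch of Kirk's proposal (exclude the category from the primary test tasks, run it in a separate task with forkEvery 1), a Gradle fragment might look like the following. This is hypothetical, not Geode's actual build script; the category class name is assumed from the FlakyTest annotation discussed in the thread.]

```groovy
// Assumed fully qualified name of the JUnit category.
def flaky = 'com.gemstone.gemfire.test.junit.categories.FlakyTest'

// Keep the primary test task green by excluding the flaky category.
test {
    useJUnit { excludeCategories flaky }
}

// Run only the flaky tests, with a fresh JVM per test class.
task flakyTest(type: Test) {
    useJUnit { includeCategories flaky }
    forkEvery 1 // clean JVM per test class reduces GC pressure and test pollution
}
```

Note that `forkEvery` counts test classes, not test methods, so `forkEvery 1` gives each test class its own JVM.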
>>>>>>>>>>
>>>>>>>>>> Having reviewed lots of test code and failure stacks, I believe
>>>>>>>>>> that the primary causes of FlakyTests are timing sensitivity
>>>>>>>>>> (thread sleeps, or nothing that waits for async activity, or
>>>>>>>>>> timeouts and sleeps that are insufficient on a busy CPU or I/O or
>>>>>>>>>> during a GC pause) and random ports via AvailablePort (instead of
>>>>>>>>>> using zero for an ephemeral port).
>>>>>>>>>>
>>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>>
>>>>>>>>>> -Kirk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ~/William
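[Editor's note: the random-port cause Kirk mentions can be avoided by binding to port 0 so the OS assigns a free ephemeral port, rather than probing for a "free" port up front and racing other processes to bind it. A minimal JDK-only sketch; the class and method names are hypothetical, not Geode's AvailablePort replacement.]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class EphemeralPortExample {

    // Bind to port 0 and let the OS hand out a free ephemeral port. A scan
    // for an available port can race with other processes and fail later
    // with a BindException; binding to 0 cannot.
    static int startOnEphemeralPort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            // A real server would keep this socket open and serve from it;
            // here we only report which port the OS chose.
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = startOnEphemeralPort();
        System.out.println(port > 0 && port <= 65535); // prints "true"
    }
}
```

In a test, the server is started on port 0 and the actual port is read back from the socket (or the server API) and passed to the client, so no two tests ever contend for the same hard-coded or pre-scanned port.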
