Thanks, Matthias, for keeping an eye on the issue. Thank you, Till, for the problem formulation and the suggested steps for solving the synchronization problem. I will look into this as soon as possible.
Cheers,
Max

On Fri, Jul 17, 2015 at 11:18 AM, Matthias J. Sax wrote:
> I will open a JIRA issue for this. It's getting "complicated".
>
> On 07/17/2015 11:04 AM, Till Rohrmann wrote:
>> I think the problem might be related to the way the test is constructed.
>> The test submits a job to the JM and then tries to poll the accumulators
>> from the JM. If it does not succeed, the polling is retried with a
>> decreasing pause in between. Furthermore, the task which updates the
>> accumulators also sleeps for the same period until it reads the next
>> element and updates the accumulators.
>>
>> Since the test does not use explicit synchronization but instead relies
>> on sleeps, it will most likely exhibit flaky behaviour. Sleeps don't
>> work reliably enough, especially on Travis, to guarantee a certain thread
>> interleaving. I'd recommend introducing explicit synchronization mechanisms
>> which control the behaviour of the accumulator-producing task, and explicit
>> test messages which indicate that a new accumulator value has arrived at
>> the JM.
>>
>> Cheers,
>> Till
>>
>> On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax wrote:
>>> Hi,
>>>
>>> the test still fails. This time in both runs (Flink Travis and my own
>>> Travis), again only for Java 8:
>>>
>>> https://travis-ci.org/apache/flink/jobs/71314132
>>> https://travis-ci.org/mjsax/flink/jobs/71179608
>>>
>>> -Matthias
>>>
>>> On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
>>>> Great! I will. As 4 of 5 runs succeeded, I cannot test it explicitly. I will
>>>> keep an eye on it in future runs.
>>>>
>>>> -Matthias
>>>>
>>>> On 07/16/2015 02:24 PM, Maximilian Michels wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> I've pushed a fix to the master. The problem should be solved. Please tell
>>>>> me if your Travis reports an error again. My Travis never complained :)
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>> On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels wrote:
>>>>>> Hi Matthias,
>>>>>>
>>>>>> This is indeed a timing issue when checking for the results in this test.
>>>>>> The new accumulator implementation now continuously reports from the
>>>>>> running tasks to the job manager. This was merged yesterday.
>>>>>>
>>>>>> The assertion that fails there is a bit strict. Actually, I've already
>>>>>> integrated a retry mechanism that fails only if the assertions don't hold
>>>>>> for a configured number of attempts.
>>>>>>
>>>>>> I'll commit a fix to the master. Thanks for reporting!
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> this was merged yesterday. I guess it's a timing issue when
>>>>>>> verifying the results. Can you file an issue for this?
>>>>>>>
>>>>>>> – Ufuk
>>>>>>>
>>>>>>> On 16 Jul 2015, at 11:30, Matthias J. Sax wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I hit another failing test (that is new to me):
>>>>>>>>
>>>>>>>>> Results :
>>>>>>>>> Failed tests:
>>>>>>>>>   AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189 null
>>>>>>>>
>>>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
>>>>>>>>> testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)  Time elapsed: 8.021 sec  <<< FAILURE!
>>>>>>>>> java.lang.AssertionError: null
>>>>>>>>>     at org.junit.Assert.fail(Assert.java:86)
>>>>>>>>>     at org.junit.Assert.assertTrue(Assert.java:41)
>>>>>>>>>     at org.junit.Assert.assertTrue(Assert.java:52)
>>>>>>>>>     at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
>>>>>>>>>     at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
>>>>>>>>
>>>>>>>> Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
>>>>>>>>
>>>>>>>> Does anyone know anything about it?
>>>>>>>>
>>>>>>>> BTW: Even though this test is in flink-tests, the problem seems not to be
>>>>>>>> related to https://issues.apache.org/jira/browse/FLINK-2032, because
>>>>>>>> accumulators are tested. There are no result files involved (as far as
>>>>>>>> I can tell).
>>>>>>>>
>>>>>>>> -Matthias
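For illustration, here is a minimal sketch of the kind of explicit synchronization Till recommends in place of sleeps. It uses plain Java and no Flink APIs. Every name in it (ExplicitSyncSketch, Producer, step, awaitNextReport, runLockStep) is hypothetical. A Semaphore lets the test release the producer one element at a time. A BlockingQueue stands in, as an assumption, for "a test message indicating that a new accumulator value has arrived at the JM". The real AccumulatorLiveITCase would have to obtain that signal from the job manager instead.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: explicit synchronization instead of Thread.sleep().
public class ExplicitSyncSketch {

    // Stand-in for the accumulator-producing task (not a Flink class).
    static final class Producer implements Runnable {
        // The test grants exactly one element per permit.
        private final Semaphore permits = new Semaphore(0);
        // Stand-in for "JM received a new accumulator value" messages.
        private final BlockingQueue<Long> reported = new LinkedBlockingQueue<>();
        private final int elements;
        // Only read and written by the producer thread.
        private long counter = 0;

        Producer(int elements) {
            this.elements = elements;
        }

        // Called by the test: allow the producer to process one more element.
        void step() {
            permits.release();
        }

        // Called by the test: block until the next reported value; the timeout only turns a hang into a failure.
        long awaitNextReport(long timeoutMs) throws InterruptedException {
            Long value = reported.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (value == null) {
                throw new AssertionError("no accumulator update within " + timeoutMs + " ms");
            }
            return value;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < elements; i++) {
                    permits.acquire();     // wait for the test instead of sleeping
                    counter++;             // "update the accumulator"
                    reported.put(counter); // "report to the JM" and signal the test
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Drives the producer in lock-step and checks each value deterministically.
    static long[] runLockStep(int elements) throws InterruptedException {
        Producer producer = new Producer(elements);
        Thread thread = new Thread(producer, "accumulator-producer");
        thread.setDaemon(true); // a failed assertion must not leave the JVM hanging
        thread.start();

        long[] observed = new long[elements];
        for (int i = 0; i < elements; i++) {
            producer.step();
            observed[i] = producer.awaitNextReport(10_000);
            // Exactly i + 1 elements have been processed at this point, however slow the machine is.
            if (observed[i] != i + 1) {
                throw new AssertionError("expected " + (i + 1) + " but got " + observed[i]);
            }
        }
        thread.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(runLockStep(5)));
    }
}
```

Here the producer cannot advance without a permit, and the test blocks on the queue rather than sleeping. Because of that, each check sees a known state regardless of how heavily loaded the CI machine is. A retry loop with pauses, by contrast, only makes the race less likely.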
