Re: Failing Test

2016-04-05 Thread Maximilian Michels
Thanks, the actual problem is that the ActorSystem gets shut down. This breaks the testing code. Should be fixed once https://github.com/apache/flink/pull/1852 is merged. On Tue, Apr 5, 2016 at 12:25 PM, Matthias J. Sax wrote: > Happened again after your fix: > https://travis-ci.org/apache/flink/j

Re: Failing Test

2016-04-05 Thread Matthias J. Sax
Happened again after your fix: https://travis-ci.org/apache/flink/jobs/120620482 -Matthias On 04/01/2016 08:57 PM, Maximilian Michels wrote: > Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689. > > On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels wrote: >> Hi Mat

Re: Failing Test

2016-04-02 Thread Matthias J. Sax
Thanks. Just tried it out and it works :) On 04/01/2016 08:57 PM, Maximilian Michels wrote: > Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689. > > On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels wrote: >> Hi Matthias, >> >> Thanks for spotting the test failure.

Re: Failing Test

2016-04-01 Thread Maximilian Michels
Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689. On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels wrote: > Hi Matthias, > > Thanks for spotting the test failure. It's actually a bug in the code > and not a test problem. Fixing it. > > Cheers, > Max > > On Fri, Apr

Re: Failing Test

2016-04-01 Thread Maximilian Michels
Hi Matthias, Thanks for spotting the test failure. It's actually a bug in the code and not a test problem. Fixing it. Cheers, Max On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi wrote: > Hey Matthias, > > the test has been only recently added with the resource management > refactoring. It's probabl

Re: Failing Test

2016-04-01 Thread Ufuk Celebi
Hey Matthias, the test was only recently added with the resource management refactoring. It's probably just an overly aggressive timeout for Travis. @Max: Did you ever see this fail? – Ufuk On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax wrote: > Anyone seen this before? One-time thing or tes

Re: Failing test

2015-10-06 Thread Till Rohrmann
If there is none yet, then we do. Label it with "test-stability". I think the consensus was also to mark it as critical. Otherwise, just add the log to the JIRA. On Tue, Oct 6, 2015 at 2:57 PM, Matthias J. Sax wrote: > Hi, > > One test just failed on current master: > https://travis-ci.org/apac

Re: Failing Test: KafkaITCase and KafkaProducerITCase

2015-09-07 Thread Stephan Ewen
I have a patch pending that should help with these timeout issues (and null checks)... On Mon, Sep 7, 2015 at 2:41 PM, Matthias J. Sax wrote: > Please look here: > > https://travis-ci.org/apache/flink/jobs/79086396 > > > Failed tests: > > KafkaITCase>KafkaTestBase.prepare:155 Test setup failed:

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-24 Thread Stephan Ewen
+1 for a "test-stability" label and labeling these issues as "critical" On Mon, Aug 24, 2015 at 6:31 PM, Stephan Ewen wrote: > Pushed a fix for the StateCheckpointedITCase > > On Mon, Aug 24, 2015 at 12:19 PM, Maximilian Michels > wrote: > >> +1 for labeling the JIRAs with "test-stability". >>

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-24 Thread Stephan Ewen
Pushed a fix for the StateCheckpointedITCase On Mon, Aug 24, 2015 at 12:19 PM, Maximilian Michels wrote: > +1 for labeling the JIRAs with "test-stability". > > On Sat, Aug 22, 2015 at 8:21 PM, Márton Balassi > wrote: > > > +1 for Vasia's suggestion > > On Aug 22, 2015 8:07 PM, "Vasiliki Kalavri

Re: [FAILING TEST] RandomSamplerTest

2015-08-24 Thread Maximilian Michels
Hi Matthias, Thanks for reporting. The label test-stability exists now. Cheers, Max On Sun, Aug 23, 2015 at 12:32 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Hi, > > because there is (not yet) a label for failing tests, I just report it > over the mailing list again. I also op

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-24 Thread Maximilian Michels
+1 for labeling the JIRAs with "test-stability". On Sat, Aug 22, 2015 at 8:21 PM, Márton Balassi wrote: > +1 for Vasia's suggestion > On Aug 22, 2015 8:07 PM, "Vasiliki Kalavri" > wrote: > > > I just came across 2 more :/ > > I'm also in favor of tracking these with JIRA. How about "test-stabil

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-22 Thread Márton Balassi
+1 for Vasia's suggestion On Aug 22, 2015 8:07 PM, "Vasiliki Kalavri" wrote: > I just came across 2 more :/ > I'm also in favor of tracking these with JIRA. How about "test-stability" > for a label? > > -V. > > On 21 August 2015 at 12:47, Matthias J. Sax > > wrote: > > > I like the idea with the

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-22 Thread Vasiliki Kalavri
I just came across 2 more :/ I'm also in favor of tracking these with JIRA. How about "test-stability" for a label? -V. On 21 August 2015 at 12:47, Matthias J. Sax wrote: > I like the idea with the special label. Otherwise, it will be difficult > to find the correct tickets. > > -Matthias > > O

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-21 Thread Matthias J. Sax
I like the idea with the special label. Otherwise, it will be difficult to find the correct tickets. -Matthias On 08/21/2015 12:15 PM, Till Rohrmann wrote: > I'm also in favor of JIRA, because I fear that nobody will keep the wiki > page in sync. Maybe we can assign a special label for test stabi

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-21 Thread Till Rohrmann
I'm also in favor of JIRA, because I fear that nobody will keep the wiki page in sync. Maybe we can assign a special label for test stability to these JIRA issues. Then we can quickly find all currently instable test cases. On Fri, Aug 21, 2015 at 11:02 AM, Robert Metzger wrote: > I agree that w

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-21 Thread Robert Metzger
I agree that we should look for a solution other than opening a lot of small discussion threads on the mailing list. When I have a test failure, I usually search my gmail inbox to see whether somebody else wrote something about the error already. Creating a JIRA for each failing test might be a be

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-20 Thread Matthias J. Sax
Thanks for the info. Over the weeks I lost track of which errors and failing or unstable tests are known and which are not. Should we start a wiki page or similar to collect known errors? If a test fails on a known error, it can just be ignored. This would avoid "spam" on the mailing list. Any thoughts about this?

Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-20 Thread Robert Metzger
Sachin saw the error as well, as reported here: https://issues.apache.org/jira/browse/FLINK-2468 I also see it from time to time. I have a WIP branch where I relaxed the constraints for the test to pass a bit. On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote:

Re: [FAILING TEST] BlobLibraryCacheManagerTest

2015-08-16 Thread Stephan Ewen
Looks like a rare race between the cleanup (two changes) and the test validating both changes. I'll push a fix to make the test more reliable. On Sun, Aug 16, 2015 at 11:04 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Hi, > > I hit a failing test in flink-runtime. Not sure if it

Re: Failing test in Gelly

2015-08-10 Thread Stephan Ewen
I think the YARN problem is as before, but with a longer timeout. Before, the tests aborted when the expected output did not arrive within 60 seconds. The timeout is now 180 seconds, which is probably so long that the deadlock detector (5 minutes without output) kicks in. In any case, there is something

Re: Failing test in Gelly

2015-08-10 Thread Stephan Ewen
May be an issue with the embedded YARN mini cluster... On Mon, Aug 10, 2015 at 8:37 PM, Stephan Ewen wrote: > I think the YARN problem is as before, but with a longer timeout. > > Before, when after 60 seconds the expected output did not come, the tests > aborted. > The timeout is now 180 second

Re: Failing test in Gelly

2015-08-09 Thread Matthias J. Sax
Not sure about the YARN test... As YARN was unstable all the time I just ignored it... -Matthias On 08/09/2015 09:38 PM, Ufuk Celebi wrote: > PS what about the yarn test case... Is that one known (with that trace)? > > On Sunday, August 9, 2015, Ufuk Celebi wrote: > >> There is an issue for th

Re: Failing test in Gelly

2015-08-09 Thread Ufuk Celebi
There is an issue for this from last week. Couldn't look into it last week, will do tomorrow. Thanks for the logs. :) On Sunday, August 9, 2015, Matthias J. Sax wrote: > Wrong link... sorry. > > https://travis-ci.org/mjsax/flink/jobs/74787655 > > > > On 08/09/2015 04:02 PM, Maximilian Michels wr

Re: Failing test in Gelly

2015-08-09 Thread Ufuk Celebi
PS what about the yarn test case... Is that one known (with that trace)? On Sunday, August 9, 2015, Ufuk Celebi wrote: > There is an issue for this from last week. Couldn't look into it last > week, will do tomorrow. Thanks for the logs. :) > > On Sunday, August 9, 2015, Matthias J. Sax > wrote

Re: Failing test in Gelly

2015-08-09 Thread Matthias J. Sax
Wrong link... sorry. https://travis-ci.org/mjsax/flink/jobs/74787655 On 08/09/2015 04:02 PM, Maximilian Michels wrote: > Hi Matthias, > > Is that the correct build URL? I can't spot any failing Gelly tests. The > build appears to be stuck in the YARNSessionFIFOITCase. > > Cheers, > Max > > O

Re: Failing test in Gelly

2015-08-09 Thread Maximilian Michels
Hi Matthias, Is that the correct build URL? I can't spot any failing Gelly tests. The build appears to be stuck in the YARNSessionFIFOITCase. Cheers, Max On Sun, Aug 9, 2015 at 3:37 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Hi, > > I got a new failing test in this build (fli

Re: Failing Test again

2015-08-04 Thread Aljoscha Krettek
I've also seen the BufferSpillerTest fail: https://travis-ci.org/apache/flink/jobs/74057503 On Tue, 4 Aug 2015 at 14:10 Robert Metzger wrote: > I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself. > Maybe Tachyon 0.7 will fix the issues. > > On Tue, Aug 4, 2015 at 1:57 PM,

Re: Failing Test again

2015-08-04 Thread Robert Metzger
I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself. Maybe Tachyon 0.7 will fix the issues. On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen wrote: > Yes. > > We should know, though, whether this is a Java 6 bug, or a bug in our > system that just happens to occur only with Java

Re: Failing Test again

2015-08-04 Thread Stephan Ewen
Yes. We should know, though, whether this is a Java 6 bug, or a bug in our system that just happens to occur only with Java 6 (because of different timings in this other engine) On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler < chesnay.schep...@fu-berlin.de> wrote: > Aren't we dropping java 6

Re: Failing Test again

2015-08-04 Thread Chesnay Schepler
Aren't we dropping Java 6 support? On 04.08.2015 12:21, Stephan Ewen wrote: The "StateCheckpointedITCase" has not failed so far, which also tests these guarantees thoroughly. But we need to first rule out the BarrierBuffer. The problem is that the bug occurs only on Java 6 and cannot be reproduce

Re: Failing Test again

2015-08-04 Thread Stephan Ewen
The "StateCheckpointedITCase" has not failed so far, which also tests these guarantees thoroughly. But we need to first rule out the BarrierBuffer. The problem is that the bug occurs only on Java 6 and cannot be reproduced locally... On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra wrote: > Honestly I

Re: Failing Test again

2015-08-04 Thread Gyula Fóra
Honestly I don't think the partitioned state changes have anything to do with the stability, only the reworked test case, which now tests proper exactly-once, which was missing before. Stephan Ewen wrote (on Tue, Aug 4, 2015, 12:12): > Yes, the build stability is super serious right now.

Re: Failing Test again

2015-08-04 Thread Stephan Ewen
Yes, the build stability is super serious right now. Here are the problems in question, and what we could do about this: BarrierBuffer: Barrier Buffer tests fail in Java 6 builds. I have not found a way to diagnose that problem, yet, but if we cannot find the issue today,

Re: Failing Test again

2015-08-04 Thread Aljoscha Krettek
I've also seen this fail: https://travis-ci.org/apache/flink/jobs/74025862 in SuccessAfterNetworkBuffersFailureITCase Build seems quite flaky recently. On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax wrote: > Rebased on: > > > https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689e

Re: Failing Test again

2015-08-04 Thread Matthias J. Sax
Rebased on: https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3 But if the gap between two rebases is large, it's hard to say what the problem might be... The old parent commit (ie, rebase before last rebase) was https://github.com/mjsax/flink/commit/148395bcd81a93bcb1

Re: Failing Test again

2015-08-03 Thread Aljoscha Krettek
What are the commits that you rebased on? Could you maybe narrow down what caused the regression? On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax wrote: > I only report failing tests after a rebase. ;) > > -Matthias > > On 08/03/2015 11:23 PM, Henry Saputra wrote: > > Thanks for reporting it, Matth

Re: Failing Test again

2015-08-03 Thread Matthias J. Sax
I only report failing tests after a rebase. ;) -Matthias On 08/03/2015 11:23 PM, Henry Saputra wrote: > Thanks for reporting it, Matthias. Will try to run Travis for latest Flink. > > Tachyon test is a bit flaky. Maybe updating to latest release could help. > > - Henry > > On Mon, Aug 3, 2015

Re: Failing Test again

2015-08-03 Thread Henry Saputra
Thanks for reporting it, Matthias. Will try to run Travis for latest Flink. Tachyon test is a bit flaky. Maybe updating to latest release could help. - Henry On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax wrote: > Today, not a single build was completely successful. Please see here: > > Flink

Re: Failing Test

2015-08-03 Thread Stephan Ewen
Seen this a few times as well. May be something with the latest "partitioned state" changes... On Mon, Aug 3, 2015 at 5:48 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Hi, > > I just hit a failing test > (https://travis-ci.org/apache/flink/jobs/73899795). Is it known or new? > >

Re: Failing Test

2015-07-17 Thread Maximilian Michels
Thanks Matthias for looking into the issue. Thank you Till for the problem formulation and the suggested steps for solving the synchronization problem. I will look into this as soon as possible. Cheers, Max On Fri, Jul 17, 2015 at 11:18 AM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote:

Re: Failing Test

2015-07-17 Thread Matthias J. Sax
I will open a JIRA for this. It's getting "complicated". On 07/17/2015 11:04 AM, Till Rohrmann wrote: > I think the problem might be related to the way the test is constructed. > The test submits a job to the JM and then tries to poll the accumulators > from the JM. If it does not succeed, then t

Re: Failing Test

2015-07-17 Thread Till Rohrmann
I think the problem might be related to the way the test is constructed. The test submits a job to the JM and then tries to poll the accumulators from the JM. If it does not succeed, then the polling is retried with a decreasing pause in between. Furthermore, the task which updates the accumulator
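
For readers unfamiliar with the pattern described in the message above, here is a minimal sketch of polling with retries and a shrinking pause. It is not the actual AccumulatorLiveITCase code; the class name `PollWithRetry`, the method `poll`, and the halving schedule are all hypothetical and chosen only for illustration.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Flink: illustrates "poll, and retry with a
// decreasing pause" as described in the thread.
public class PollWithRetry {

    /**
     * Calls source up to maxAttempts times. After each failed attempt it
     * sleeps, starting at initialPauseMillis and halving the pause each time
     * (assumed schedule, never below 1 ms). Returns an empty Optional if no
     * attempt succeeds or the thread is interrupted.
     */
    public static <T> Optional<T> poll(Supplier<Optional<T>> source,
                                       int maxAttempts,
                                       long initialPauseMillis) {
        long pause = initialPauseMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = source.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up polling.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                pause = Math.max(1, pause / 2);
            }
        }
        return Optional.empty();
    }
}
```

A test built this way passes only if the value shows up within the total polling window, so on a slow CI machine the outcome depends on timing rather than on correctness, which would be consistent with the intermittent failures reported in this thread.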

Re: Failing Test

2015-07-16 Thread Matthias J. Sax
Hi, the test still fails. This time in both runs (Flink Travis and my own Travis) -- only for Java 8 again: https://travis-ci.org/apache/flink/jobs/71314132 https://travis-ci.org/mjsax/flink/jobs/71179608 -Matthias On 07/16/2015 02:28 PM, Matthias J. Sax wrote: > Great! I will. As 4 of 5 runs

Re: Failing Test

2015-07-16 Thread Matthias J. Sax
Great! I will. As 4 of 5 runs succeeded I cannot test explicitly. I will keep an eye on it in future runs. -Matthias On 07/16/2015 02:24 PM, Maximilian Michels wrote: > Hi Matthias, > > I've pushed a fix to the master. The problem should be solved. Please tell > me if your Travis reports an error

Re: Failing Test

2015-07-16 Thread Maximilian Michels
Hi Matthias, I've pushed a fix to the master. The problem should be solved. Please tell me if your Travis reports an error again. My Travis never complained :) Cheers, Max On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels wrote: > Hi Matthias, > > This is indeed a timing issue when checking

Re: Failing Test

2015-07-16 Thread Maximilian Michels
Hi Matthias, This is indeed a timing issue when checking for the results in this test. The new accumulator implementation now continuously reports from the running tasks to the job manager. This was merged yesterday. The assertion that fails there is a bit strict. Actually, I've already integrate

Re: Failing Test

2015-07-16 Thread Ufuk Celebi
Hey, this was merged yesterday. I guess it's a timing issue when verifying the results. Can you file an issue for this? – Ufuk On 16 Jul 2015, at 11:30, Matthias J. Sax wrote: > Hi, > > I hit another failing test (that is new to me): > >> Results : >> Failed tests: >> AccumulatorLiveIT