Thanks for the great work! :-)

Regards,
Chiwan Park
> On May 31, 2016, at 7:47 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> Awesome work guys!
> And even more thanks for the detailed report... This troubleshooting summary
> will undoubtedly be useful for all our Maven projects!
>
> Best,
> Flavio
> On 30 May 2016 23:47, "Ufuk Celebi" <u...@apache.org> wrote:
>
>> Thanks for the effort, Max and Stephan! Happy to see the green light again.
>>
>> On Mon, May 30, 2016 at 11:03 PM, Stephan Ewen <se...@apache.org> wrote:
>>> Hi all!
>>>
>>> After a few weeks of terrible build issues, I am happy to announce that
>>> the build works properly again, and we actually get meaningful CI results.
>>>
>>> Here is a story in many acts, from builds deep red to bright green joy.
>>> Kudos to Max, who did most of this troubleshooting. This evening, Max and
>>> I debugged the final issue and got the build back on track.
>>>
>>> ------------------
>>> The Journey
>>> ------------------
>>>
>>> (1) Failsafe Plugin
>>>
>>> The Maven Failsafe Build Plugin had a critical bug due to which failed
>>> tests did not result in a failed build.
>>>
>>> That is a pretty bad bug for a plugin whose only task is to run tests and
>>> fail the build if a test fails.
>>>
>>> After we recognized that, we upgraded the Failsafe Plugin.
>>>
>>>
>>> (2) Failsafe Plugin Dependency Issues
>>>
>>> After the upgrade, the Failsafe Plugin behaved differently and did not
>>> interoperate with Dependency Shading any more.
>>>
>>> Because of that, we switched to the Surefire Plugin.
>>>
>>>
>>> (3) Fixing all the issues introduced in the meantime
>>>
>>> Naturally, a number of test instabilities had been introduced while
>>> failing tests went unnoticed, and these needed to be fixed.
>>>
>>>
>>> (4) Yarn Tests and Test Scope Refactoring
>>>
>>> In the meantime, a Pull Request was merged that moved the Yarn Tests to
>>> the test scope.
>>> Because the configuration searched for tests in the "main" scope, no Yarn
>>> tests were executed for a while, until the scope was fixed.
>>>
>>>
>>> (5) Yarn Tests and JMX Metrics
>>>
>>> After the Yarn Tests were re-activated, we saw them fail due to warnings
>>> created by the newly introduced metrics code. We could fix that by
>>> updating the metrics code and temporarily not registering JMX beans for
>>> all metrics.
>>>
>>>
>>> (6) Yarn / Surefire Deadlock
>>>
>>> Finally, some Yarn tests failed reliably in Maven (though not in the
>>> IDE). It turned out that those tests exercise a command line interface
>>> that interacts with the standard input stream.
>>>
>>> The newly deployed Surefire Plugin uses standard input as well, for
>>> communication with forked JVMs. Since Surefire internally locks the
>>> standard input stream, the Yarn CLI cannot poll the standard input stream
>>> without locking up and stalling the tests.
>>>
>>> We adjusted the tests and now the build happily builds again.
>>>
>>> -----------------
>>> Conclusions:
>>> -----------------
>>>
>>> - CI is terribly crucial. It took us weeks to deal with the fallout of a
>>> period of unreliable CI.
>>>
>>> - Maven could do a better job. A bug as crucial as the one that started
>>> our problem should not occur in a test plugin like Surefire. Also, the
>>> constant change of semantics and dependency scopes is annoying. The
>>> semantic changes are subtle, but for a build as complex as Flink, they
>>> make a difference.
>>>
>>> - File-based communication is rarely a good idea. The bug in the
>>> Failsafe Plugin was caused by improper file-based communication, as were
>>> some of the instabilities we discovered.
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> PS: Some issues and mysteries remain for us to solve: When we allow our
>>> metrics subsystem to register JMX beans, we see some tests failing due to
>>> spontaneous JVM process kills. Whoever has a pointer there, please ping
>>> us!
>>