Awesome work guys!
And even more thanks for the detailed report. This troubleshooting summary
will undoubtedly be useful for all our Maven projects!

Best,
Flavio
On 30 May 2016 23:47, "Ufuk Celebi" <u...@apache.org> wrote:

> Thanks for the effort, Max and Stephan! Happy to see the green light again.
>
> On Mon, May 30, 2016 at 11:03 PM, Stephan Ewen <se...@apache.org> wrote:
> > Hi all!
> >
> > After a few weeks of terrible build issues, I am happy to announce that the
> > build works again properly, and we actually get meaningful CI results.
> >
> > Here is a story in many acts, from builds deep red to bright green joy.
> > Kudos to Max, who did most of this troubleshooting. This evening, Max and
> > I debugged the final issue and got the build back on track.
> >
> > ------------------
> > The Journey
> > ------------------
> >
> > (1) Failsafe Plugin
> >
> > The Maven Failsafe Build Plugin had a critical bug due to which failed
> > tests did not result in a failed build.
> >
> > That is a pretty bad bug for a plugin whose only task is to run tests and
> > fail the build if a test fails.
> >
> > After we recognized that, we upgraded the Failsafe Plugin.
> >
> >
> > (2) Failsafe Plugin Dependency Issues
> >
> > After the upgrade, the Failsafe Plugin behaved differently and did not
> > interoperate with Dependency Shading any more.
> >
> > Because of that, we switched to the Surefire Plugin.
> >
> >
> > (3) Fixing all the issues introduced in the meantime
> >
> > Naturally, a number of test instabilities had been introduced, which
> > needed to be fixed.
> >
> >
> > (4) Yarn Tests and Test Scope Refactoring
> >
> > In the meantime, a Pull Request was merged that moved the Yarn Tests to the
> > test scope.
> > Because the configuration searched for tests in the "main" scope, no Yarn
> > tests were executed for a while, until the scope was fixed.
> >
> >
> > (5) Yarn Tests and JMX Metrics
> >
> > After the Yarn Tests were re-activated, we saw them fail due to warnings
> > created by the newly introduced metrics code. We fixed that by updating
> > the metrics code and temporarily not registering JMX beans for all metrics.
> >
> >
> > (6) Yarn / Surefire Deadlock
> >
> > Finally, some Yarn tests failed reliably in Maven (though not in the IDE).
> > It turned out that those tests exercise a command line interface that
> > interacts with the standard input stream.
> >
> > The newly deployed Surefire Plugin uses standard input as well, for
> > communication with forked JVMs. Since Surefire internally locks the
> > standard input stream, the Yarn CLI cannot poll the standard input stream
> > without locking up and stalling the tests.
> >
> > We adjusted the tests and now the build happily builds again.
> >
> > -----------------
> > Conclusions:
> > -----------------
> >
> >   - CI is terribly crucial. It took us weeks to deal with the fallout of
> > a period of unreliable CI.
> >
> >   - Maven could do a better job. A bug as crucial as the one that started
> > our problem should not occur in a test plugin like Surefire. Also, the
> > constant change of semantics and dependency scopes is annoying. The
> > semantic changes are subtle, but for a build as complex as Flink, they
> > make a difference.
> >
> >   - File-based communication is rarely a good idea. The bug in the
> > Failsafe plugin was caused by improper file-based communication, and so
> > were some of the instabilities we discovered.
> >
> > Greetings,
> > Stephan
> >
> >
> > PS: Some issues and mysteries remain for us to solve: When we allow our
> > metrics subsystem to register JMX beans, we see some tests failing due to
> > spontaneous JVM process kills. Whoever has a pointer there, please ping us!
>
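
For anyone hitting the same stdin deadlock described in (6) in their own Maven projects: one way to keep such tests away from the real standard input is to hand the CLI its input stream instead of letting it read System.in directly. The sketch below is only an illustration of that idea, not the actual Flink change; the class name PollingCli, the method waitForStop and the "stop" command are all made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical CLI stand-in (not Flink code) that polls an injected stream. */
final class PollingCli {

    // Production code could pass System.in here; tests pass an in-memory stream.
    private final InputStream in;

    PollingCli(InputStream in) {
        this.in = in;
    }

    /**
     * Polls the stream up to maxPolls times without blocking on read().
     * Returns true as soon as a line containing only "stop" has been read.
     */
    boolean waitForStop(int maxPolls, long sleepMillis) {
        StringBuilder line = new StringBuilder();
        try {
            for (int i = 0; i < maxPolls; i++) {
                // available() > 0 guarantees that the next read() will not block.
                while (in.available() > 0) {
                    int c = in.read();
                    if (c == -1) {
                        break;
                    }
                    if (c == '\n') {
                        if (line.toString().trim().equals("stop")) {
                            return true;
                        }
                        line.setLength(0);
                    } else {
                        line.append((char) c);
                    }
                }
                Thread.sleep(sleepMillis);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up polling.
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        // A test would build the input like this instead of touching System.in.
        InputStream fake = new ByteArrayInputStream(
                "status\nstop\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("stop received: " + new PollingCli(fake).waitForStop(5, 10L));
    }
}
```

With the stream injected, a test never touches the standard input that Surefire uses to talk to its forked JVMs, so the two cannot block each other.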
