Re: [ANNOUNCE] Build Issues Solved

Chiwan Park Tue, 31 May 2016 01:44:07 -0700

It may be related to the KNN test case that was merged yesterday. I’ll look 
into the ML test.


Regards,
Chiwan Park

> On May 31, 2016, at 5:38 PM, Ufuk Celebi <[email protected]> wrote:
> 
> Currently, an ML test is reliably failing and occasionally some HA
> tests. Is someone looking into the ML test?
> 
> For HA, I will revert a commit, which might cause the HA
> instabilities. Till is working on a proper fix as far as I know.
> 
> On Tue, May 31, 2016 at 3:50 AM, Chiwan Park <[email protected]> wrote:
>> Thanks for the great work! :-)
>> 
>> Regards,
>> Chiwan Park
>> 
>>> On May 31, 2016, at 7:47 AM, Flavio Pompermaier <[email protected]> 
>>> wrote:
>>> 
>>> Awesome work guys!
>>> And even more thanks for the detailed report...This troubleshooting summary
>>> will be undoubtedly useful for all our maven projects!
>>> 
>>> Best,
>>> Flavio
>>> On 30 May 2016 23:47, "Ufuk Celebi" <[email protected]> wrote:
>>> 
>>>> Thanks for the effort, Max and Stephan! Happy to see the green light again.
>>>> 
>>>> On Mon, May 30, 2016 at 11:03 PM, Stephan Ewen <[email protected]> wrote:
>>>>> Hi all!
>>>>> 
>>>>> After a few weeks of terrible build issues, I am happy to announce that
>>>>> the build works again properly, and we actually get meaningful CI results.
>>>>> 
>>>>> Here is a story in many acts, from builds deep red to bright green joy.
>>>>> Kudos to Max, who did most of this troubleshooting. This evening, Max and
>>>>> I debugged the final issue and got the build back on track.
>>>>> 
>>>>> ------------------
>>>>> The Journey
>>>>> ------------------
>>>>> 
>>>>> (1) Failsafe Plugin
>>>>> 
>>>>> The Maven Failsafe Build Plugin had a critical bug due to which failed
>>>>> tests did not result in a failed build.
>>>>> 
>>>>> That is a pretty bad bug for a plugin whose only task is to run tests and
>>>>> fail the build if a test fails.
>>>>> 
>>>>> After we recognized that, we upgraded the Failsafe Plugin.
>>>>> 
>>>>> 
>>>>> (2) Failsafe Plugin Dependency Issues
>>>>> 
>>>>> After the upgrade, the Failsafe Plugin behaved differently and did not
>>>>> interoperate with Dependency Shading any more.
>>>>> 
>>>>> Because of that, we switched to the Surefire Plugin.
>>>>> 
>>>>> 
>>>>> (3) Fixing all the issues introduced in the meantime
>>>>> 
>>>>> Naturally, a number of test instabilities had been introduced, which
>>>>> needed to be fixed.
>>>>> 
>>>>> 
>>>>> (4) Yarn Tests and Test Scope Refactoring
>>>>> 
>>>>> In the meantime, a Pull Request was merged that moved the Yarn Tests to
>>>>> the test scope.
>>>>> Because the configuration searched for tests in the "main" scope, no Yarn
>>>>> tests were executed for a while, until the scope was fixed.
>>>>> 
>>>>> 
>>>>> (5) Yarn Tests and JMX Metrics
>>>>> 
>>>>> After the Yarn Tests were re-activated, we saw them fail due to warnings
>>>>> created by the newly introduced metrics code. We could fix that by
>>>>> updating the metrics code and temporarily not registering JMX beans for
>>>>> all metrics.
>>>>> 
>>>>> 
>>>>> (6) Yarn / Surefire Deadlock
>>>>> 
>>>>> Finally, some Yarn tests failed reliably in Maven (though not in the
>>>>> IDE). It turned out that those tests exercise a command line interface
>>>>> that interacts with the standard input stream.
>>>>> 
>>>>> The newly deployed Surefire Plugin uses standard input as well, for
>>>>> communication with forked JVMs. Since Surefire internally locks the
>>>>> standard input stream, the Yarn CLI cannot poll the standard input stream
>>>>> without locking up and stalling the tests.
>>>>> 
>>>>> We adjusted the tests and now the build happily builds again.
>>>>> 
>>>>> -----------------
>>>>> Conclusions:
>>>>> -----------------
>>>>> 
>>>>> - CI is terribly crucial. It took us weeks to deal with the fallout of
>>>>> a period of unreliable CI.
>>>>> 
>>>>> - Maven could do a better job. A bug as crucial as the one that started
>>>>> our problem should not occur in a test plugin like Failsafe. Also, the
>>>>> constant change of semantics and dependency scopes is annoying. The
>>>>> semantic changes are subtle, but for a build as complex as Flink, they
>>>>> make a difference.
>>>>> 
>>>>> - File-based communication is rarely a good idea. The bug in the
>>>>> Failsafe plugin was caused by improper file-based communication, and
>>>>> some of our discovered instabilities as well.
>>>>> 
>>>>> Greetings,
>>>>> Stephan
>>>>> 
>>>>> 
>>>>> PS: Some issues and mysteries remain for us to solve: When we allow our
>>>>> metrics subsystem to register JMX beans, we see some tests failing due to
>>>>> spontaneous JVM process kills. Whoever has a pointer there, please ping
>>>>> us!
>>>> 
>>
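
A note on the Yarn / Surefire deadlock from (6): the announcement only says
that the tests were adjusted, not how. One common way to keep such tests away
from the real standard input is to let the CLI read from an injected
InputStream, so a test can pass an in-memory stream while Surefire keeps
System.in for talking to its forked JVMs. A minimal Java sketch of that idea
follows. The class InteractiveCli and the "stop" command are made up for
illustration; this is not Flink's actual Yarn CLI code or the actual fix.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical CLI, for illustration only (not Flink's Yarn CLI).
final class InteractiveCli {

    // The input source is injected instead of hard-wiring System.in.
    private final InputStream in;

    InteractiveCli(InputStream in) {
        this.in = in;
    }

    // Reads lines until the (made-up) "stop" command or end of input.
    // Returns how many other commands were seen before stopping.
    int runUntilStop() {
        Scanner scanner = new Scanner(in, "UTF-8");
        int commands = 0;
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (line.equals("stop")) {
                break;
            }
            commands++;
        }
        return commands;
    }

    public static void main(String[] args) {
        // In a test, feed the CLI from memory; System.in is never touched.
        InputStream fake = new ByteArrayInputStream(
                "help\nstatus\nstop\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(new InteractiveCli(fake).runUntilStop());
    }
}
```

Production code would construct the same class with System.in, so only the
tests change which stream is read.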
