On Tue, Mar 16, 2021 at 3:11 AM Sijie Guo <guosi...@gmail.com> wrote:
> > The prototype has demonstrated about 60% reduction in
> > resource consumption.
>
> It is hard to quantify. Merging them into one large workflow can result in
> more failures. Re-running those failures can consume resources as well.
>

Yes, you are right.

> > Isn't it urgent to resolve it?
>
> I think we are in a stage that gives us breathing room to fix flaky tests
> and solve other problems, no?
>

I don't have access to the ASF infra-users mailing list where the resource
consumption problems have been discussed. I guess the problem isn't so bad
at the moment since it's not coming to us.
Yes, it makes sense to focus on the flaky test problem if the resource
consumption isn't a pressing problem.

> I don't mean we stop the effort here. I mean we have other enhancements
> that we can do to improve the situation.
> Once we get into a position where the flakiness is reduced, we can merge
> them into one workflow.
>

+1

Getting tests to pass with a lot of retries comes with a tradeoff. One of
the critical issues it causes is that real production issues might pass
tests and get masked as test flakiness. This causes regressions.
The tolerance of test flakiness results in more flaky tests being added to
the code base.
Unless we make changes to "flaky test handling", we won't be able to change
the course. Makes sense?

BR, Lari