I agree with Ron. I think improving the framework with a configurable number of retries on some tests will yield the highest ROI in terms of passing builds.
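As a rough sketch of what such an opt-in retry could look like with JUnit 4 — the RetryRule class below is hypothetical, not an existing Kafka or JUnit API:

```scala
import org.junit.rules.TestRule
import org.junit.runner.Description
import org.junit.runners.model.Statement

// Hypothetical rule: retry a failing test up to maxAttempts times and
// report only the last failure if every attempt fails.
class RetryRule(maxAttempts: Int) extends TestRule {
  override def apply(base: Statement, description: Description): Statement =
    new Statement {
      override def evaluate(): Unit = {
        var lastFailure: Throwable = null
        var attempt = 0
        var passed = false
        while (!passed && attempt < maxAttempts) {
          attempt += 1
          try {
            base.evaluate() // runs the actual test body
            passed = true
          } catch {
            case t: Throwable =>
              lastFailure = t
              System.err.println(
                s"${description.getDisplayName}: attempt $attempt of $maxAttempts failed: $t")
          }
        }
        if (!passed) throw lastFailure // all attempts failed: a real failure
      }
    }
}
```

A test class tagged as flaky could then opt in with something like `@Rule def retry: RetryRule = new RetryRule(3)`.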
On Fri, Mar 8, 2019 at 10:48 PM Ron Dagostino <rndg...@gmail.com> wrote:

> It's a classic problem: you can't string N things together serially and
> expect high reliability. 5,000 tests in a row isn't going to give you a
> bunch of 9's. It feels to me that the test frameworks themselves should
> support a more robust model -- like a way to tag a test as "retry me up
> to N times before you really consider me a failure" or something like
> that.
>
> Ron
>
> On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:
>
> > We internally have an improvement for half a year now which reruns the
> > flaky test classes at the end of the test gradle task, lets you know
> > that they were rerun and are probably flaky. It fails the build only
> > if the second run of the test class was also unsuccessful. I think it
> > works pretty well; we mostly have green builds. If there is interest,
> > I can try to contribute that.
> >
> > That does sound very intriguing. Does it rerun the test classes that
> > failed or some known, marked classes? If it is the former, I can see a
> > lot of value in having that automated in our PR builds. I wonder what
> > others think of this.
> >
> > On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> >
> > > Hey All,
> > >
> > > Thanks for the loads of ideas.
> > >
> > > @Stanislav, @Sonke
> > > I probably left it out from my email, but I really imagined this as
> > > a case-by-case change. If we think that it wouldn't cause problems,
> > > then it might be applied. That way we'd limit the blast radius
> > > somewhat. The one-hour gain is really just the most optimistic
> > > scenario; I'm almost sure that not every test could be transformed
> > > to use a common cluster.
> > > We internally have an improvement for half a year now which reruns
> > > the flaky test classes at the end of the test gradle task, lets you
> > > know that they were rerun and are probably flaky. It fails the build
> > > only if the second run of the test class was also unsuccessful. I
> > > think it works pretty well; we mostly have green builds. If there is
> > > interest, I can try to contribute that.
> > >
> > > > I am also extremely annoyed at times by the amount of coffee I
> > > > have to drink before tests finish
> > > Just please don't get a heart attack :)
> > >
> > > @Ron, @Colin
> > > You bring up a very good point that it is easier and frees up more
> > > resources if we just run change-specific tests, and it's good to
> > > know that a similar solution (meaning using a shared resource for
> > > testing) has failed elsewhere. I second Ron on the test
> > > categorization, although as a first attempt I think a flaky retry
> > > plus running only the necessary tests would help with both time
> > > saving and effectiveness. It would also be easier to achieve.
> > >
> > > @Ismael
> > > Yea, it'd be interesting to profile the startup/shutdown; I've never
> > > done that. Perhaps I'll set some time apart for that :). It's
> > > definitely true though that if we see a significant delay there, we
> > > wouldn't just improve the efficiency of the tests but also the
> > > customer experience.
> > >
> > > Best,
> > > Viktor
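The policy Viktor and Stanislav describe above — rerun the classes that failed once, and fail the build only on a second failure — lives in Gradle internally. Purely as an illustration of that policy (not the actual implementation), a toy JUnit driver in Scala might look like this; all names are hypothetical:

```scala
import org.junit.runner.JUnitCore

// Toy driver: run the given test classes, rerun any class that had
// failures, and exit non-zero only if a class fails in both passes.
object RerunFailedClassesOnce {
  def main(args: Array[String]): Unit = {
    val core = new JUnitCore
    val testClasses = args.map(name => Class.forName(name))

    // First pass: remember which classes were not fully green.
    val suspects = testClasses.filterNot(c => core.run(c).wasSuccessful())
    if (suspects.nonEmpty)
      println(s"Rerunning probably flaky classes: ${suspects.map(_.getName).mkString(", ")}")

    // Second pass: only a repeated failure counts as a real failure.
    val realFailures = suspects.filterNot(c => core.run(c).wasSuccessful())
    if (realFailures.nonEmpty) {
      System.err.println(s"Failed in both runs: ${realFailures.map(_.getName).mkString(", ")}")
      sys.exit(1)
    }
  }
}
```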
> > > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com> wrote:
> > >
> > > > It's an idea that has come up before and is worth exploring
> > > > eventually. However, I'd first try to optimize the server
> > > > startup/shutdown process. If we measure where the time is going,
> > > > maybe some opportunities will present themselves.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I've been observing lately that unit tests usually take 2.5
> > > > > hours to run, and a very big portion of these are the core
> > > > > tests, where a new cluster is spun up for every test. This takes
> > > > > most of the time. I ran a test class (TopicCommandWithAdminClient,
> > > > > with 38 tests inside) through the profiler, and it shows for
> > > > > instance that running the whole class took 10 minutes and 37
> > > > > seconds, of which the useful time was 5 minutes 18 seconds.
> > > > > That's a 100% overhead. Without the profiler the whole class
> > > > > takes 7 minutes and 48 seconds, so the useful time would be
> > > > > between 3 and 4 minutes. This is a bigger test though; most of
> > > > > them won't take this much.
> > > > > There are 74 classes that implement KafkaServerTestHarness, and
> > > > > just running :core:integrationTest takes almost 2 hours.
> > > > >
> > > > > I think we could greatly speed up these integration tests by
> > > > > creating the cluster once per class and performing the tests in
> > > > > separate methods. I know this somewhat contradicts the principle
> > > > > that tests should be independent, but it seems that recreating
> > > > > the cluster for each test is a very expensive operation. Also,
> > > > > if the tests act on different resources (different topics,
> > > > > etc.), then it might not hurt their independence. There might of
> > > > > course be cases where this is not possible, but I think there
> > > > > could be a lot where it is.
> > > > >
> > > > > In the optimal case we could cut the testing time back by
> > > > > approximately an hour. This would save resources and give
> > > > > quicker feedback for PR builds.
> > > > >
> > > > > What are your thoughts?
> > > > > Has anyone thought about this, or have there been any attempts
> > > > > made?
> > > > >
> > > > > Best,
> > > > > Viktor
> >
> > --
> > Best,
> > Stanislav

--
Best,
Stanislav
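To make Viktor's once-per-class proposal concrete, a rough sketch of the shape it could take under JUnit 4 follows. `EmbeddedKafkaCluster`, its constructor, and its methods are hypothetical stand-ins, not the existing KafkaServerTestHarness API; in Scala the `@BeforeClass`/`@AfterClass` hooks live on the companion object so JUnit sees static methods:

```scala
import org.junit.{AfterClass, BeforeClass, Test}
import org.junit.Assert.assertTrue

// EmbeddedKafkaCluster is a hypothetical helper; only the once-per-class
// lifecycle is the point of this sketch.
object TopicCommandSharedClusterTest {
  private var cluster: EmbeddedKafkaCluster = _

  @BeforeClass
  def startCluster(): Unit = {
    cluster = new EmbeddedKafkaCluster(numBrokers = 3)
    cluster.start() // pay the broker startup cost once for the whole class
  }

  @AfterClass
  def stopCluster(): Unit = cluster.shutdown()
}

class TopicCommandSharedClusterTest {
  import TopicCommandSharedClusterTest.cluster

  // Each test method acts on its own topic, which is what keeps the
  // tests independent even though they share the brokers.
  @Test
  def testCreateTopicA(): Unit = {
    cluster.createTopic("topic-a", partitions = 2)
    assertTrue(cluster.topicExists("topic-a"))
  }

  @Test
  def testCreateTopicB(): Unit = {
    cluster.createTopic("topic-b", partitions = 4)
    assertTrue(cluster.topicExists("topic-b"))
  }
}
```

Giving each method its own topic is the condition Viktor calls out: the shared resource is the broker processes, not the data the tests act on.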