Hi,

while I am also extremely annoyed at times by the amount of coffee I have to drink before the tests finish, I think the argument about flaky tests is valid! The current setup has the benefit that every test case runs on a pristine cluster. If we changed this, we would need to go through all tests and ensure that topic names are different. That can probably be abstracted, for example by including a timestamp in the name, but it is an additional potential source of failures.

Add to this the fact that "JUnit runs tests using a deterministic, but unpredictable order" and the water gets even muddier. Adding a test case might change the order in which the existing test cases are executed, which might mean that all of a sudden something breaks that you didn't even touch.
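
To make the topic-name point a bit more concrete, here is a minimal sketch of what such an abstraction could look like. The class `UniqueTopicNames` and its method `next` are hypothetical names made up for illustration; nothing like this is claimed to exist in the Kafka test utilities. It combines a timestamp with a counter, on the assumption that a timestamp alone could collide when two names are requested within the same millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only, not part of the Kafka code base.
final class UniqueTopicNames {
    // Timestamp taken once per JVM, so names from different test runs differ.
    private static final long RUN_ID = System.currentTimeMillis();
    // Counter so that names requested within the same millisecond still differ.
    private static final AtomicLong COUNTER = new AtomicLong();

    private UniqueTopicNames() {}

    // Builds "<prefix>-<run timestamp>-<sequence number>".
    static String next(String prefix) {
        return prefix + "-" + RUN_ID + "-" + COUNTER.incrementAndGet();
    }
}
```

A test would then call something like `UniqueTopicNames.next("testCreateTopic")` instead of hard-coding the topic name. This only removes name clashes; tests that depend on cluster-wide state would still be affected by whatever ran before them.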

Best regards,
Sönke

On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski <stanis...@confluent.io> wrote:
>
> Hey Viktor,
>
> I am all up for the idea of speeding up the tests. Running the
> `:core:integrationTest` command takes an absurd amount of time as is and is
> continuously going to go up if we don't do anything about it.
> Having said that, I am very scared that your proposal might significantly
> increase the test flakiness of current and future tests - test flakiness is
> a huge problem we're battling. We don't get green PR builds too often - it
> is very common that one or two flaky tests fail in each PR.
> We have also found it hard to get a green build for the 2.2 release (
> https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
>
> On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > I've been observing lately that unit tests usually take 2.5 hours to run
> > and a very big portion of these are the core tests where a new cluster is
> > spun up for every test. This takes most of the time. I ran a test
> > (TopicCommandWithAdminClient with 38 test inside) through the profiler and
> > it shows for instance that running the whole class itself took 10 minutes
> > and 37 seconds where the useful time was 5 minutes 18 seconds. That's a
> > 100% overhead. Without profiler the whole class takes 7 minutes and 48
> > seconds, so the useful time would be between 3-4 minutes. This is a bigger
> > test though, most of them won't take this much.
> > There are 74 classes that implement KafkaServerTestHarness and just running
> > :core:integrationTest takes almost 2 hours.
> >
> > I think we could greatly speed up these integration tests by just creating
> > the cluster once per class and perform the tests on separate methods. I
> > know that this a little bit contradicts to the principle that tests should
> > be independent but it seems like recreating clusters for each is a very
> > expensive operation. Also if the tests are acting on different resources
> > (different topics, etc.) then it might not hurt their independence. There
> > might be cases of course where this is not possible but I think there could
> > be a lot where it is.
> >
> > In the optimal case we could cut the testing time back by approximately an
> > hour. This would save resources and give quicker feedback for PR builds.
> >
> > What are your thoughts?
> > Has anyone thought about this or were there any attempts made?
> >
> > Best,
> > Viktor
> >
>
>
> --
> Best,
> Stanislav

--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany