Hi folks,

I've noticed lately that the unit tests usually take about 2.5 hours to run, and a large portion of that is the core tests, where a new cluster is spun up for every test. Cluster setup accounts for most of the time. I ran one test class (TopicCommandWithAdminClient, which contains 38 tests) through a profiler. It showed that running the whole class took 10 minutes 37 seconds, of which only 5 minutes 18 seconds was useful time. That's roughly 100% overhead: setup and teardown take about as long as the tests themselves. Without the profiler the whole class takes 7 minutes 48 seconds, so the useful time would be somewhere between 3 and 4 minutes. This is one of the bigger test classes, though; most of them won't take this long. There are 74 classes that extend KafkaServerTestHarness, and running :core:integrationTest alone takes almost 2 hours.
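To make the cost model concrete, here is a minimal sketch that counts cluster start-ups under a per-test fixture versus a per-class fixture. It is an analogy written with Python's standard unittest module, not Kafka's Scala test harness: FakeCluster, TopicTests and run_suite are hypothetical names, and the counter merely stands in for an expensive cluster start. With setUp the cluster starts once per test method; with setUpClass it starts once for the whole class. In JUnit the class-level equivalents are @BeforeClass (JUnit 4) and @BeforeAll (JUnit 5).

```python
import io
import unittest


class FakeCluster:
    """Hypothetical stand-in for an expensive test cluster (not a Kafka API)."""

    starts = 0  # how many clusters have been started so far

    def __init__(self):
        FakeCluster.starts += 1
        self.topics = set()

    def create_topic(self, name):
        # Fails on duplicates, so tests sharing a cluster must use distinct names.
        if name in self.topics:
            raise ValueError(f"topic {name} already exists")
        self.topics.add(name)

    def shutdown(self):
        self.topics.clear()


class TopicTests:
    """Test bodies shared by both fixture styles; each test uses its own topic."""

    def test_create_topic_a(self):
        self.cluster.create_topic("topic-a")
        self.assertIn("topic-a", self.cluster.topics)

    def test_create_topic_b(self):
        self.cluster.create_topic("topic-b")
        self.assertIn("topic-b", self.cluster.topics)

    def test_create_topic_c(self):
        self.cluster.create_topic("topic-c")
        self.assertIn("topic-c", self.cluster.topics)


class PerTestClusterTest(TopicTests, unittest.TestCase):
    """Current style: a fresh cluster for every test method."""

    def setUp(self):
        self.cluster = FakeCluster()

    def tearDown(self):
        self.cluster.shutdown()


class PerClassClusterTest(TopicTests, unittest.TestCase):
    """Proposed style: one cluster shared by all tests in the class."""

    @classmethod
    def setUpClass(cls):
        cls.cluster = FakeCluster()

    @classmethod
    def tearDownClass(cls):
        cls.cluster.shutdown()


def run_suite(case):
    """Runs one test class quietly; returns (tests run, cluster starts, success)."""
    FakeCluster.starts = 0
    suite = unittest.TestLoader().loadTestsFromTestCase(case)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.testsRun, FakeCluster.starts, result.wasSuccessful()


if __name__ == "__main__":
    print(run_suite(PerTestClusterTest))   # (3, 3, True)
    print(run_suite(PerClassClusterTest))  # (3, 1, True)
```

Only the set-up hook moves; the test bodies are identical. Isolation here relies on each test touching its own topic, which is exactly the assumption that sharing a cluster depends on.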
I think we could greatly speed up these integration tests by creating the cluster once per class and running the individual tests as separate methods against it. I know this somewhat contradicts the principle that tests should be independent, but recreating the cluster for each test seems to be a very expensive operation. Also, if the tests act on different resources (different topics, etc.), sharing a cluster might not hurt their independence. There will of course be cases where this isn't possible, but I think there could be many where it is. In the best case we could cut the testing time by roughly an hour. This would save resources and give quicker feedback on PR builds.

What are your thoughts? Has anyone thought about this, or have there been any attempts at it?

Best,
Viktor