Greg Harris created KAFKA-14338:
-----------------------------------

Summary: Connect RetryUtilTest flakey in CPU-limited environments
Key: KAFKA-14338
URL: https://issues.apache.org/jira/browse/KAFKA-14338
Project: Kafka
Issue Type: Test
Components: KafkaConnect
Reporter: Greg Harris
Assignee: Greg Harris
The RetryUtilTest added alongside RetryUtil in [https://github.com/apache/kafka/pull/11797] has unresolved flakiness in CPU-restricted environments. With a 2% CPU throttle in place, I was able to reproduce two flaky failures:

{noformat}
1) testExhaustingRetries(org.apache.kafka.connect.util.RetryUtilTest)
org.junit.runners.model.TestTimedOutException: test timed out after 1000 milliseconds
	at org.junit.internal.runners.MethodRoadie$1.run(MethodRoadie.java:78)
	at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:97)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:310)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:131)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.access$100(PowerMockJUnit47RunnerDelegateImpl.java:59)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner$TestExecutorStatement.evaluate(PowerMockJUnit47RunnerDelegateImpl.java:147)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.evaluateStatement(PowerMockJUnit47RunnerDelegateImpl.java:107)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:298)
	at org.junit.internal.runners.MethodRoadie.runWithTimeout(MethodRoadie.java:58)
	at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:48)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:218)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:160)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:134)
	at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:34)
	at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:44)
	at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:136)
	at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:117)
	at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:57)
	at org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:59)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
	at org.junit.runner.JUnitCore.main(JUnitCore.java:36)

2) retriesEventuallySucceed(org.apache.kafka.connect.util.RetryUtilTest)
org.apache.kafka.connect.errors.ConnectException: Fail to Test after 1 attempts. Reason: null
	at org.apache.kafka.connect.util.RetryUtil.retryUntilTimeout(RetryUtil.java:101)
	at org.apache.kafka.connect.util.RetryUtilTest.retriesEventuallySucceed(RetryUtilTest.java:74)
	... 38 trimmed
Caused by: org.apache.kafka.common.errors.TimeoutException
{noformat}

In the first case the 1000 ms JUnit test timeout fired; in the second, RetryUtil.retryUntilTimeout gave up after a single attempt with a TimeoutException. Rather than relying on flat wall-clock timeouts, the test should be written so that deadlocks are impossible and the retries proceed deterministically.
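One possible approach, sketched here under assumptions: inject the time source into the retry loop, and have the test pass a fake clock that advances only when the code under test sleeps. The number of attempts then depends only on the timeout and backoff values, not on how much CPU the test gets. All names below (Clock, SystemClock, FakeClock, DeterministicRetry, and this retryUntilTimeout(Supplier, ...) signature) are hypothetical and are not Kafka's actual RetryUtil API. In the Kafka codebase, the existing org.apache.kafka.common.utils.Time abstraction could presumably fill the role of Clock.

```java
import java.util.function.Supplier;

// Hypothetical time source (assumption, not a Kafka API).
interface Clock {
    long millis();
    void sleep(long ms);
}

// Production clock: monotonic time plus a real sleep.
final class SystemClock implements Clock {
    public long millis() { return System.nanoTime() / 1_000_000L; }

    public void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}

// Test clock: time moves only when the retry loop sleeps, so the
// outcome is independent of CPU speed or thread scheduling.
final class FakeClock implements Clock {
    private long now = 0L;
    public long millis() { return now; }
    public void sleep(long ms) { now += ms; }
}

// Hypothetical retry helper (NOT the real RetryUtil signature).
final class DeterministicRetry {
    private DeterministicRetry() {}

    static <T> T retryUntilTimeout(Supplier<T> action, long timeoutMs, long backoffMs, Clock clock) {
        if (backoffMs <= 0) {
            // With a fake clock, a zero backoff would never advance time.
            throw new IllegalArgumentException("backoffMs must be positive");
        }
        long deadline = clock.millis() + timeoutMs;
        int attempts = 0;
        RuntimeException last = null;
        while (true) {
            attempts++;
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the most recent failure as the cause
            }
            long remaining = deadline - clock.millis();
            if (remaining <= 0) {
                break; // time budget exhausted
            }
            clock.sleep(Math.min(backoffMs, remaining));
        }
        throw new IllegalStateException("Failed after " + attempts + " attempts", last);
    }
}
```

With a 1000 ms timeout and 100 ms backoff, an always-failing action is attempted exactly 11 times under FakeClock (at fake times 0, 100, ..., 1000), however heavily the CPU is throttled. A JUnit-level timeout would then only be needed as a safety net against hangs, not to decide whether the test passes.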