Liu created FLINK-24174:
---------------------------

             Summary: MiniClusterTestEnvironment's triggerTaskManagerFailover may stuck in CommonTestUtils.waitForJobStatus()
                 Key: FLINK-24174
                 URL: https://issues.apache.org/jira/browse/FLINK-24174
             Project: Flink
          Issue Type: Improvement
          Components: Test Infrastructure
            Reporter: Liu

When writing TaskManager failover tests with the [unified testing framework for connectors|https://issues.apache.org/jira/browse/FLINK-19554], I found that triggerTaskManagerFailover can get stuck in CommonTestUtils.waitForJobStatus(). The sequence is as follows:
 # triggerTaskManagerFailover is called.
 # The JobStatus switches from RUNNING to RESTARTING.
 # The JobStatus switches from RESTARTING to RUNNING.
 # terminateTaskManager() completes.
 # CommonTestUtils.waitForJobStatus() is called only now. Since the JobStatus is already RUNNING again, the status it is waiting for has passed and the call never returns.

One solution is to call terminateTaskManager() asynchronously and call CommonTestUtils.waitForJobStatus() while the termination is still in progress. The pseudocode could look like this:
{code:java}
public void triggerTaskManagerFailover(JobClient jobClient, Runnable afterFailAction)
        throws Exception {
    // Proposed change: terminateTaskManager() returns a future instead of blocking.
    CompletableFuture<Void> completableFuture = terminateTaskManager();
    // Start waiting while the termination is still in flight, so that the
    // transient FAILING / FAILED / RESTARTING status is not missed.
    CommonTestUtils.waitForJobStatus(
            jobClient,
            Arrays.asList(JobStatus.FAILING, JobStatus.FAILED, JobStatus.RESTARTING),
            Deadline.fromNow(Duration.ofMinutes(5)));
    // Only continue once the TaskManager has actually been terminated.
    completableFuture.get();
    afterFailAction.run();
    startTaskManager();
}
{code}
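To make the ordering problem easier to reproduce without a MiniCluster, here is a minimal, self-contained sketch of the race. It does not use any Flink classes: the Status enum, the terminateTaskManager() stand-in, the waitForStatus() poller, and all timings are assumptions made up for illustration. Unlike the real waitForJobStatus() call described above, the poller here gives up after a short timeout so that the demo terminates.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class FailoverRaceSketch {
    // Simplified stand-in for Flink's JobStatus (assumption, not the real enum).
    enum Status { RUNNING, RESTARTING }

    // Hypothetical stand-in for terminateTaskManager(): the job restarts and
    // recovers before the call returns, as in steps 2 to 4 of the report.
    static void terminateTaskManager(AtomicReference<Status> status) {
        status.set(Status.RESTARTING);
        sleep(200); // simulated time the job spends restarting
        status.set(Status.RUNNING);
    }

    // Polls until the status is one of the expected values or the timeout expires.
    static boolean waitForStatus(
            AtomicReference<Status> status, Set<Status> expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (expected.contains(status.get())) {
                return true;
            }
            sleep(5);
        }
        return false;
    }

    // Reported ordering: terminate first, then wait. The job is already
    // RUNNING again when the wait starts, so RESTARTING is never observed.
    static boolean sequentialWait() {
        AtomicReference<Status> status = new AtomicReference<>(Status.RUNNING);
        terminateTaskManager(status);
        return waitForStatus(status, EnumSet.of(Status.RESTARTING), Duration.ofMillis(300));
    }

    // Proposed ordering: terminate asynchronously and wait at the same time.
    static boolean concurrentWait() {
        AtomicReference<Status> status = new AtomicReference<>(Status.RUNNING);
        CompletableFuture<Void> termination =
                CompletableFuture.runAsync(() -> terminateTaskManager(status));
        boolean seen =
                waitForStatus(status, EnumSet.of(Status.RESTARTING), Duration.ofSeconds(2));
        termination.join(); // make sure the termination has finished
        return seen;
    }

    // Thread.sleep without the checked exception, to keep the sketch short.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential wait observed RESTARTING: " + sequentialWait());
        System.out.println("concurrent wait observed RESTARTING: " + concurrentWait());
    }
}
```

With these timings the sequential variant should time out without seeing RESTARTING, while the concurrent variant should see it. Note that a polling wait can still miss a status that lasts for less than one polling interval, so the asynchronous ordering narrows the window rather than removing it in principle.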