Roman Khachatryan created FLINK-21053:
-----------------------------------------
Summary: Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing JM
Key: FLINK-21053
URL: https://issues.apache.org/jira/browse/FLINK-21053
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.13.0

In the past, there were multiple bugs caused by throwing or handling RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992). I think such bugs are still possible, because there are many places where an executor is passed to CompletableFuture.xxxAsync calls while it may already be shut down.

In FLINK-20992 we discussed two approaches to fix this. The first approach is to check the executor state inside a synchronized block every time the executor is used. The second approach is to:
# Create the executors (both the IO and the timer thread pools) inside CheckpointCoordinator.
# Check isShutdown() in their error handlers: if the executor is shut down and the exception is a RejectedExecutionException, just log it; otherwise, delegate to FatalExitExceptionHandler.
# (This will allow removing such RejectedExecutionException checks from the coordinator code.)
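A minimal Java sketch of the second approach, assuming the coordinator installs an uncaught-exception handler on the threads of the executors it creates itself. `ShutdownAwareExecutors`, `ShutdownAwareHandler`, `OwnedExecutor` and `runDemo` are hypothetical names, not Flink classes; the fatal delegate is a stand-in for Flink's FatalExitExceptionHandler so that the example runs without Flink. It reproduces the failure mode from the description (`CompletableFuture.runAsync` on an executor that is already shut down throws RejectedExecutionException on the calling thread) and shows the handler only logging that case while still escalating every other error.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class ShutdownAwareExecutors {

    /**
     * Hypothetical handler: tolerates RejectedExecutionException once the owning executor
     * is shut down and delegates every other failure to a fatal handler (in Flink, per the
     * issue, this delegate would be FatalExitExceptionHandler).
     */
    static final class ShutdownAwareHandler implements Thread.UncaughtExceptionHandler {
        private final BooleanSupplier isShutdown;
        private final Thread.UncaughtExceptionHandler fatalDelegate;
        final AtomicInteger ignored = new AtomicInteger(); // counts logged rejections

        ShutdownAwareHandler(BooleanSupplier isShutdown, Thread.UncaughtExceptionHandler fatalDelegate) {
            this.isShutdown = isShutdown;
            this.fatalDelegate = fatalDelegate;
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            if (e instanceof RejectedExecutionException && isShutdown.getAsBoolean()) {
                // Expected while the coordinator shuts down: log it and carry on.
                ignored.incrementAndGet();
                System.err.println("Ignoring " + e + " in " + t.getName() + " (executor is shut down)");
            } else {
                // Anything else is still treated as fatal.
                fatalDelegate.uncaughtException(t, e);
            }
        }
    }

    /** Hypothetical executor created and owned by the coordinator, with the handler on its threads. */
    static final class OwnedExecutor {
        final ExecutorService executor;
        final ShutdownAwareHandler handler;

        OwnedExecutor(String threadName, Thread.UncaughtExceptionHandler fatalDelegate) {
            // The handler needs the executor and the executor's threads need the handler,
            // so the cycle is broken with a reference filled in right after creation.
            AtomicReference<ExecutorService> self = new AtomicReference<>();
            ShutdownAwareHandler h = new ShutdownAwareHandler(
                    () -> self.get() != null && self.get().isShutdown(), fatalDelegate);
            this.handler = h;
            this.executor = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, threadName);
                t.setUncaughtExceptionHandler(h);
                return t;
            });
            self.set(this.executor);
        }
    }

    /** Returns {ignored, fatal} counts for a rejection after shutdown plus one unrelated error. */
    static int[] runDemo() {
        AtomicInteger fatalCalls = new AtomicInteger();
        OwnedExecutor io = new OwnedExecutor("checkpoint-io", (t, e) -> fatalCalls.incrementAndGet());

        io.executor.shutdown();
        try {
            // Same call pattern as in the coordinator: an xxxAsync call on a shut-down executor.
            CompletableFuture.runAsync(() -> { }, io.executor);
        } catch (RejectedExecutionException e) {
            io.handler.uncaughtException(Thread.currentThread(), e); // logged, not fatal
        }
        io.handler.uncaughtException(Thread.currentThread(), new IllegalStateException("bug")); // fatal

        return new int[] {io.handler.ignored.get(), fatalCalls.get()};
    }

    public static void main(String[] args) {
        int[] counts = runDemo();
        System.out.println("ignored=" + counts[0] + ", fatal=" + counts[1]); // prints "ignored=1, fatal=1"
    }
}
```

Compared with the first approach (checking the executor state in a synchronized block at every call site), the shutdown check lives in one place, which is what would allow removing the per-call-site RejectedExecutionException handling from the coordinator code.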