[jira] [Created] (FLINK-21053) Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing JM

Roman Khachatryan (Jira) Wed, 20 Jan 2021 01:57:13 -0800

Roman Khachatryan created FLINK-21053:
-----------------------------------------


             Summary: Prevent further RejectedExecutionExceptions in 
CheckpointCoordinator failing JM
                 Key: FLINK-21053
                 URL: https://issues.apache.org/jira/browse/FLINK-21053
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing
            Reporter: Roman Khachatryan
            Assignee: Roman Khachatryan
             Fix For: 1.13.0


In the past, there were multiple bugs caused by throwing/handling 
RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992).

 

I think this is still possible, because there are many places where an executor 
is passed to CompletableFuture.xxxAsync calls even though it may already be shut 
down.
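
To illustrate the failure mode, here is a minimal sketch. It is not Flink code; 
the class and method names are made up for the example. It only shows that 
CompletableFuture.runAsync throws RejectedExecutionException in the calling 
thread when it is handed a standard JDK thread pool that is already shut down:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class, not part of Flink.
final class RejectedAfterShutdown {

    /** Returns true if runAsync on a shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // no new tasks are accepted from here on
        try {
            // runAsync calls executor.execute(...) right away, so the
            // rejection surfaces in the caller, not in the returned future.
            CompletableFuture.runAsync(() -> { }, executor);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + isRejectedAfterShutdown());
    }
}
```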

 

In FLINK-20992 we discussed two approaches to fix this.

One approach is to check the executor state inside a synchronized block every 
time it is used.
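
A rough sketch of what this first approach could look like. GuardedSubmitter 
and runIfActive are hypothetical names, and the sketch assumes the executor is 
only ever shut down through the same lock; otherwise the check and the 
submission could still race with a shutdown:

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical helper, not part of Flink.
final class GuardedSubmitter {
    private final Object lock = new Object();
    private final ExecutorService executor;

    GuardedSubmitter(ExecutorService executor) {
        this.executor = executor;
    }

    /** Submits the action only if the executor has not been shut down. */
    Optional<CompletableFuture<Void>> runIfActive(Runnable action) {
        synchronized (lock) {
            if (executor.isShutdown()) {
                return Optional.empty(); // skip instead of being rejected
            }
            return Optional.of(CompletableFuture.runAsync(action, executor));
        }
    }

    /** Shutdown takes the same lock, so it cannot interleave with a submit. */
    void shutdown() {
        synchronized (lock) {
            executor.shutdown();
        }
    }
}
```

The cost of this approach is that every call site has to go through such a 
guard, which is easy to forget when new xxxAsync calls are added.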

The second approach is to:
 # Create the executors inside CheckpointCoordinator (both io & timer thread 
pools)
 # Check isShutdown() in their error handlers (if it returns true and the 
exception is a RejectedExecutionException, just log it; otherwise delegate to 
FatalExitExceptionHandler)
 # (this would allow removing such RejectedExecutionException checks from the 
coordinator code)
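
A minimal sketch of the second approach, under some assumptions: "error 
handler" is taken to mean the threads' UncaughtExceptionHandler, the class name 
ShutdownAwareHandler and the thread name are invented, and the fatalHandler 
argument stands in for Flink's FatalExitExceptionHandler so that the example 
does not depend on Flink:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler, not part of Flink.
final class ShutdownAwareHandler implements Thread.UncaughtExceptionHandler {
    // Stand-in for FatalExitExceptionHandler (assumption for this sketch).
    private final Thread.UncaughtExceptionHandler fatalHandler;
    private final AtomicInteger ignored = new AtomicInteger();
    private volatile ExecutorService executor;

    ShutdownAwareHandler(Thread.UncaughtExceptionHandler fatalHandler) {
        this.fatalHandler = fatalHandler;
    }

    /** Creates the executor here so the handler can query its state. */
    ExecutorService createExecutor() {
        executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "checkpoint-io-sketch");
            thread.setUncaughtExceptionHandler(this);
            return thread;
        });
        return executor;
    }

    int ignoredCount() {
        return ignored.get();
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        ExecutorService current = executor;
        boolean shutDown = current != null && current.isShutdown();
        if (shutDown && error instanceof RejectedExecutionException) {
            // Expected once the pool is shut down: log and carry on.
            ignored.incrementAndGet();
            System.err.println("Ignoring rejection after shutdown: " + error);
        } else {
            // Anything else is still treated as fatal.
            fatalHandler.uncaughtException(thread, error);
        }
    }
}
```

With the handler owning this decision, the call sites in the coordinator would 
no longer need their own RejectedExecutionException checks.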



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
