[ https://issues.apache.org/jira/browse/FLINK-21015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Khachatryan updated FLINK-21015:
--------------------------------------
    Description: 
Please see the discussion in FLINK-20992

{quote}Additionally, I found that during shutdown we don't wait for checkpoint cleanup to complete (or for any other tasks submitted to executors):
{code:java}
checkpointCoordinatorTimer.shutdownNow(); // in ExecutionGraph
scheduledExecutorService.shutdownNow(); // in JobManagerSharedServices
{code}
So only the currently executing actions will complete, not any queued ones.

I think we SHOULD complete cleanup on shutdown and propose the following:
# Replace shutdownNow with shutdown to allow cleanup to finish
# Add awaitTermination (with timeout)
# At least log the result of shutdownNow (list of runnables)
{quote}
[~trohrmann]
{quote}I think you are also right that we currently don't wait for the checkpoint cleanup to complete. I think the component responsible for the cleanup tasks should make sure that they are completed before shutting down, or hand them over to a new owner who is responsible for them. Hence, if the {{CheckpointCoordinator}} is responsible, then the {{CheckpointCoordinator.shutdown}} method should make sure that all checkpoints are cleaned up. Alternatively, {{CheckpointCoordinator.shutdown}} could return a {{CompletableFuture}} that is completed once everything is cleaned up.

Consequently, I wouldn't make it the responsibility of the executors to make sure that all checkpoints are properly cleaned up by waiting on the completion of all enqueued tasks.
{quote}

was:
Please see the [discussion|https://issues.apache.org/jira/browse/FLINK-20992?focusedCommentId=17267188&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17267188] in FLINK-20992

> Wait for checkpoint cleanup completion during shutdown
> ------------------------------------------------------
>
>                 Key: FLINK-21015
>                 URL: https://issues.apache.org/jira/browse/FLINK-21015
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Roman Khachatryan
>            Priority: Major
>             Fix For: 1.13.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
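The three steps proposed in the description (use {{shutdown}} instead of {{shutdownNow}}, wait with a timeout, log whatever was dropped) can be sketched with a plain {{java.util.concurrent.ExecutorService}}. This is a minimal illustration, not Flink code: the helper class {{GracefulExecutorShutdown}}, its {{timeoutMillis}} parameter, and the fallback to {{shutdownNow}} once the timeout expires are assumptions of this sketch (it also uses {{List.of}}, so Java 9+).

{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Flink) following the three proposed steps. */
final class GracefulExecutorShutdown {

    private static final Logger LOG =
            Logger.getLogger(GracefulExecutorShutdown.class.getName());

    private GracefulExecutorShutdown() {}

    /**
     * Stops accepting new tasks, lets already queued tasks (e.g. checkpoint cleanup) run,
     * and waits up to timeoutMillis. Returns the queued tasks that never started.
     */
    static List<Runnable> shutdownGracefully(ExecutorService executor, long timeoutMillis)
            throws InterruptedException {
        // Step 1: shutdown() instead of shutdownNow(), so queued tasks still execute.
        executor.shutdown();
        // Step 2: bounded wait for queued and running tasks to finish.
        if (executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return List.of();
        }
        // Timeout expired (assumption of this sketch): fall back to shutdownNow(),
        // which interrupts running tasks and returns the tasks still in the queue.
        List<Runnable> dropped = executor.shutdownNow();
        // Step 3: at least log what was dropped.
        LOG.warning("Executor did not terminate within " + timeoutMillis + " ms; "
                + dropped.size() + " queued task(s) were not executed: " + dropped);
        return dropped;
    }
}
{code}

Note that this waits for every task queued on the executor, not only checkpoint cleanup, which is exactly the concern raised in the reply from [~trohrmann].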
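The alternative suggested by [~trohrmann] keeps the responsibility in the component that schedules the cleanup: it tracks its own pending cleanup work and returns a {{CompletableFuture}} from {{shutdown}} that completes once that work is done. A minimal sketch, assuming cleanup actions are plain {{Runnable}}s run on an injected {{Executor}}; {{CleanupTrackingComponent}} and {{scheduleCleanup}} are hypothetical names, not the actual {{CheckpointCoordinator}} API.

{code:java}
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

/**
 * Hypothetical stand-in for a component such as CheckpointCoordinator that owns its
 * cleanup tasks. Names and methods are illustrative only.
 */
final class CleanupTrackingComponent {

    private final Executor ioExecutor;
    // Cleanup futures that have been scheduled but have not completed yet.
    private final Set<CompletableFuture<Void>> pendingCleanups = ConcurrentHashMap.newKeySet();
    private volatile boolean shutDown;

    CleanupTrackingComponent(Executor ioExecutor) {
        this.ioExecutor = ioExecutor;
    }

    /** Schedules a cleanup action and tracks it until it finishes. */
    void scheduleCleanup(Runnable cleanupAction) {
        if (shutDown) {
            throw new IllegalStateException("already shut down");
        }
        CompletableFuture<Void> future = CompletableFuture.runAsync(cleanupAction, ioExecutor);
        pendingCleanups.add(future);
        // Remove once done (successfully or not) so the set only holds outstanding work.
        future.whenComplete((ignored, error) -> pendingCleanups.remove(future));
    }

    /** Returns a future that completes once every cleanup scheduled so far has finished. */
    CompletableFuture<Void> shutdown() {
        shutDown = true;
        return CompletableFuture.allOf(pendingCleanups.toArray(new CompletableFuture[0]));
    }
}
{code}

Because the returned future is built with {{CompletableFuture.allOf}}, it completes exceptionally if any cleanup failed, and the caller decides how long to wait (for example with {{orTimeout}} on Java 9+). The {{shutDown}} check in {{scheduleCleanup}} is not atomic with {{shutdown()}}; a real implementation would need a lock or similar so that a cleanup racing with shutdown is not lost.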