ifndef-SleePy commented on a change in pull request #11347: 
[FLINK-14971][checkpointing] Make all the non-IO operations in 
CheckpointCoordinator single-threaded
URL: https://github.com/apache/flink/pull/11347#discussion_r392856449
 
 

 ##########
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PendingCheckpoint.java
 ##########
 @@ -311,25 +315,32 @@ public CheckpointException getFailureCause() {
                                try (CheckpointMetadataOutputStream out = targetLocation.createMetadataOutputStream()) {
                                        Checkpoints.storeCheckpointMetadata(savepoint, out);
                                        finalizedLocation = out.closeAndFinalizeCheckpoint();
+                               }
 
+                               CompletedCheckpoint completed = new CompletedCheckpoint(
+                                       jobId,
+                                       checkpointId,
+                                       checkpointTimestamp,
+                                       System.currentTimeMillis(),
+                                       operatorStates,
+                                       masterStates,
+                                       props,
+                                       finalizedLocation);
+
+                               try {
+                                       completedCheckpointStore.addCheckpoint(completed);
+                               } catch (Throwable t) {
+                                       completed.discardOnFailedStoring();
 
 Review comment:
   Good question!
   
   Actually `completedCheckpointStore.addCheckpoint` should be called before 
`finalizedLocationFuture.thenApplyAsync((completed)`. The 
`finalizedLocationFuture.thenApplyAsync((completed)` stage does things like 
completing `onCompletionPromise`, reporting completion statistics, and disposing 
of the pending checkpoint. However, if `completedCheckpointStore.addCheckpoint` 
fails afterwards, has this checkpoint succeeded? I don't think so. But 
`onCompletionPromise` has already been completed in this scenario, so the 
outcome is inconsistent.
   
   So the right approach here is to call `completedCheckpointStore.addCheckpoint` 
first, and only then complete `onCompletionPromise`.
   
   I was planning to do this as a follow-up issue. However, since we have 
decided to combine finalization with adding the checkpoint to 
`completedCheckpointStore`, to simplify the interaction between the IO threads 
and the main thread, I think this is a good opportunity to do it.
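
   To make the ordering concrete, here is a minimal, self-contained sketch. It 
is not the actual Flink code: `Completed`, `Store` and `finalizeCheckpoint` are 
hypothetical stand-ins, and failing the promise when storing fails is my 
assumption about the desired behavior. It only shows that the promise is 
completed after the store has accepted the checkpoint.

```java
import java.util.concurrent.CompletableFuture;

public class StoreThenCompleteSketch {

    /** Hypothetical stand-in for a completed checkpoint. */
    static final class Completed {
        final long checkpointId;
        boolean discarded;

        Completed(long checkpointId) {
            this.checkpointId = checkpointId;
        }

        void discardOnFailedStoring() {
            discarded = true;
        }
    }

    /** Hypothetical stand-in for the completed-checkpoint store. */
    interface Store {
        void addCheckpoint(Completed completed) throws Exception;
    }

    /**
     * Adds the checkpoint to the store first and completes the promise only
     * afterwards, so the promise never reports success for a checkpoint that
     * the store rejected.
     */
    static Completed finalizeCheckpoint(
            long checkpointId,
            Store store,
            CompletableFuture<Completed> onCompletionPromise) {

        Completed completed = new Completed(checkpointId);
        try {
            store.addCheckpoint(completed);
        } catch (Throwable t) {
            // Storing failed: clean up and fail the promise (assumed behavior).
            completed.discardOnFailedStoring();
            onCompletionPromise.completeExceptionally(t);
            return completed;
        }
        // Only reached once the store has accepted the checkpoint.
        onCompletionPromise.complete(completed);
        return completed;
    }

    public static void main(String[] args) {
        CompletableFuture<Completed> promise = new CompletableFuture<>();
        finalizeCheckpoint(1L, c -> { throw new Exception("store unavailable"); }, promise);
        System.out.println("failed: " + promise.isCompletedExceptionally());
    }
}
```

   With this ordering, a failure in the store surfaces through the promise 
instead of after it, so callers waiting on the promise cannot observe a 
"completed" checkpoint that was never stored.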

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
