[ https://issues.apache.org/jira/browse/FLINK-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Richter closed FLINK-10753.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.3
                   1.5.6

Merged in:
master: dc8e27f
release-1.7: 9477b6a
release-1.6: 6814e5f
release-1.5: 1725591

> Propagate and log snapshotting exceptions
> -----------------------------------------
>
>                 Key: FLINK-10753
>                 URL: https://issues.apache.org/jira/browse/FLINK-10753
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.6.2, 1.7.0
>            Reporter: Alexander Fedulov
>            Assignee: Stefan Richter
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.6, 1.6.3, 1.7.0
>
>         Attachments: Screen Shot 2018-11-01 at 16.27.01.png
>
>
> Upon failure, {{AbstractStreamOperator.snapshotState}} rethrows a new
> exception with the message "{{Could not complete snapshot {} for operator
> {}.}}" and the original exception as the cause.
> While handling the error, the {{CheckpointCoordinator.discardCheckpoint}}
> method logs only this propagated message and not the original cause of the
> exception.
> In addition, {{pendingCheckpoint.abortDeclined()}}, called from
> {{discardCheckpoint}}, reports the failed checkpoint with a misleading
> message "{{Checkpoint was declined (tasks not ready)}}". This message is what
> is displayed in the UI (see attached).
> Proposition:
> # Log the exception at the Task Manager (.snapshotState)
> # Log the cause, instead of cause.getMessage(), at the JobManager
> (.discardCheckpoint)
> # Pass the root cause to abortDeclined and propagate a more appropriate
> message to the UI.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
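
The difference between logging cause.getMessage() and logging the cause itself can be illustrated with a small standalone sketch. This is not Flink code: the class SnapshotFailureDemo, its helper methods, and the "disk quota exceeded" failure are hypothetical stand-ins, assumed only for illustration. The wrapper message is the one quoted in the issue.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class; it only mimics the wrapping pattern described in the issue.
class SnapshotFailureDemo {

    // Stand-in for a failing snapshot: the original failure is wrapped
    // in a new exception and kept as its cause.
    static void snapshotState(long checkpointId, String operatorName) throws Exception {
        try {
            throw new IOException("disk quota exceeded"); // assumed root cause
        } catch (Exception e) {
            throw new Exception(
                "Could not complete snapshot " + checkpointId
                    + " for operator " + operatorName + ".", e);
        }
    }

    // The reported behaviour: only the outer message is kept, the cause is lost.
    static String describeMessageOnly(Throwable t) {
        return t.getMessage();
    }

    // The proposed behaviour: render the throwable itself, which includes
    // the "Caused by:" chain and stack traces.
    static String describeWithCause(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        return buffer.toString();
    }

    // Walks the cause chain to its end, e.g. to build a better message for a UI.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            snapshotState(42L, "MyOperator");
        } catch (Exception e) {
            System.out.println("message only: " + describeMessageOnly(e));
            System.out.println("root cause:   " + rootCause(e));
            System.out.println(describeWithCause(e));
        }
    }
}
```

With a logging facade such as SLF4J, the equivalent of describeWithCause is to pass the throwable as the last argument of the log call rather than concatenating its message into the log string.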