[ https://issues.apache.org/jira/browse/FLINK-10856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686290#comment-16686290 ]
ASF GitHub Bot commented on FLINK-10856:
----------------------------------------

tillrohrmann closed pull request #7088: [FLINK-10856] Take latest checkpoint to resume from in resume from externalized checkpoint e2e test
URL: https://github.com/apache/flink/pull/7088

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh b/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh
index fe2319063b5..c1477574d5d 100755
--- a/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh
+++ b/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh
@@ -111,19 +111,14 @@ else
   cancel_job $DATASTREAM_JOB
 fi
 
-CHECKPOINT_PATH=$(ls -d $CHECKPOINT_DIR/$DATASTREAM_JOB/chk-[1-9]*)
+# take the latest checkpoint
+CHECKPOINT_PATH=$(ls -d $CHECKPOINT_DIR/$DATASTREAM_JOB/chk-[1-9]* | sort -Vr | head -n1)
 
 if [ -z $CHECKPOINT_PATH ]; then
   echo "Expected an externalized checkpoint to be present, but none exists."
   exit 1
 fi
 
-NUM_CHECKPOINTS=$(echo $CHECKPOINT_PATH | wc -l | tr -d ' ')
-if (( $NUM_CHECKPOINTS > 1 )); then
-  echo "Expected only exactly 1 externalized checkpoint to be present, but $NUM_CHECKPOINTS exists."
-  exit 1
-fi
-
 echo "Restoring job with externalized checkpoint at $CHECKPOINT_PATH ..."
 
 BASE_JOB_CMD=`buildBaseJobCmd $NEW_DOP "-s file://${CHECKPOINT_PATH}"`
@@ -141,6 +136,11 @@ fi
 
 DATASTREAM_JOB=$($JOB_CMD | grep "Job has been submitted with JobID" | sed 's/.* //g')
 
+if [ -z $DATASTREAM_JOB ]; then
+  echo "Resuming from externalized checkpoint job could not be started."
+  exit 1
+fi
+
 wait_job_running $DATASTREAM_JOB
 
 wait_oper_metric_num_in_records SemanticsCheckMapper.0 200

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Harden resume from externalized checkpoint E2E test
> ---------------------------------------------------
>
>                 Key: FLINK-10856
>                 URL: https://issues.apache.org/jira/browse/FLINK-10856
>             Project: Flink
>          Issue Type: Bug
>          Components: E2E Tests, State Backends, Checkpointing
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> The resume from externalized checkpoints E2E test can fail due to
> FLINK-10855. We should harden the test script to not expect a single
> checkpoint directory being present but to take the checkpoint with the
> highest checkpoint counter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
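
For illustration only, the "take the checkpoint with the highest checkpoint counter" step from the issue description could also be written without parsing `ls` output and without `sort -V`. The sketch below is not part of the merged patch: the helper name `latest_checkpoint` is hypothetical, and it assumes checkpoint directories are named `chk-<N>` with a purely numeric `<N>`, as the glob in the diff suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the Flink test scripts): print the chk-<N>
# directory with the highest N found directly under the directory in $1.
# Returns non-zero and prints nothing if no such directory exists.
latest_checkpoint() {
  local dir=$1 best="" best_n=-1 path n
  for path in "$dir"/chk-[1-9]*; do
    [ -d "$path" ] || continue     # also skips the unexpanded glob when nothing matches
    n=${path##*/chk-}              # strip everything up to and including "chk-"
    case $n in
      *[!0-9]*) continue ;;        # ignore names whose suffix is not purely numeric
    esac
    if [ "$n" -gt "$best_n" ]; then
      best=$path
      best_n=$n
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

Under these assumptions, the test script could call it as `CHECKPOINT_PATH=$(latest_checkpoint "$CHECKPOINT_DIR/$DATASTREAM_JOB")` and treat a non-zero exit status as "no externalized checkpoint present". Comparing the counters numerically means `chk-10` is preferred over `chk-9`, which a plain lexicographic sort would get wrong.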