[ https://issues.apache.org/jira/browse/FLINK-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091312#comment-17091312 ]
Robert Metzger commented on FLINK-16423:
----------------------------------------

Another case: https://api.travis-ci.org/v3/job/678609505/log.txt

> test_ha_per_job_cluster_datastream.sh gets stuck
> ------------------------------------------------
>
>                 Key: FLINK-16423
>                 URL: https://issues.apache.org/jira/browse/FLINK-16423
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Tests
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>         Attachments: 20200408.1.tgz
>
>
> This was seen in
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=5905&view=logs&j=b1623ac9-0979-5b0d-2e5e-1377d695c991&t=e7804547-1789-5225-2bcf-269eeaa37447
> ... the relevant part of the logs is here:
> {code}
> 2020-03-04T11:27:25.4819486Z ==============================================================================
> 2020-03-04T11:27:25.4820470Z Running 'Running HA per-job cluster (rocks, non-incremental) end-to-end test'
> 2020-03-04T11:27:25.4820922Z ==============================================================================
> 2020-03-04T11:27:25.4840177Z TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-25482960156
> 2020-03-04T11:27:25.6712478Z Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT
> 2020-03-04T11:27:25.6830402Z Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT
> 2020-03-04T11:27:26.2988914Z Starting zookeeper daemon on host fv-az655.
> 2020-03-04T11:27:26.3001237Z Running on HA mode: parallelism=4, backend=rocks, asyncSnapshots=true, and incremSnapshots=false.
> 2020-03-04T11:27:27.4206924Z Starting standalonejob daemon on host fv-az655.
> 2020-03-04T11:27:27.4217066Z Start 1 more task managers
> 2020-03-04T11:27:30.8412541Z Starting taskexecutor daemon on host fv-az655.
> 2020-03-04T11:27:38.1779980Z Job (00000000000000000000000000000000) is running.
> 2020-03-04T11:27:38.1781375Z Running JM watchdog @ 89778
> 2020-03-04T11:27:38.1781858Z Running TM watchdog @ 89779
> 2020-03-04T11:27:38.1783272Z Waiting for text Completed checkpoint [1-9]* for job 00000000000000000000000000000000 to appear 2 of times in logs...
> 2020-03-04T13:21:29.9076797Z ##[error]The operation was canceled.
> 2020-03-04T13:21:29.9094090Z ##[section]Finishing: Run e2e tests
> {code}
> The last three lines indicate that the test is waiting forever for a
> checkpoint to appear.
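In the quoted log, nothing is printed between the "Waiting for text Completed checkpoint ..." line at 11:27:38 and the CI cancellation at 13:21:29, so the wait step never gave up on its own. The following is a minimal sketch, assuming log files that can be matched by a glob, of a bounded variant of such a wait: count the lines matching a regex across the logs and return False once a deadline passes, so a caller could fail fast instead of hanging until the CI job is cancelled. It is written in Python for illustration only (the Flink e2e scripts are bash); `count_log_matches`, `wait_for_log_count`, and the default timeout/poll values are hypothetical and are not Flink's test utilities.

```python
import glob
import re
import time


def count_log_matches(pattern: str, log_glob: str) -> int:
    """Count lines matching `pattern` across all files matching `log_glob`.

    Hypothetical helper: illustrates the "appear N of times in logs" check.
    """
    regex = re.compile(pattern)
    total = 0
    for path in glob.glob(log_glob):
        # errors="replace" so a partially written or binary line cannot crash the check
        with open(path, encoding="utf-8", errors="replace") as f:
            total += sum(1 for line in f if regex.search(line))
    return total


def wait_for_log_count(pattern: str, log_glob: str, expected: int,
                       timeout_s: float = 300.0, poll_s: float = 1.0) -> bool:
    """Poll the logs until `pattern` has matched at least `expected` lines.

    Returns True on success and False once `timeout_s` has elapsed, instead
    of looping forever. Timeout and poll interval are assumed values.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if count_log_matches(pattern, log_glob) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For the stuck case above, a caller would pass the pattern `Completed checkpoint [1-9]* for job 00000000000000000000000000000000`, a glob over the cluster's log directory, and `expected=2`, then abort the test with an error (and, ideally, dump the logs) when the function returns False.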