[ https://issues.apache.org/jira/browse/FLINK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Metzger updated FLINK-17825:
-----------------------------------
    Description: 
CI (normal profile): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1867&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5

{code}
2020-05-19T20:46:50.9034002Z Killed TM @ 104061
2020-05-19T20:47:05.8510180Z Killed TM @ 107775
2020-05-19T20:47:55.1181475Z Killed TM @ 108337
2020-05-19T20:48:16.7907005Z Test (pid: 89099) did not finish after 540 seconds.
2020-05-19T20:48:16.7907777Z Printing Flink logs and killing it:
[...]
2020-05-19T20:48:19.1016912Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 125: 89099 Terminated ( cmdpid=$BASHPID; ( sleep $TEST_TIMEOUT_SECONDS; echo "Test (pid: $cmdpid) did not finish after $TEST_TIMEOUT_SECONDS seconds."; echo "Printing Flink logs and killing it:"; cat ${FLINK_DIR}/log/*; kill "$cmdpid" ) & watchdog_pid=$!; echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid; run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} )
2020-05-19T20:48:19.1017985Z Stopping job timeout watchdog (with pid=89100)
2020-05-19T20:48:19.1018621Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 112: kill: (89100) - No such process
2020-05-19T20:48:19.1019000Z Killing JM watchdog @ 91127
2020-05-19T20:48:19.1019199Z Killing TM watchdog @ 91883
2020-05-19T20:48:19.1019424Z [FAIL] Test script contains errors.
2020-05-19T20:48:19.1019639Z Checking of logs skipped.
2020-05-19T20:48:19.1019785Z
2020-05-19T20:48:19.1020329Z [FAIL] 'Running HA (rocks, non-incremental) end-to-end test' failed after 9 minutes and 0 seconds! Test exited with exit code 1
{code}

  was:
CI: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1867&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5

{code}
2020-05-19T20:46:50.9034002Z Killed TM @ 104061
2020-05-19T20:47:05.8510180Z Killed TM @ 107775
2020-05-19T20:47:55.1181475Z Killed TM @ 108337
2020-05-19T20:48:16.7907005Z Test (pid: 89099) did not finish after 540 seconds.
2020-05-19T20:48:16.7907777Z Printing Flink logs and killing it:
[...]
2020-05-19T20:48:19.1016912Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 125: 89099 Terminated ( cmdpid=$BASHPID; ( sleep $TEST_TIMEOUT_SECONDS; echo "Test (pid: $cmdpid) did not finish after $TEST_TIMEOUT_SECONDS seconds."; echo "Printing Flink logs and killing it:"; cat ${FLINK_DIR}/log/*; kill "$cmdpid" ) & watchdog_pid=$!; echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid; run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} )
2020-05-19T20:48:19.1017985Z Stopping job timeout watchdog (with pid=89100)
2020-05-19T20:48:19.1018621Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 112: kill: (89100) - No such process
2020-05-19T20:48:19.1019000Z Killing JM watchdog @ 91127
2020-05-19T20:48:19.1019199Z Killing TM watchdog @ 91883
2020-05-19T20:48:19.1019424Z [FAIL] Test script contains errors.
2020-05-19T20:48:19.1019639Z Checking of logs skipped.
2020-05-19T20:48:19.1019785Z
2020-05-19T20:48:19.1020329Z [FAIL] 'Running HA (rocks, non-incremental) end-to-end test' failed after 9 minutes and 0 seconds! Test exited with exit code 1
{code}


> HA end-to-end gets killed due to timeout
> ----------------------------------------
>
>                 Key: FLINK-17825
>                 URL: https://issues.apache.org/jira/browse/FLINK-17825
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>            Priority: Critical
>              Labels: test-stability
>
> CI (normal profile):
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1867&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-05-19T20:46:50.9034002Z Killed TM @ 104061
> 2020-05-19T20:47:05.8510180Z Killed TM @ 107775
> 2020-05-19T20:47:55.1181475Z Killed TM @ 108337
> 2020-05-19T20:48:16.7907005Z Test (pid: 89099) did not finish after 540 seconds.
> 2020-05-19T20:48:16.7907777Z Printing Flink logs and killing it:
> [...]
> 2020-05-19T20:48:19.1016912Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 125: 89099 Terminated ( cmdpid=$BASHPID; ( sleep $TEST_TIMEOUT_SECONDS; echo "Test (pid: $cmdpid) did not finish after $TEST_TIMEOUT_SECONDS seconds."; echo "Printing Flink logs and killing it:"; cat ${FLINK_DIR}/log/*; kill "$cmdpid" ) & watchdog_pid=$!; echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid; run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} )
> 2020-05-19T20:48:19.1017985Z Stopping job timeout watchdog (with pid=89100)
> 2020-05-19T20:48:19.1018621Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 112: kill: (89100) - No such process
> 2020-05-19T20:48:19.1019000Z Killing JM watchdog @ 91127
> 2020-05-19T20:48:19.1019199Z Killing TM watchdog @ 91883
> 2020-05-19T20:48:19.1019424Z [FAIL] Test script contains errors.
> 2020-05-19T20:48:19.1019639Z Checking of logs skipped.
> 2020-05-19T20:48:19.1019785Z
> 2020-05-19T20:48:19.1020329Z [FAIL] 'Running HA (rocks, non-incremental) end-to-end test' failed after 9 minutes and 0 seconds! Test exited with exit code 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
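For readers tracing the timeout in the log: the `line 125` entry shows the test running inside a subshell (`cmdpid=$BASHPID`) that starts a background watchdog, which sleeps for `$TEST_TIMEOUT_SECONDS` (540 s here) and then kills that subshell. The following is a simplified, hypothetical bash sketch of the same watchdog idea, not the code used in `test_ha_datastream.sh`. The function name `run_with_watchdog` and its arguments are illustrative assumptions; unlike the real script, the sketch runs the command in the background and keeps the watchdog outside it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a timeout watchdog (illustrative names only,
# not part of the Flink end-to-end test scripts).
# Usage: run_with_watchdog <timeout_seconds> <command> [args...]
run_with_watchdog() {
  local timeout_secs=$1
  shift

  # Start the command under test in the background.
  "$@" &
  local cmd_pid=$!

  # Watchdog: sleep for the timeout, then report and kill the command.
  # sleep's output goes to /dev/null so that, if the watchdog subshell is
  # killed first, the leftover sleep does not keep a caller's pipe open.
  (
    sleep "$timeout_secs" >/dev/null 2>&1
    echo "Command (pid: $cmd_pid) did not finish after $timeout_secs seconds." >&2
    kill "$cmd_pid" 2>/dev/null
  ) &
  local watchdog_pid=$!

  # Wait for the command; if the watchdog killed it with SIGTERM,
  # wait reports 128 + 15 = 143.
  local rc=0
  wait "$cmd_pid" || rc=$?

  # Stop the watchdog. If it already fired and exited, kill fails with
  # "No such process"; that error is silenced here.
  kill "$watchdog_pid" 2>/dev/null || true
  wait "$watchdog_pid" 2>/dev/null || true

  return "$rc"
}
```

Called as `run_with_watchdog 540 some_test_command` (a placeholder), a command that outlives the timeout is terminated and the function returns 143. The sketch also suggests a reading of the `line 112: kill: (89100) - No such process` entry: once the watchdog has fired and exited, a later attempt to kill it presumably finds no such process.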