[ https://issues.apache.org/jira/browse/FLINK-32668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-32668:
-----------------------------------
    Labels: pull-request-available  (was: )

> fix up watchdog timeout error msg in common.sh(e2e test)
> ----------------------------------------------------------
>
>                 Key: FLINK-32668
>                 URL: https://issues.apache.org/jira/browse/FLINK-32668
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / CI
>    Affects Versions: 1.16.2, 1.18.0, 1.17.1
>            Reporter: Hongshun Wang
>            Assignee: Hongshun Wang
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: image-2023-07-25-15-27-37-441.png
>
>
> When running the e2e tests, an error like this occurs:
> !image-2023-07-25-15-27-37-441.png|width=733,height=115!
>
> The corresponding code:
> {code:java}
> kill_test_watchdog() {
>     local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
>     echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
>     kill $watchdog_pid
> }
> internal_run_with_timeout() {
>     local timeout_in_seconds="$1"
>     local on_failure="$2"
>     local command_label="$3"
>     local command="${@:4}"
>     on_exit kill_test_watchdog
>     (
>         command_pid=$BASHPID
>         (sleep "${timeout_in_seconds}" # set a timeout for this command
>         echo "${command_label:-"The command '${command}'"} (pid: $command_pid) did not finish after $timeout_in_seconds seconds."
>         eval "${on_failure}"
>         kill "$command_pid") & watchdog_pid=$!
>         echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid
>         # invoke
>         $command
>     )
> }{code}
>
> When {{$command}} completes before the timeout, the watchdog process is
> killed successfully. However, when {{$command}} times out, the watchdog
> process kills {{$command}} and then exits itself. The {{on_exit}} hook
> {{kill_test_watchdog}} then still runs {{kill $watchdog_pid}} against a
> process that no longer exists, which leaves behind a "no such process"
> error message that is hard to understand.
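
The failure mode described above can be reproduced outside the test harness. The following is only a minimal sketch, assuming a POSIX shell: `pid_is_alive` and `demo_pid` are hypothetical names introduced for illustration and are not part of common.sh. It shows that `kill -0` (which sends no signal and only checks whether the pid can be signalled) succeeds while a background process is running and fails once that process has exited and been reaped, which is the state the watchdog is in after a timeout.

```shell
# Hypothetical helper (not part of Flink's common.sh).
# Succeeds if the given pid can still be signalled, fails otherwise.
pid_is_alive() {
    kill -0 "$1" > /dev/null 2>&1
}

# A short-lived background process standing in for the watchdog subshell.
sleep 1 &
demo_pid=$!

# Probe while the process is still running.
if pid_is_alive "$demo_pid"; then before="alive"; else before="gone"; fi

# Wait until it has exited, as the watchdog has after firing on a timeout.
wait "$demo_pid"

# Probe again: a plain `kill "$demo_pid"` here would fail with a
# "No such process" style error, which is what the issue reports.
if pid_is_alive "$demo_pid"; then after="alive"; else after="gone"; fi

echo "before wait: $before, after wait: $after"
```

Probing with `kill -0` first lets the caller choose its own message instead of surfacing the raw error from `kill`.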
>
> So I will modify it as follows, with a clearer error message:
>
> {code:java}
> kill_test_watchdog() {
>     local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
>     if kill -0 $watchdog_pid > /dev/null 2>&1; then
>         echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
>         kill $watchdog_pid
>     else
>         echo "[ERROR] Test is timeout"
>         exit 1
>     fi
> } {code}
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)