Hongshun Wang created FLINK-32668:
-------------------------------------

             Summary: fix up watchdog timeout bug in common.sh (e2e test)?
                 Key: FLINK-32668
                 URL: https://issues.apache.org/jira/browse/FLINK-32668
             Project: Flink
          Issue Type: Improvement
          Components: Build System / CI
    Affects Versions: 1.17.1
            Reporter: Hongshun Wang
             Fix For: 1.17.2
         Attachments: image-2023-07-25-15-27-37-441.png

When running the e2e tests, an error like this occurs:

!image-2023-07-25-15-27-37-441.png|width=733,height=115!

I then found a problem in the corresponding code:

 
{code:java}
kill_test_watchdog() {
    local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
    echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
    kill $watchdog_pid
}

internal_run_with_timeout() {
    local timeout_in_seconds="$1"
    local on_failure="$2"
    local command_label="$3"
    local command="${@:4}"

    on_exit kill_test_watchdog

    (
        command_pid=$BASHPID
        (sleep "${timeout_in_seconds}" # set a timeout for this command
        echo "${command_label:-"The command '${command}'"} (pid: $command_pid) did not finish after $timeout_in_seconds seconds."
        eval "${on_failure}"
        kill "$command_pid") & watchdog_pid=$!
        echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid
        # invoke
        $command
    )
}{code}
 

When {{$command}} completes before the timeout, the watchdog process is still 
running and is killed successfully. However, when {{$command}} times out, the 
watchdog process kills {{$command}} and then exits on its own. The {{on_exit}} 
handler {{kill_test_watchdog}} then runs {{kill $watchdog_pid}} against a 
process that no longer exists, which produces the error message shown above.
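
The failure mode can be reproduced outside the test harness. The following is a minimal sketch, not the Flink code itself: a short-lived background process stands in for the watchdog, and the variable name {{stale_kill_status}} is made up for illustration.

```shell
#!/usr/bin/env bash
# Minimal reproduction sketch; this is not the Flink test harness.

# Start a short-lived background process that stands in for the watchdog.
(exit 0) &
watchdog_pid=$!

# Wait until it has exited and been reaped, as after a timeout.
wait "$watchdog_pid"

# Signalling the stale PID now fails: kill reports an error on stderr
# (silenced here) and returns a non-zero exit status.
if kill "$watchdog_pid" 2>/dev/null; then
    stale_kill_status=0
else
    stale_kill_status=$?
fi
echo "kill exit status: $stale_kill_status"
```

Without the {{2>/dev/null}} redirection, the shell's error message for the failed {{kill}} is printed to the test log, which matches the situation described above.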

 

So I propose changing {{kill_test_watchdog}} like this:

 
{code:java}
kill_test_watchdog() {
    local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
    if kill -0 $watchdog_pid > /dev/null 2>&1; then
        echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
        kill $watchdog_pid
    else
        echo "watchdog (with pid=$watchdog_pid) does not exist now"
    fi
} {code}
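
To check both branches of the proposed function, something like the sketch below could be used. It is only an illustration under stated assumptions: {{TEST_DATA_DIR}} is a throwaway temporary directory, {{sleep 30}} stands in for a watchdog that is still running, and the variables {{live_output}}, {{exited_output}} and friends are hypothetical names. Variable expansions are quoted here, which the original does not do.

```shell
#!/usr/bin/env bash
# Sketch only: TEST_DATA_DIR is a throwaway directory, not the real test dir.
TEST_DATA_DIR=$(mktemp -d)

kill_test_watchdog() {
    local watchdog_pid
    watchdog_pid=$(cat "$TEST_DATA_DIR/job_watchdog.pid")
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    if kill -0 "$watchdog_pid" > /dev/null 2>&1; then
        echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
        kill "$watchdog_pid"
    else
        echo "watchdog (with pid=$watchdog_pid) does not exist now"
    fi
}

# Case 1: the watchdog is still sleeping when the command finishes.
sleep 30 &
live_pid=$!
echo "$live_pid" > "$TEST_DATA_DIR/job_watchdog.pid"
live_output=$(kill_test_watchdog)
live_status=$?
# Reap the killed stand-in so it does not linger as a zombie.
wait "$live_pid" 2>/dev/null || true

# Case 2: the watchdog has already exited, as after a timeout.
(exit 0) &
exited_pid=$!
wait "$exited_pid"
echo "$exited_pid" > "$TEST_DATA_DIR/job_watchdog.pid"
exited_output=$(kill_test_watchdog)
exited_status=$?

echo "$live_output"
echo "$exited_output"
rm -rf "$TEST_DATA_DIR"
```

A small window remains between the {{kill -0}} check and the {{kill}} itself, in which the watchdog could still exit. If that matters, one alternative is to skip the check and tolerate the failure, for example {{kill "$watchdog_pid" 2>/dev/null || true}}.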
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
