Chesnay Schepler created FLINK-9678:
---------------------------------------
Summary: Remove hard-coded sleeps in HA E2E test
Key: FLINK-9678
URL: https://issues.apache.org/jira/browse/FLINK-9678
Project: Flink
Issue Type: Improvement
Components: Distributed Coordination, Tests
Affects Versions: 1.5.0, 1.6.0
Reporter: Chesnay Schepler

{{test_ha.sh}} uses two hard-coded sleeps:

{code:bash}
# let the job run for a while to take some checkpoints
sleep 20

for (( c=0; c<${JM_KILLS}; c++ )); do
    # kill the JM and wait for watchdog to
    # create a new one which will take over
    kill_jm
    sleep 60
done
{code}

These sleeps are always troublesome: they either make the test brittle by being too short, or cause the test to idle when they are too long.

The first sleep should be replaced with {{wait_num_checkpoints}}.

I'm not entirely sure about the semantics of the second sleep, but I assume we're waiting for the new JM to resume the job execution. In that case I suggest instead querying the job status via REST and waiting until the job is running.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
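For the second sleep, the suggested REST polling could look roughly like the sketch below. It assumes that Flink's REST endpoint {{GET /jobs/<jobid>}} reports the job status in a {{"state"}} field, that the REST server is reachable at localhost:8081 (the default {{rest.port}}), and that curl is installed. The helpers {{extract_job_state}}, {{job_is_running}} and {{retry_until}}, as well as {{REST_URL}}, {{JOB_ID}}, {{POLL_INTERVAL}} and the 120-second timeout, are hypothetical and not part of {{test_ha.sh}}.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Flink's test scripts) that replace the
# fixed "sleep 60" with polling of Flink's REST API.
# Assumptions: the REST endpoint is reachable at REST_URL (8081 is Flink's
# default rest.port) and curl is installed.

REST_URL="${REST_URL:-http://localhost:8081}"

# Read a GET /jobs/<jobid> JSON response on stdin and print the first
# "state" value found, assumed to be the job-level state (e.g. RUNNING).
# Crude grep-based parsing so the sketch does not depend on jq.
extract_job_state() {
    grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | cut -d'"' -f4
}

# Succeed only if the REST API currently reports the job as RUNNING.
# While the killed JobManager is being replaced, curl fails or returns no
# state, so this simply returns non-zero and the caller retries.
job_is_running() {
    local job_id="$1" state
    state=$(curl -s "${REST_URL}/jobs/${job_id}" | extract_job_state)
    [[ "$state" == "RUNNING" ]]
}

# Re-run "$@" every POLL_INTERVAL seconds (default 1) until it succeeds.
# Returns 1 if it has not succeeded within TIMEOUT seconds.
retry_until() {
    local timeout="$1"
    shift
    local start=$SECONDS
    until "$@"; do
        if (( SECONDS - start >= timeout )); then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-1}"
    done
}

# Possible replacement for the loop in test_ha.sh; JOB_ID is assumed to
# hold the ID of the submitted job.
# for (( c=0; c<${JM_KILLS}; c++ )); do
#     kill_jm
#     retry_until 120 job_is_running "${JOB_ID}" || exit 1
# done
```

Bounding the poll with a timeout keeps the test from hanging if recovery never succeeds, while letting it continue as soon as the new JM reports the job as running. Note that a RUNNING state alone may not prove the job actually resumed from a checkpoint; following it with {{wait_num_checkpoints}} would be a stricter check.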