[ 
https://issues.apache.org/jira/browse/FLINK-10842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696161#comment-16696161
 ] 

ASF GitHub Bot commented on FLINK-10842:
----------------------------------------

azagrebin commented on a change in pull request #7073: [FLINK-10842][E2E tests] fix broken waiting loops in common.sh
URL: https://github.com/apache/flink/pull/7073#discussion_r235796341
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common.sh
 ##########
 @@ -242,30 +245,45 @@ function start_taskmanagers {
 }
 
 function start_and_wait_for_tm {
-  local url="${REST_PROTOCOL}://${NODENAME}:8081/taskmanagers"
-
-  tm_query_result=$(curl ${CURL_SSL_ARGS} -s "${url}")
-
+  tm_query_result=`query_running_tms`
   # we assume that the cluster is running
   if ! [[ ${tm_query_result} =~ \{\"taskmanagers\":\[.*\]\} ]]; then
     echo "Your cluster seems to be unresponsive at the moment: ${tm_query_result}" 1>&2
     exit 1
   fi
 
-  running_tms=`curl ${CURL_SSL_ARGS} -s "${url}" | grep -o "id" | wc -l`
-
+  running_tms=`query_number_of_running_tms`
   ${FLINK_DIR}/bin/taskmanager.sh start
+  wait_for_number_of_running_tms $((running_tms+1))
+}
 
-  for i in {1..10}; do
-    local new_running_tms=`curl ${CURL_SSL_ARGS} -s "${url}" | grep -o "id" | wc -l`
-    if [ $((new_running_tms-running_tms)) -eq 0 ]; then
-      echo "TaskManager is not yet up."
+function query_running_tms {
+  local url="${REST_PROTOCOL}://${NODENAME}:8081/taskmanagers"
+  curl ${CURL_SSL_ARGS} -s "${url}"
 
 Review comment:
   We check earlier (line 250) that the cluster is running and responds correctly,
   so I would not expect these commands to fail.
   If something does go wrong while querying the cluster, the script should fail fast.
   What do you think?
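
   A minimal sketch of what failing fast could look like for the query helper.
   The function name `query_running_tms_or_fail` is hypothetical and not part of
   the pull request; the sketch assumes the same `REST_PROTOCOL`, `NODENAME` and
   `CURL_SSL_ARGS` variables used in common.sh, and relies on curl's `--fail`
   option, which makes curl return a non-zero status on HTTP error responses.

```shell
#!/usr/bin/env bash

# Hypothetical variant of query_running_tms (illustration only).
# Prints the REST response on success; returns non-zero if curl fails.
query_running_tms_or_fail() {
  local url="${REST_PROTOCOL}://${NODENAME}:8081/taskmanagers"
  local response

  # --fail turns HTTP error responses into a non-zero curl exit status.
  # CURL_SSL_ARGS is left unquoted on purpose so it can expand to several flags.
  if ! response=$(curl ${CURL_SSL_ARGS} -s --fail "${url}"); then
    echo "Failed to query ${url}" 1>&2
    return 1
  fi

  echo "${response}"
}
```

   Because callers capture the output with command substitution, an `exit`
   inside the helper would only end the subshell. The caller therefore has to
   check the status itself, for example
   `tm_query_result=$(query_running_tms_or_fail) || exit 1`.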



> Waiting loops are broken in e2e/common.sh
> -----------------------------------------
>
>                 Key: FLINK-10842
>                 URL: https://issues.apache.org/jira/browse/FLINK-10842
>             Project: Flink
>          Issue Type: Bug
>          Components: E2E Tests
>    Affects Versions: 1.7.0
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> There are 3 loops in flink-end-to-end-tests/test-scripts/common.sh where the 
> script waits for some event to happen (for i in {1..10}; do):
>  - wait_dispatcher_running
>  - start_and_wait_for_tm
>  - wait_job_running
> All loops have 10 iterations, and each loop breaks as soon as the awaited 
> event happens. If the event does not happen within those 10 iterations, the 
> script does not fail: the function simply continues after the loop, ignoring 
> that the awaited event never happened.
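
The behaviour the issue asks for can be sketched as a bounded wait loop that reports a timeout instead of silently falling through. The helper name `wait_for_condition` and its parameters are assumptions made for illustration; this is not the code from the pull request.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of common.sh): runs the given command up to
# max_attempts times, sleeping between attempts, and returns non-zero if the
# command never succeeds.
wait_for_condition() {
  local max_attempts="$1"
  local sleep_seconds="$2"
  shift 2

  local i
  for ((i = 1; i <= max_attempts; i++)); do
    # "$@" is the condition command; success ends the wait immediately.
    if "$@"; then
      return 0
    fi
    sleep "${sleep_seconds}"
  done

  # Unlike the loops described above, a timeout is reported to the caller.
  echo "Condition '$*' not met after ${max_attempts} attempts." 1>&2
  return 1
}
```

A caller could then write something like `wait_for_condition 10 1 some_check || exit 1`, where `some_check` stands for whatever command tests the awaited event, so that a timeout aborts the test script.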



