vvcephei commented on pull request #9911:
URL: https://github.com/apache/kafka/pull/9911#issuecomment-761176712


   In detail, here's my analysis of a recent timeout:
   
   Looking at the debug log for the test, there seem to be some remote commands 
that really burn the clock. Consider this segment:
   
   ```
    680 [DEBUG - 2021-01-12 21:34:54,612 - remoteaccount - _log - lineno:160]: 
ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
    681 [INFO  - 2021-01-12 21:35:18,751 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
    682 [DEBUG - 2021-01-12 21:35:19,752 - remoteaccount - _log - lineno:160]: 
ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
    683 [INFO  - 2021-01-12 21:36:08,111 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
    684 [DEBUG - 2021-01-12 21:36:09,112 - remoteaccount - _log - lineno:160]: 
ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
    685 [INFO  - 2021-01-12 21:37:38,270 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
    686 [DEBUG - 2021-01-12 21:37:39,271 - remoteaccount - _log - lineno:160]: 
ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
    687 [INFO  - 2021-01-12 21:40:28,719 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
    688 [DEBUG - 2021-01-12 21:40:29,720 - remoteaccount - _log - lineno:160]: 
ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
    689 [INFO  - 2021-01-12 21:46:12,541 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
   ```
   
   The first sed takes 20 seconds, the second takes 50, the third takes 90, the 
fourth takes 2.5 minutes, and the last takes 5.5 minutes.
   
   The next time we try it (on worker6) it pushes the whole test over the 30m 
mark:
   
   ```
   1030 [DEBUG - 2021-01-12 21:46:29,527 - remoteaccount - _log - lineno:160]: 
ubuntu@worker6: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
   1031 [INFO  - 2021-01-12 21:53:31,673 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
   1032 [DEBUG - 2021-01-12 21:53:32,675 - remoteaccount - _log - lineno:160]: 
ubuntu@worker6: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
   1033 [INFO  - 2021-01-12 22:01:38,267 - streams_optimized_test - 
all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check
   1034 [DEBUG - 2021-01-12 22:01:39,268 - remoteaccount - _log - lineno:160]: 
ubuntu@worker6: Running ssh command: sed -n 's/.*current active tasks: 
\[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log
   1035 [INFO  - 2021-01-12 22:03:50,579 - runner_client - log - lineno:240]: 
RunnerClient: 
kafkatest.tests.streams.streams_optimized_test.StreamsOptimizedTest.test_upgrade_optimized_topology:
 FAIL: 
   1036 Traceback (most recent call last):
   1037   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py",
 line 134, in run
   1038     data = self.run_test()
   1039   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py",
 line 192, in run_test
   1040     return self.test_context.function(self.test)
   1041   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/streams/streams_optimized_test.py",
 line 88, in test_upgrade_optimized_topology
   1042     self.verify_processing(processors, 
verify_individual_operations=True)
   1043   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/streams/streams_optimized_test.py",
 line 114, in verify_processing
   1044     if not self.all_source_subtopology_tasks(processor):
   1045   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/streams/streams_optimized_test.py",
 line 133, in all_source_subtopology_tasks
   1046     found = list(processor.node.account.ssh_capture("sed -n 
's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/\1/p' %s" % 
processor.LOG_FILE, allow_fail=True))
   1047   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/cluster/remoteaccount.py",
 line 651, in next
   1048     return next(self.iter_obj)
   1049   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/cluster/remoteaccount.py",
 line 318, in output_generator
   1050     for line in iter(stdout.readline, ''):
   1051   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/file.py",
 line 291, in readline
   1052     new_data = self._read(n)
   1053   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/channel.py",
 line 1361, in _read
   1054     return self.channel.recv(size)
   1055   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/channel.py",
 line 699, in recv
   1056     out = self.in_buffer.read(nbytes, self.timeout)
   1057   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/buffered_pipe.py",
 line 160, in read
   1058     self._cv.wait(timeout)
   1059   File "/usr/local/lib/python3.6/threading.py", line 295, in wait
   1060     waiter.acquire()
   1061 KeyboardInterrupt
   ```
   
   Looks like the first sed is 7 minutes, the second 8 minutes, and then we 
time out in the third.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to