vvcephei commented on pull request #9911: URL: https://github.com/apache/kafka/pull/9911#issuecomment-761176712
In detail, here's my analysis of a recent timeout: Looking at the debug log for the test, there seem to be some remote commands that really burn the clock. Consider this segment: ``` 680 [DEBUG - 2021-01-12 21:34:54,612 - remoteaccount - _log - lineno:160]: ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 681 [INFO - 2021-01-12 21:35:18,751 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check 682 [DEBUG - 2021-01-12 21:35:19,752 - remoteaccount - _log - lineno:160]: ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 683 [INFO - 2021-01-12 21:36:08,111 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check 684 [DEBUG - 2021-01-12 21:36:09,112 - remoteaccount - _log - lineno:160]: ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 685 [INFO - 2021-01-12 21:37:38,270 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check 686 [DEBUG - 2021-01-12 21:37:39,271 - remoteaccount - _log - lineno:160]: ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 687 [INFO - 2021-01-12 21:40:28,719 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check 688 [DEBUG - 2021-01-12 21:40:29,720 - remoteaccount - _log - lineno:160]: ubuntu@worker8: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 689 [INFO - 2021-01-12 21:46:12,541 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check ``` The first sed takes 20 seconds, the second takes 50, the third takes 90, the fourth takes 2.5 minutes, and the last takes 5.5 minutes. The next time we try it (on worker6) it pushes the whole test over the 30m mark: ``` 1030 [DEBUG - 2021-01-12 21:46:29,527 - remoteaccount - _log - lineno:160]: ubuntu@worker6: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 1031 [INFO - 2021-01-12 21:53:31,673 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check 1032 [DEBUG - 2021-01-12 21:53:32,675 - remoteaccount - _log - lineno:160]: ubuntu@worker6: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 1033 [INFO - 2021-01-12 22:01:38,267 - streams_optimized_test - all_source_subtopology_tasks - lineno:134]: Returned [] from assigned task check 1034 [DEBUG - 2021-01-12 22:01:39,268 - remoteaccount - _log - lineno:160]: ubuntu@worker6: Running ssh command: sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/^A/p' /mnt/streams/streams.log 1035 [INFO - 2021-01-12 22:03:50,579 - runner_client - log - lineno:240]: RunnerClient: kafkatest.tests.streams.streams_optimized_test.StreamsOptimizedTest.test_upgrade_optimized_topology: FAIL: 1036 Traceback (most recent call last): 1037 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py", line 134, in run 1038 data = self.run_test() 1039 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py", line 192, in run_test 1040 return self.test_context.function(self.test) 1041 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/streams/streams_optimized_test.py", line 88, in test_upgrade_optimized_topology 1042 self.verify_processing(processors, verify_individual_operations=True) 1043 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/streams/streams_optimized_test.py", line 114, in verify_processing 1044 if not self.all_source_subtopology_tasks(processor): 1045 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/streams/streams_optimized_test.py", line 133, in all_source_subtopology_tasks 1046 found = list(processor.node.account.ssh_capture("sed -n 's/.*current active tasks: \[\(\(0_[0-9], \)\{3\}0_[0-9]\)\].*/\1/p' %s" % processor.LOG_FILE, allow_fail=True)) 1047 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/cluster/remoteaccount.py", line 651, in next 1048 return next(self.iter_obj) 1049 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/cluster/remoteaccount.py", line 318, in output_generator 1050 for line in iter(stdout.readline, ''): 1051 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/file.py", line 291, in readline 1052 new_data = self._read(n) 1053 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/channel.py", line 1361, in _read 1054 return self.channel.recv(size) 1055 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/channel.py", line 699, in recv 1056 out = self.in_buffer.read(nbytes, self.timeout) 1057 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/paramiko-2.7.2-py3.6.egg/paramiko/buffered_pipe.py", line 160, in read 1058 self._cv.wait(timeout) 1059 File "/usr/local/lib/python3.6/threading.py", line 295, in wait 1060 waiter.acquire() 1061 KeyboardInterrupt ``` Looks like the first sed is 7 minutes, the second 8 minutes, and then we time out in the third. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org