[ https://issues.apache.org/jira/browse/KAFKA-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925521#comment-15925521 ]
Guozhang Wang commented on KAFKA-4885: -------------------------------------- As for the fix, I think it could be quite complicated and would like to discuss with people about such error handling ([~mjsax], [~damianguy], [~enothereska]): 1. we should consider if we ever would want to shutdown the thread in the finally block when there is any unexpected exception thrown; or should we close the whole instance instead of only silently shutting down this thread. 2. as for this specific issue, do we want to set the num.retry to infinity so that we would rather block forever than fail and die. > processstreamwithcachedstatestore and other streams benchmarks fail > occasionally > --------------------------------------------------------------------------------- > > Key: KAFKA-4885 > URL: https://issues.apache.org/jira/browse/KAFKA-4885 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 0.10.2.0 > Reporter: Eno Thereska > Fix For: 0.11.0.0 > > > test_id: > kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=processstreamwithcachedstatestore.scale=2 > status: FAIL > run time: 14 minutes 58.069 seconds > Streams Test process on ubuntu@worker5 took too long to exit > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", > line 321, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py", > line 86, in test_simple_benchmark > self.driver[num].wait() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py", > line 102, in wait > self.wait_node(node, timeout_sec) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py", > line 106, in wait_node > wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, > err_msg="Streams Test process on " + str(node.account) + " took too long to > exit") > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Streams Test process on ubuntu@worker5 took too long to exit > The log contains several lines like: > [2017-03-11 04:52:59,080] DEBUG Attempt to heartbeat failed for group > simple-benchmark-streams-with-storetrue since it is rebalancing. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2017-03-11 04:53:01,987] DEBUG Sending Heartbeat request for group > simple-benchmark-streams-with-storetrue to coordinator worker10:9092 (id: > 2147483646 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2017-03-11 04:53:02,088] DEBUG Attempt to heartbeat failed for group > simple-benchmark-streams-with-storetrue since it is rebalancing. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2017-03-11 04:53:04,995] DEBUG Sending Heartbeat request for group > simple-benchmark-streams-with-storetrue to coordinator worker10:9092 (id: > 2147483646 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > Other tests that fail the same way include: > test_id: > kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=count.scale=2 > test_id: > kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=processstreamwithsink.scale=1 > test_id: > kafkatest.tests.streams.streams_bounce_test.StreamsBounceTest.test_bounce -- This message was sent by Atlassian JIRA (v6.3.15#6346)