[ 
https://issues.apache.org/jira/browse/KAFKA-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925825#comment-15925825
 ] 

Damian Guy commented on KAFKA-4885:
-----------------------------------

[~guozhang]

1. We give users a chance to shut down the whole instance via the 
UncaughtExceptionHandler, right? 
2. Perhaps this should be configurable? Personally, in apps I write I'd prefer 
to fail fast, as failing will usually raise alerts that help get to the 
root cause of the problem earlier. Retrying forever may mean that such issues 
go unnoticed for long periods of time.
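
To make the "configurable" idea in point 2 concrete, here is a rough, 
language-neutral sketch in Python. It is not Kafka Streams code: the names 
run_with_retries, max_retries, backoff_sec and RetriesExhausted are 
hypothetical and only illustrate how a single setting could cover both 
behaviours (fail fast vs. retry forever).

```python
import time


class RetriesExhausted(Exception):
    """Raised when the configured retry budget is used up (hypothetical name)."""


def run_with_retries(operation, max_retries=0, backoff_sec=0.0):
    """Call operation() and retry on failure, up to max_retries extra times.

    max_retries=0    -> fail fast: the first error is surfaced immediately.
    max_retries=None -> retry forever: errors are never surfaced.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            # With a finite budget, give up once it is spent so the failure
            # becomes visible (for example to alerting) instead of being hidden.
            if max_retries is not None and attempt >= max_retries:
                raise RetriesExhausted(
                    "giving up after %d attempt(s)" % (attempt + 1)) from exc
            attempt += 1
            time.sleep(backoff_sec)
```

With a default of 0 the caller sees the first failure straight away, while 
anyone who really wants the retry-forever behaviour can opt in by passing 
None.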

> processstreamwithcachedstatestore and other streams benchmarks fail 
> occasionally
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-4885
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4885
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Eno Thereska
>             Fix For: 0.11.0.0
>
>
> test_id:    
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=processstreamwithcachedstatestore.scale=2
> status:     FAIL
> run time:   14 minutes 58.069 seconds
>     Streams Test process on ubuntu@worker5 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
>     data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
>     return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
>  line 86, in test_simple_benchmark
>     self.driver[num].wait()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 102, in wait
>     self.wait_node(node, timeout_sec)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 106, in wait_node
>     wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
>     raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker5 took too long to exit
> The log contains several lines like:
> [2017-03-11 04:52:59,080] DEBUG Attempt to heartbeat failed for group 
> simple-benchmark-streams-with-storetrue since it is rebalancing. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-03-11 04:53:01,987] DEBUG Sending Heartbeat request for group 
> simple-benchmark-streams-with-storetrue to coordinator worker10:9092 (id: 
> 2147483646 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-03-11 04:53:02,088] DEBUG Attempt to heartbeat failed for group 
> simple-benchmark-streams-with-storetrue since it is rebalancing. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-03-11 04:53:04,995] DEBUG Sending Heartbeat request for group 
> simple-benchmark-streams-with-storetrue to coordinator worker10:9092 (id: 
> 2147483646 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> Other tests that fail the same way include:
> test_id:    
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=count.scale=2
> test_id:    
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=processstreamwithsink.scale=1
> test_id:    
> kafkatest.tests.streams.streams_bounce_test.StreamsBounceTest.test_bounce



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
