[ https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885716#comment-15885716 ]
Ismael Juma commented on KAFKA-4779:
------------------------------------

The test failed again, this time with a different message:

{code}
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_PLAINTEXT.client_protocol=SSL
status:     FAIL
run time:   4 minutes 32.586 seconds

    1152 acked message did not make it to the Consumer. They are: 12288, 12289, 12290, 12291, 12292, 12293, 12294, 12295, 12296, 12297, 12298, 12299, 12300, 12301, 12302, 12303, 12304, 12305, 12306, 12307...plus 1132 more. Total Acked: 12184, Total Consumed: 11032. We validated that the first 1000 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py", line 148, in test_rolling_upgrade_phase_two
    self.run_produce_consume_validate(self.roll_in_secured_settings, client_protocol, broker_protocol)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 117, in run_produce_consume_validate
    self.validate()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 179, in validate
    assert success, msg
AssertionError: 1152 acked message did not make it to the Consumer. They are: 12288, 12289, 12290, 12291, 12292, 12293, 12294, 12295, 12296, 12297, 12298, 12299, 12300, 12301, 12302, 12303, 12304, 12305, 12306, 12307...plus 1132 more. Total Acked: 12184, Total Consumed: 11032. We validated that the first 1000 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
--------------------------------------------------------------------------------
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-02-26--001.1488103947--apache--trunk--5b682ba/report.html
http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-02-26--001.1488103947--apache--trunk--5b682ba/TestSecurityRollingUpgrade/test_rolling_upgrade_phase_two/broker_protocol=SASL_PLAINTEXT.client_protocol=SSL/62.tgz

> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-4779
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4779
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Rajini Sivaram
>             Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
>     data = self.run_test()
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
>     return self.test_context.function(self.test)
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
> line 148, in test_rolling_upgrade_phase_two
>     self.run_produce_consume_validate(self.roll_in_secured_settings,
> client_protocol, broker_protocol)
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 100, in run_produce_consume_validate
>     self.stop_producer_and_consumer()
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 87, in stop_producer_and_consumer
>     self.check_alive()
>   File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 79, in check_alive
>     raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out:
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating
> consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
>     at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
>     at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
>     at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
>     at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
>     at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases, the
> producer produced ~15k messages, of which ~7k were acked, and the consumer
> only got around ~2600 before timing out.
> There are a lot of messages like the following for different request types on
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in
> produce request on partition test_topic-0. The topic/partition may not exist
> or the user may not have Describe access to it
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)