[ https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743246#comment-15743246 ]
Ewen Cheslack-Postava commented on KAFKA-4526:
----------------------------------------------

re: related test failures, we're also seeing this in the same test run:

{quote}
====================================================================================================
test_id:    kafkatest.tests.core.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_SSL.failure_mode=hard_bounce.broker_type=controller
status:     FAIL
run time:   3 minutes 35.081 seconds

    2 acked message did not make it to the Consumer. They are: [43137, 43140]. We validated that the first 2 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.(There are also 1110 duplicate messages in the log - but that is an acceptable outcome)
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/replication_test.py", line 155, in test_replication_with_broker_failure
    self.run_produce_consume_validate(core_test_action=lambda: failures[failure_mode](self, broker_type))
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 101, in run_produce_consume_validate
    self.validate()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 163, in validate
    assert success, msg
AssertionError: 2 acked message did not make it to the Consumer. They are: [43137, 43140]. We validated that the first 2 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.(There are also 1110 duplicate messages in the log - but that is an acceptable outcome)
{quote}

These use common utilities, so they may not be related and just have similar error messages. However, the fact that they seem to have started happening at the same time is suspicious.

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---------------------------------------------------------------
>
>                 Key: KAFKA-4526
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4526
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>              Labels: system-test-failure, system-tests
>             Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are:
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the
> first 1000 of these missing messages correctly made it into Kafka's data
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
> for an example.
> Note that there are a number of similar bug reports for different tests:
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
> I am wondering if we have a wrong ack setting somewhere that we should be
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures
> seem to always start at 0...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)