[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743246#comment-15743246
 ] 

Ewen Cheslack-Postava commented on KAFKA-4526:
----------------------------------------------

Regarding related test failures: we're also seeing this one in the same test run:

{quote}
====================================================================================================
test_id:    kafkatest.tests.core.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_SSL.failure_mode=hard_bounce.broker_type=controller
status:     FAIL
run time:   3 minutes 35.081 seconds


    2 acked message did not make it to the Consumer. They are: [43137, 43140]. 
We validated that the first 2 of these missing messages correctly made it into 
Kafka's data files. This suggests they were lost on their way to the 
consumer.(There are also 1110 duplicate messages in the log - but that is an 
acceptable outcome)

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/replication_test.py", line 155, in test_replication_with_broker_failure
    self.run_produce_consume_validate(core_test_action=lambda: failures[failure_mode](self, broker_type))
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 101, in run_produce_consume_validate
    self.validate()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 163, in validate
    assert success, msg
AssertionError: 2 acked message did not make it to the Consumer. They are: 
[43137, 43140]. We validated that the first 2 of these missing messages 
correctly made it into Kafka's data files. This suggests they were lost on 
their way to the consumer.(There are also 1110 duplicate messages in the log - 
but that is an acceptable outcome)
{quote}

Both tests use common utilities, so the failures may be unrelated and merely 
share a similar error message.  However, the fact that they seem to have 
started happening at the same time is suspicious.
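
For anyone not familiar with the kind of check behind this message, here is a 
minimal sketch of the idea. It is not the actual code in 
produce_consume_validate.py; the function name check_delivery and the sample 
ids are hypothetical, chosen only to mirror the failure above.

```python
from collections import Counter


def check_delivery(acked, consumed):
    """Compare acked message ids against what the consumer saw.

    Hypothetical helper for illustration only.
    Returns (missing, duplicate_count):
      missing         - sorted acked ids that never reached the consumer
      duplicate_count - number of extra copies the consumer received
    """
    consumed_counts = Counter(consumed)
    # An acked id that was never consumed counts as a lost message.
    missing = sorted(set(acked) - set(consumed_counts))
    # Each id seen n times contributes n - 1 duplicates.
    duplicate_count = sum(n - 1 for n in consumed_counts.values() if n > 1)
    return missing, duplicate_count


# Made-up data: two acked ids are never consumed, one id is consumed twice.
acked = [43136, 43137, 43138, 43139, 43140]
consumed = [43136, 43138, 43138, 43139]
missing, duplicates = check_delivery(acked, consumed)
print(missing, duplicates)
```

In a check of this shape, duplicates are tolerated and only a non-empty 
missing list fails the test, which matches the wording of the assertion.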

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---------------------------------------------------------------
>
>                 Key: KAFKA-4526
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4526
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>              Labels: system-test-failure, system-tests
>             Fix For: 0.10.2.0
>
>
> This test is seeing transient failures.
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. It fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering whether we have a wrong ack setting somewhere: one that we 
> should be specifying as acks=all but that is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
