Rajini Sivaram created KAFKA-3488:
-------------------------------------
Summary: commitAsync() fails if metadata update creates new SASL/SSL connection
Key: KAFKA-3488
URL: https://issues.apache.org/jira/browse/KAFKA-3488
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.9.0.1
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
Sasl/SslConsumerTest.testSimpleConsumption() fails intermittently because
{{commitAsync()}} fails. The stack trace shows:
{quote}
kafka.api.SaslPlaintextConsumerTest.testSimpleConsumption FAILED
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at kafka.api.BaseConsumerTest.awaitCommitCallback(BaseConsumerTest.scala:340)
at kafka.api.BaseConsumerTest.testSimpleConsumption(BaseConsumerTest.scala:85)
{quote}
I have recreated this with some additional tracing. The tests run with a very
small metadata expiry interval, so metadata updates are triggered quite often.
If a metadata request immediately following a {{commitAsync()}} call creates a
new SSL/SASL connection, {{ConsumerNetworkClient.poll}} returns to process the
connection handshake packets. Since {{ConsumerNetworkClient.poll}} discards all
unsent requests before returning, the commit can fail: its callback is invoked
with {{SendFailedException}}.
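To make the sequence concrete, here is a minimal, self-contained Java sketch. It is a simplified model, not Kafka's {{ConsumerNetworkClient}} code: the class {{DiscardingPollModel}}, its methods and the local {{SendFailedException}} stand-in are all hypothetical. It assumes, for illustration, that a queued commit cannot be sent until an in-progress SSL/SASL handshake completes, and that each {{poll()}} covers only one handshake round trip.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Simplified, hypothetical model of the poll behaviour described above; not Kafka's code.
class DiscardingPollModel {

    // Hypothetical stand-in for Kafka's SendFailedException.
    static class SendFailedException extends RuntimeException {
        SendFailedException() {
            super("request was still unsent when poll() returned");
        }
    }

    // Requests queued by send(); a callback receives null on success or the failure.
    private final Deque<Consumer<RuntimeException>> unsent = new ArrayDeque<>();

    // Round trips still needed by a new SSL/SASL handshake (e.g. for a metadata request).
    // While it is non-zero, this model treats the client as not ready to send other requests.
    private int handshakeRoundTripsRemaining;

    DiscardingPollModel(int handshakeRoundTrips) {
        this.handshakeRoundTripsRemaining = handshakeRoundTrips;
    }

    void send(Consumer<RuntimeException> callback) {
        unsent.add(callback);
    }

    private void trySend() {
        // In this model, a request that can be sent completes successfully at once.
        while (handshakeRoundTripsRemaining == 0 && !unsent.isEmpty()) {
            unsent.poll().accept(null);
        }
    }

    void poll() {
        trySend();
        // Network I/O: handshake packets wake the poll up, so one round trip happens per call.
        if (handshakeRoundTripsRemaining > 0) {
            handshakeRoundTripsRemaining--;
        }
        trySend();
        // Anything still unsent is failed rather than kept for the next poll().
        while (!unsent.isEmpty()) {
            unsent.poll().accept(new SendFailedException());
        }
    }

    // Queues one "commit" and polls once; returns "ok" or the failure's simple class name.
    static String commitOnce(int handshakeRoundTrips) {
        DiscardingPollModel client = new DiscardingPollModel(handshakeRoundTrips);
        String[] outcome = {"pending"};
        client.send(err -> outcome[0] = err == null ? "ok" : err.getClass().getSimpleName());
        client.poll();
        return outcome[0];
    }
}
```

In this model, {{commitOnce(0)}} (no handshake) returns "ok", while {{commitOnce(2)}} (a two-round-trip handshake) returns "SendFailedException" even though nothing actually went wrong.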
I understand that {{ConsumerNetworkClient.poll()}} discards unsent requests
rather than buffering them to keep the code simple. Perhaps it is also
acceptable for {{commitAsync}} to fail occasionally, since the callback does
indicate that the caller should retry. But it feels like an unnecessary
limitation: it requires error handling in client applications when there are
no real failures, and it makes the code much harder to test reliably. Since
the special handling needed to fix issues like KAFKA-3412 and KAFKA-2672 adds
more complexity to the code anyway, and since failures that affect only
SSL/SASL are much harder to debug, it may be worth improving this behaviour.
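One possible direction, sketched under a hypothetical simplified model rather than as a proposed patch: keep unsent requests across {{poll()}} calls and fail them only once they have waited too long. {{RetainingPollModel}}, its {{maxPollsUnsent}} bound and the local {{SendFailedException}} stand-in are assumptions for illustration; a real client would more likely bound the wait by time (for example a request timeout) than by a poll count.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model that buffers unsent requests across polls; not Kafka's code.
class RetainingPollModel {

    // Hypothetical stand-in for Kafka's SendFailedException.
    static class SendFailedException extends RuntimeException {
        SendFailedException() {
            super("request stayed unsent for too many polls");
        }
    }

    private static final class Pending {
        final Consumer<RuntimeException> callback;
        int pollsWaited = 0;

        Pending(Consumer<RuntimeException> callback) {
            this.callback = callback;
        }
    }

    private final Deque<Pending> unsent = new ArrayDeque<>();
    private final int maxPollsUnsent;          // hypothetical expiry bound, counted in polls
    private int handshakeRoundTripsRemaining;  // pending SSL/SASL handshake round trips

    RetainingPollModel(int handshakeRoundTrips, int maxPollsUnsent) {
        this.handshakeRoundTripsRemaining = handshakeRoundTrips;
        this.maxPollsUnsent = maxPollsUnsent;
    }

    void send(Consumer<RuntimeException> callback) {
        unsent.add(new Pending(callback));
    }

    private void trySend() {
        // A request that can be sent completes successfully at once in this model.
        while (handshakeRoundTripsRemaining == 0 && !unsent.isEmpty()) {
            unsent.poll().callback.accept(null);
        }
    }

    void poll() {
        trySend();
        // One handshake round trip per poll, as in the discarding model.
        if (handshakeRoundTripsRemaining > 0) {
            handshakeRoundTripsRemaining--;
        }
        trySend();
        // Keep unsent requests for the next poll(); expire only those that waited too long.
        List<Pending> expired = new ArrayList<>();
        for (Iterator<Pending> it = unsent.iterator(); it.hasNext(); ) {
            Pending p = it.next();
            if (++p.pollsWaited >= maxPollsUnsent) {
                it.remove();
                expired.add(p);
            }
        }
        // Invoke callbacks after iterating so a callback may safely call send().
        for (Pending p : expired) {
            p.callback.accept(new SendFailedException());
        }
    }

    // Queues one "commit", polls the given number of times, and reports the outcome.
    static String commitAfterPolls(int handshakeRoundTrips, int maxPollsUnsent, int polls) {
        RetainingPollModel client = new RetainingPollModel(handshakeRoundTrips, maxPollsUnsent);
        String[] outcome = {"pending"};
        client.send(err -> outcome[0] = err == null ? "ok" : err.getClass().getSimpleName());
        for (int i = 0; i < polls; i++) {
            client.poll();
        }
        return outcome[0];
    }
}
```

Here {{commitAfterPolls(2, 5, 1)}} is still "pending" after the first poll and {{commitAfterPolls(2, 5, 2)}} returns "ok" once the handshake finishes, while a commit stuck behind a much longer handshake ({{commitAfterPolls(10, 3, 3)}}) still fails after {{maxPollsUnsent}} polls, so callers still learn about genuinely stuck requests.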
I will see if I can submit a PR for the specific issue I was seeing with the
impact of handshakes on {{commitAsync()}}, but I would be interested in views on
improving the logic in {{ConsumerNetworkClient}}.