[ https://issues.apache.org/jira/browse/KAFKA-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000633#comment-15000633 ]
Jun Rao commented on KAFKA-2805:
--------------------------------

[~mgharat], I am not sure the current logic works. If the leader is not null but is on a broker that is not connectable, then in Sender.run() the partitions for this leader will be ready but not drainable, since the leader is not connectable. So, messages in those partitions will never time out in the current logic. In Jason's test, you can get into this situation when the only broker is killed, since metadata won't be refreshed after the broker is down.

In your test, if you provided multiple brokers in the broker list, things are a bit different. The producer will be able to refresh metadata from the other brokers and see that the leader is gone. In this case, the producer will see a null leader. That's probably why you don't see the issue in your test.

In both cases, the effect is pretty much the same: we can't send the partitions' data. So, the simplest solution is probably to remove the null check on leader in abortExpiredBatches(). If the leader can't be connected for a long period of time, the partitions' batches are guaranteed to be expired. If the leader is connectable but the send fails (e.g., due to NotLeader), lastAttemptMs will be updated and we will go through the retries. Does that sound reasonable?

> RecordAccumulator request timeout not enforced when all brokers are gone
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-2805
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2805
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Mayuresh Gharat
>
> When no brokers are left in the cluster, the producer seems not to enforce
> the request timeout as expected.
> From the user mailing list, the null check in batch expiration in
> RecordAccumulator seems questionable:
> https://github.com/apache/kafka/blob/ae5a5d7c08bb634576a414f6f2864c5b8a7e58a3/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L220.
>
> If this is correct behavior, it is probably worthwhile clarifying the purpose
> of the check in a comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
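
For illustration, here is a minimal sketch of the expiry rule proposed in the comment: a batch expires purely on how long ago it was last attempted, with no check on whether the partition leader is null. This is not the RecordAccumulator source. The BatchExpirySketch and Batch classes, the partition names, and the strict "greater than" comparison are assumptions made for the example; only abortExpiredBatches() and lastAttemptMs are names taken from the discussion.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative model only; not the Kafka RecordAccumulator implementation. */
class BatchExpirySketch {

    /** Hypothetical stand-in for a batch queued for one partition. */
    static final class Batch {
        final String partition;
        long lastAttemptMs; // time of creation or of the most recent send attempt

        Batch(String partition, long lastAttemptMs) {
            this.partition = partition;
            this.lastAttemptMs = lastAttemptMs;
        }
    }

    /**
     * Removes and returns every batch whose last attempt is older than
     * requestTimeoutMs. The leader is deliberately not consulted, so a batch
     * whose leader is known but unreachable expires like any other.
     */
    static List<Batch> abortExpiredBatches(List<Batch> queued, long requestTimeoutMs, long nowMs) {
        List<Batch> expired = new ArrayList<>();
        Iterator<Batch> it = queued.iterator();
        while (it.hasNext()) {
            Batch batch = it.next();
            if (nowMs - batch.lastAttemptMs > requestTimeoutMs) {
                expired.add(batch);
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<Batch> queued = new ArrayList<>();
        // Leader known but not connectable: the batch is never drained or retried.
        queued.add(new Batch("t-0", 0L));
        // Leader connectable but the send failed recently, so lastAttemptMs moved forward.
        queued.add(new Batch("t-1", 25_000L));

        List<Batch> expired = abortExpiredBatches(queued, 30_000L, 40_000L);
        System.out.println("expired=" + expired.size() + " remaining=" + queued.size());
    }
}
```

With a 30 second timeout evaluated at 40 seconds, the never-attempted batch expires while the recently retried one stays queued, which matches the two cases described in the comment.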