[jira] [Commented] (KAFKA-2805) RecordAccumulator request timeout not enforced when all brokers are gone

Jun Rao (JIRA) Wed, 11 Nov 2015 08:43:00 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000633#comment-15000633
 ]


Jun Rao commented on KAFKA-2805:
--------------------------------

[~mgharat], I am not sure the current logic works. If the leader is not null 
but belongs to a broker that is not connectable, then in Sender.run() the 
partitions for this leader will be ready but not drainable, since the leader 
is not connectable. So, messages in those partitions will never time out in 
the current logic. 

In Jason's test, you can get into the above situation when the only broker is 
killed since metadata won't be refreshed after the broker is down. In your 
test, if you provided multiple brokers in the broker list, things are a bit 
different. The producer will be able to refresh metadata from other brokers and 
see the leader is gone. In this case, the producer will see a null leader. 
That's probably why you don't see the issue in your test. In both cases, the 
effect is pretty much the same---we can't send the partitions' data. So, the 
simplest solution is probably to remove the null check on leader in 
abortExpiredBatches(). If the leader can't be connected for a long period of 
time, the batches in those partitions are guaranteed to be expired. If the 
leader is 
connectable, but the send fails (e.g., due to NotLeader), lastAttemptMs will be 
updated and we will go through the retries. Does that sound reasonable?
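
To make the proposal concrete, here is a minimal sketch of expiry without the 
leader null check. It is a simplified model, not the actual RecordAccumulator 
code: the Batch class, the method signature, and the timeout values are 
assumptions made for illustration. Only the names abortExpiredBatches and 
lastAttemptMs come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical model; not the real Kafka RecordAccumulator.
class BatchExpirySketch {

    // Hypothetical stand-in for a queued producer batch.
    static final class Batch {
        final String partition;
        long lastAttemptMs; // refreshed whenever a send is attempted

        Batch(String partition, long lastAttemptMs) {
            this.partition = partition;
            this.lastAttemptMs = lastAttemptMs;
        }
    }

    // Expire every batch whose last attempt is older than requestTimeoutMs.
    // Note there is no check on whether the partition's leader is null:
    // a known-but-unreachable leader and a missing leader are treated alike.
    static List<Batch> abortExpiredBatches(List<Batch> queued,
                                           long requestTimeoutMs,
                                           long nowMs) {
        List<Batch> expired = new ArrayList<>();
        Iterator<Batch> it = queued.iterator();
        while (it.hasNext()) {
            Batch batch = it.next();
            if (nowMs - batch.lastAttemptMs > requestTimeoutMs) {
                expired.add(batch);
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<Batch> queued = new ArrayList<>();
        // Leader known but unreachable: the batch was never sent.
        queued.add(new Batch("topic-0", 0L));
        // Leader reachable but the send failed recently, so lastAttemptMs moved.
        queued.add(new Batch("topic-1", 25_000L));

        List<Batch> expired = abortExpiredBatches(queued, 30_000L, 40_000L);
        System.out.println("expired=" + expired.size()
                + " remaining=" + queued.size());
    }
}
```

In this model the batch for the unreachable leader expires once the timeout 
elapses, while the batch that was retried recently stays queued because its 
lastAttemptMs was refreshed.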


> RecordAccumulator request timeout not enforced when all brokers are gone
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-2805
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2805
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Mayuresh Gharat
>
> When no brokers are left in the cluster, the producer seems not to enforce 
> the request timeout as expected.
> From the user mailing list, the null check in batch expiration in 
> RecordAccumulator seems questionable: 
> https://github.com/apache/kafka/blob/ae5a5d7c08bb634576a414f6f2864c5b8a7e58a3/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L220.
>  
> If this is correct behavior, it is probably worthwhile clarifying the purpose 
> of the check in a comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
