[ https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247766#comment-14247766 ]
Bob Potter commented on KAFKA-1788:
-----------------------------------

I've been digging into this a little bit. In addition to an individual partition being unavailable, there is also a case where all brokers become unavailable and we are unable to refresh metadata. This is a distinct case because the producer still thinks it has a leader for the partition (AFAICT, the metadata is never updated). The behavior I have seen is that the producer will continue to accept records for any partition which previously had a leader, but the batches never exit the accumulator.

It seems like we could track how long it has been since we've been able to connect to any known broker and, after a certain threshold, complete all outstanding record batches with an error and reset the metadata so that new produce attempts don't end up in the accumulator. On the other hand, we could simply start failing record batches once they have been in the accumulator for too long; that would cover both failure scenarios (see the sketch at the end of this message). Either way, it seems like we should be resetting the metadata for an unavailable cluster at some point.

> producer record can stay in RecordAccumulator forever if leader is not
> available
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1788
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g.,
> all replicas are down), the records for that partition will stay in the
> RecordAccumulator until the leader is available. This may cause the
> bufferpool to be full and the callback for the produced message to block for
> a long time.
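
As an illustration of the second approach in the comment above (failing record batches that have sat in the accumulator for too long), here is a minimal sketch in Java. All names here (Accumulator, RecordBatch, expireStaleBatches, batchExpiryMs) are hypothetical and do not reflect the actual producer internals; it only shows the shape of a time-based expiry check that the sender loop could run on each iteration.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.TimeoutException;

// Hypothetical batch that remembers when it entered the accumulator.
final class RecordBatch {
    final long createdMs;

    RecordBatch(long nowMs) {
        this.createdMs = nowMs;
    }

    // Complete the batch's callbacks/futures, here with an error.
    void done(Exception e) {
        // invoke per-record callbacks with e ...
    }
}

final class Accumulator {
    private final Deque<RecordBatch> batches = new ArrayDeque<>();
    private final long batchExpiryMs;

    Accumulator(long batchExpiryMs) {
        this.batchExpiryMs = batchExpiryMs;
    }

    void append(RecordBatch batch) {
        batches.add(batch);
    }

    // Called from the sender loop on every iteration: fail any batch
    // that has waited longer than the expiry threshold, regardless of
    // whether its partition currently has a leader.
    void expireStaleBatches(long nowMs) {
        Iterator<RecordBatch> it = batches.iterator();
        while (it.hasNext()) {
            RecordBatch batch = it.next();
            if (nowMs - batch.createdMs > batchExpiryMs) {
                it.remove();
                batch.done(new TimeoutException(
                        "Expired after " + batchExpiryMs + " ms in the accumulator"));
            }
        }
    }
}
{code}

A check like this would cover both failure scenarios described above (no leader for a partition, and all brokers unreachable), since it depends only on elapsed time and not on a metadata refresh ever succeeding.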