[
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247766#comment-14247766
]
Bob Potter commented on KAFKA-1788:
-----------------------------------
I've been digging into this a little bit, and in addition to an individual
partition being unavailable, there is also a case where all brokers become
unavailable and we are unable to refresh metadata. This is a distinct case
because the producer still thinks it has a leader for the partition (AFAICT,
the metadata is never updated). The behavior I have seen is that the producer
will continue to accept records for any partition which previously had a
leader, but the batches never exit the accumulator.
It seems like we could track how long it has been since we were last able to
connect to any known broker and, after a certain threshold, complete all
outstanding record batches with an error and reset the metadata so that new
produce attempts don't end up in the accumulator.
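A rough sketch of that idea (every name here is made up for illustration; the
threshold, the connection-success hook, and the tracker class are assumptions,
not actual producer internals):

    // Hypothetical tracker for "time since we last reached any broker".
    class ClusterConnectivityTracker {
        private final long thresholdMs;      // assumed configurable threshold
        private volatile long lastSuccessMs; // last successful broker connection

        ClusterConnectivityTracker(long thresholdMs, long nowMs) {
            this.thresholdMs = thresholdMs;
            this.lastSuccessMs = nowMs;
        }

        // Call whenever a connection to any known broker succeeds.
        void onConnectionSuccess(long nowMs) {
            this.lastSuccessMs = nowMs;
        }

        // Once this returns true, the sender could complete all outstanding
        // batches with an error and reset the metadata, per the proposal above.
        boolean unreachableTooLong(long nowMs) {
            return nowMs - lastSuccessMs > thresholdMs;
        }
    }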
On the other hand, we could just start failing record batches that have been
in the accumulator for too long. That would solve both failure scenarios,
although it seems like we should still reset the metadata for an unavailable
cluster at some point.
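To make that concrete, a minimal sketch of an age-based expiry check (the
class and parameter names are invented; the real change would live in the
accumulator's ready/drain path):

    // Hypothetical age check for a batch sitting in the accumulator.
    class RecordBatchExpiry {
        // True if the batch has waited longer than maxAgeMs since creation.
        static boolean isExpired(long createdMs, long nowMs, long maxAgeMs) {
            return nowMs - createdMs > maxAgeMs;
        }
    }

On each sender loop iteration, batches for which isExpired(...) returns true
would be completed with a timeout error and removed, freeing buffer-pool
memory and firing their callbacks instead of leaving them blocked.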
> producer record can stay in RecordAccumulator forever if leader is not
> available
> -------------------------------------------------------------------------------
>
> Key: KAFKA-1788
> URL: https://issues.apache.org/jira/browse/KAFKA-1788
> Project: Kafka
> Issue Type: Bug
> Components: core, producer
> Affects Versions: 0.8.2
> Reporter: Jun Rao
> Assignee: Jun Rao
> Labels: newbie++
> Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g.,
> all replicas are down), the records for that partition will stay in the
> RecordAccumulator until the leader is available. This may cause the buffer
> pool to fill up and the callback for the produced message to block for a
> long time.