[ https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249255#comment-14249255 ]
Jay Kreps commented on KAFKA-1788:
----------------------------------

Currently the producer supports either blocking or dropping when it cannot send to the cluster as fast as data is arriving. This could occur because the cluster is down, or just because it isn't fast enough to keep up. Kafka provides high availability for partitions, so the case where a partition is permanently unavailable should be rare.

Timing out requests might be nice, but it's not 100% clear that this is better than the current strategy. The current strategy is just to buffer as long as possible and then either block or drop data when the buffer is exhausted. Arguably, dropping when you are out of space is better than dropping after a fixed time (since in any case you have to drop when you are out of space).

As Ewen says, we can't reset the metadata because the bootstrap servers may no longer exist, and if they do they are by definition a subset of the current cluster metadata. I think Ewen's solution of just making sure leastLoadedNode eventually tries all nodes is the right way to go. We'll have to be careful, though, as that method is pretty constrained.

> producer record can stay in RecordAccumulator forever if leader is no available
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1788
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g.,
> all replicas are down), the records for that partition will stay in the
> RecordAccumulator until the leader is available. This may cause the
> bufferpool to be full and the callback for the produced message to block for
> a long time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
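
For illustration only, the "buffer as long as possible, then block or drop" strategy described in the comment can be sketched roughly as below. This is not Kafka's RecordAccumulator or BufferPool code; the class BoundedSendBuffer, its blockWhenFull flag, and the method names are hypothetical and exist only to show the two behaviors when the buffer is exhausted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a byte-bounded buffer that either blocks or drops
// when it runs out of space. Not the actual Kafka producer implementation.
class BoundedSendBuffer {
    private final Deque<byte[]> queue = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition spaceFreed = lock.newCondition();
    private final long capacityBytes;
    private final boolean blockWhenFull; // assumed mode switch: block vs. drop
    private long usedBytes = 0;

    BoundedSendBuffer(long capacityBytes, boolean blockWhenFull) {
        this.capacityBytes = capacityBytes;
        this.blockWhenFull = blockWhenFull;
    }

    // Returns true if the record was buffered, false if it was dropped.
    boolean append(byte[] record) {
        if (record.length > capacityBytes) {
            throw new IllegalArgumentException("record larger than buffer");
        }
        lock.lock();
        try {
            while (usedBytes + record.length > capacityBytes) {
                if (!blockWhenFull) {
                    return false; // out of space: drop instead of waiting
                }
                // Out of space: wait until the sender frees some bytes.
                spaceFreed.awaitUninterruptibly();
            }
            queue.addLast(record);
            usedBytes += record.length;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Called by the sending side once a record can leave the buffer.
    // Returns null when nothing is buffered.
    byte[] drainOne() {
        lock.lock();
        try {
            byte[] record = queue.pollFirst();
            if (record != null) {
                usedBytes -= record.length;
                spaceFreed.signalAll(); // wake any blocked appenders
            }
            return record;
        } finally {
            lock.unlock();
        }
    }

    long usedBytes() {
        lock.lock();
        try {
            return usedBytes;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch nothing is ever dropped because of elapsed time; records wait in the buffer until drainOne() removes them, which mirrors the situation in the issue description when a partition has no leader for a long time.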