[ https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249131#comment-14249131 ]
Ewen Cheslack-Postava commented on KAFKA-1788:
----------------------------------------------

[~bpot] That sounds right. I'm pretty sure the metadata never gets cleared if all brokers become unavailable -- it's only updated when the producer starts and when it gets a metadataResponse message. You can actually get into the state you're describing for a long time without losing all the brokers: metadata update requests use NetworkClient.leastLoadedNode to select which node to send the request to, which means requests may repeatedly go to the same node even when its connection isn't getting any data through, as long as the TCP connection hasn't timed out yet. That can mean waiting many minutes even though the metadata might be retrievable from a different node.

But I'm not sure that's really a distinct problem, just another variant: the batch stays in the RecordAccumulator eating up bufferpool space until there's a network error or a response to the request that included the batch, so any failure to make progress sending data would trigger the same issue. I think a proper fix for this bug would add a timeout for messages as soon as send() is called, and would need to be able to remove them from any point in the pipeline after that timeout, cleaning up any resources they use.

The metadata issue is another interesting problem. If you reset the metadata, the current implementation will block on any subsequent send() call, since the first thing send() does is waitOnMetadata(). Arguably, given the interface of send(), that kind of blocking shouldn't ever be allowed, although at least now it's restricted to the initial send() call, which probably simplifies a bunch of code. Resetting the metadata could also be counterproductive, since the set of bootstrap nodes could be smaller than the subset of the cluster you had metadata for. One alternative idea: change the use of leastLoadedNode so that, after a certain amount of time, it starts considering the bootstrap nodes as well as the set currently in the metadata. (Rough sketches of both ideas follow the quoted issue below.)

> producer record can stay in RecordAccumulator forever if leader is not available
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-1788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1788
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., all replicas are down), the records for that partition will stay in the RecordAccumulator until the leader is available. This may cause the bufferpool to fill up and the callback for the produced message to block for a long time.
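To make the proposed fix concrete, here is a minimal sketch of per-batch expiry, assuming a deadline measured from the moment send() appended the record. All the names (AccumulatorExpirySketch, SimpleBatch, expireStale) are made up for illustration, not the real producer classes -- the point is just that a sweep run from the sender loop fails expired batches with a timeout and returns their memory to the bufferpool, whether or not a leader ever appeared:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.TimeoutException;

// Hypothetical names throughout -- this is not the real accumulator code.
public class AccumulatorExpirySketch {

    interface CompletionCallback {
        void onCompletion(Exception exception);
    }

    static class SimpleBatch {
        final long createdMs;            // recorded when send() first appends
        final Runnable releaseBuffer;    // returns memory to the bufferpool
        final CompletionCallback callback;

        SimpleBatch(long createdMs, Runnable releaseBuffer, CompletionCallback callback) {
            this.createdMs = createdMs;
            this.releaseBuffer = releaseBuffer;
            this.callback = callback;
        }
    }

    private final Deque<SimpleBatch> queue = new ArrayDeque<SimpleBatch>();
    private final long batchTimeoutMs;

    public AccumulatorExpirySketch(long batchTimeoutMs) {
        this.batchTimeoutMs = batchTimeoutMs;
    }

    public synchronized void append(SimpleBatch batch) {
        queue.add(batch);
    }

    // Run from the sender loop on every iteration, so batches are failed and
    // their buffer space released even when no leader is available and
    // nothing can be drained.
    public synchronized void expireStale(long nowMs) {
        Iterator<SimpleBatch> it = queue.iterator();
        while (it.hasNext()) {
            SimpleBatch batch = it.next();
            if (nowMs - batch.createdMs < batchTimeoutMs)
                continue;
            it.remove();
            batch.releaseBuffer.run();   // free bufferpool space first
            batch.callback.onCompletion(
                new TimeoutException("Expired before a leader was available"));
        }
    }
}
{code}

And a similarly rough sketch of the leastLoadedNode fallback idea, again with stand-in types (NodeInfo, fallbackAfterMs) rather than the real NetworkClient state -- once the metadata has been stale for longer than some configurable period, the bootstrap nodes rejoin the candidate set, instead of the metadata being reset:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical: NodeInfo stands in for the real Node/NetworkClient state.
public class MetadataNodeSelectorSketch {

    static class NodeInfo {
        final String id;
        final int inFlightRequests;

        NodeInfo(String id, int inFlightRequests) {
            this.id = id;
            this.inFlightRequests = inFlightRequests;
        }
    }

    private final List<NodeInfo> bootstrapNodes;
    private final long fallbackAfterMs;

    public MetadataNodeSelectorSketch(List<NodeInfo> bootstrapNodes, long fallbackAfterMs) {
        this.bootstrapNodes = bootstrapNodes;
        this.fallbackAfterMs = fallbackAfterMs;
    }

    // Same spirit as leastLoadedNode: pick the candidate with the fewest
    // in-flight requests. But once the metadata has been stale for longer
    // than fallbackAfterMs, widen the candidate set to the bootstrap nodes
    // too, so we stop retrying a single unreachable node indefinitely.
    public NodeInfo leastLoadedNode(List<NodeInfo> metadataNodes,
                                    long lastMetadataUpdateMs, long nowMs) {
        List<NodeInfo> candidates = new ArrayList<NodeInfo>(metadataNodes);
        if (nowMs - lastMetadataUpdateMs > fallbackAfterMs)
            candidates.addAll(bootstrapNodes);
        NodeInfo best = null;
        for (NodeInfo node : candidates)
            if (best == null || node.inFlightRequests < best.inFlightRequests)
                best = node;
        return best;
    }
}
{code}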