[ https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249131#comment-14249131 ]

Ewen Cheslack-Postava commented on KAFKA-1788:
----------------------------------------------

[~bpot] that sounds right -- I'm pretty sure the metadata never gets cleared if 
all brokers become unavailable; it's only populated when the producer starts 
and updated when a metadata response arrives.

You can actually get into the state you're describing for a long time without 
losing all the brokers. Metadata update requests use 
NetworkClient.leastLoadedNode to select which node to send the request to, so 
requests may repeatedly go to the same node even when its connection is no 
longer getting any data through -- as long as the TCP connection hasn't timed 
out, the node still looks viable. That can mean waiting many minutes even 
though the metadata might be retrievable from a different node.

But I'm not sure it's really a distinct problem, just another variant -- the 
batch stays in the RecordAccumulator, holding buffer pool space, until there's 
a network error or a response to the request that included the batch. This 
means any failure to make progress sending data would trigger the same issue. 
I think a proper fix for this bug would start a timeout for each message as 
soon as send() is called, and would need to be able to remove timed-out 
messages from any point in the pipeline after that, cleaning up any resources 
they use.
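Something along these lines is what I have in mind -- purely a sketch with 
invented names (createdMs, maybeExpireBatches, batchTimeoutMs), not real 
RecordAccumulator code:

{code:java}
import java.util.*;

// Hypothetical sketch of the proposed fix: stamp each batch with the time
// of the first send() into it, and let the sender loop expire batches that
// have waited too long, failing their callbacks and releasing their memory
// back to the buffer pool so it can't fill up with stuck batches.
class BatchExpirySketch {
    static class Batch {
        final long createdMs;                   // set when send() first appends
        Batch(long createdMs) { this.createdMs = createdMs; }
        void abort(Exception e) { /* complete callbacks with the error */ }
        void deallocate()      { /* return memory to the buffer pool */ }
    }

    // Called periodically from the sender; batchTimeoutMs is hypothetical.
    static void maybeExpireBatches(Collection<Batch> batches,
                                   long nowMs, long batchTimeoutMs) {
        Iterator<Batch> it = batches.iterator();
        while (it.hasNext()) {
            Batch batch = it.next();
            if (nowMs - batch.createdMs > batchTimeoutMs) {
                it.remove();                    // drop from the accumulator
                batch.abort(new RuntimeException("expired before send"));
                batch.deallocate();             // free buffer pool space now
            }
        }
    }
}
{code}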

The metadata issue is another interesting problem. If you reset the metadata, 
the current implementation will block on any subsequent send() call, since the 
first thing send() does is waitOnMetadata(). Given the interface of send(), 
I'm not sure blocking that way should ever be allowed, although at least now 
it's restricted to the initial send() call and probably simplifies a bunch of 
code. Resetting the metadata could also be counterproductive, since the set of 
bootstrap nodes could be smaller than the subset of the cluster you had 
metadata for. One alternative idea: change the use of leastLoadedNode so that, 
after a certain amount of time, it starts considering the bootstrap nodes as 
well as the set currently in the metadata.
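Roughly like this -- again just a sketch, with lastSuccessfulUpdateMs and 
staleThresholdMs as invented names:

{code:java}
import java.util.*;

// Sketch of the alternative: instead of resetting metadata, widen the
// candidate set for metadata requests to include the original bootstrap
// nodes once we've gone too long without a successful metadata update.
class BootstrapFallbackSketch {
    static List<String> metadataCandidates(List<String> knownNodes,
                                           List<String> bootstrapNodes,
                                           long lastSuccessfulUpdateMs,
                                           long nowMs, long staleThresholdMs) {
        List<String> candidates = new ArrayList<>(knownNodes);
        if (nowMs - lastSuccessfulUpdateMs > staleThresholdMs) {
            for (String node : bootstrapNodes)
                if (!candidates.contains(node)) // avoid duplicate entries
                    candidates.add(node);       // bootstrap nodes eligible again
        }
        return candidates; // feed into leastLoadedNode-style selection
    }
}
{code}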

> producer record can stay in RecordAccumulator forever if leader is not 
> available
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1788
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer 
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., 
> all replicas are down), the records for that partition will stay in the 
> RecordAccumulator until the leader is available. This may cause the buffer 
> pool to fill up and the callback for a produced message to be delayed for a 
> long time.
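For reference, a minimal way to observe the reported behavior from the public 
API -- broker and topic names here are hypothetical, and it assumes all 
replicas for the topic are down:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// With no leader, accumulated batches are never drained; once the small
// buffer.memory pool is exhausted, further send() calls themselves block.
public class LeaderlessRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("buffer.memory", "65536");    // small pool, fills quickly

        KafkaProducer<String, String> producer =
            new KafkaProducer<String, String>(props);
        for (int i = 0; ; i++) {
            producer.send(
                new ProducerRecord<String, String>("leaderless-topic", "msg" + i),
                new Callback() {
                    public void onCompletion(RecordMetadata m, Exception e) {
                        // never invoked while the partition has no leader
                        System.out.println("completed: " + e);
                    }
                });
        }
    }
}
{code}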


