[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249255#comment-14249255
 ] 

Jay Kreps commented on KAFKA-1788:
----------------------------------

Currently the producer supports either blocking or dropping when it cannot send 
to the cluster as fast as data is arriving. This could occur because the 
cluster is down, or just because it isn't fast enough to keep up.

Kafka provides high availability for partitions so the case where a partition 
is permanently unavailable should be rare.

Timing out requests might be nice, but it's not 100% clear that it is better 
than the current strategy, which is to buffer as long as possible and then 
either block or drop data when the buffer is exhausted. Arguably, dropping 
when you are out of space is better than dropping after a fixed time, since 
you have to drop when you are out of space in any case.
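
To make the buffer-then-block-or-drop strategy concrete, here is a minimal 
sketch of a bounded buffer with both policies. It is not the producer's actual 
code: the class BoundedSendBuffer, its methods, and the blockWhenFull flag are 
hypothetical names invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration only; not the Kafka producer's real buffer.
class BoundedSendBuffer {
    private final Queue<byte[]> records = new ArrayDeque<>();
    private final long capacityBytes;
    private final boolean blockWhenFull; // true = block the caller, false = drop
    private long usedBytes = 0;

    BoundedSendBuffer(long capacityBytes, boolean blockWhenFull) {
        this.capacityBytes = capacityBytes;
        this.blockWhenFull = blockWhenFull;
    }

    // Returns true if the record was buffered, false if it was dropped.
    synchronized boolean append(byte[] record) {
        if (record.length > capacityBytes) {
            throw new IllegalArgumentException("record larger than buffer");
        }
        while (usedBytes + record.length > capacityBytes) {
            if (!blockWhenFull) {
                return false; // out of space: drop
            }
            try {
                wait(); // out of space: block until poll() frees room
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        records.add(record);
        usedBytes += record.length;
        return true;
    }

    // Called by the sending side once a record can leave the buffer.
    synchronized byte[] poll() {
        byte[] record = records.poll();
        if (record != null) {
            usedBytes -= record.length;
            notifyAll(); // wake any appenders blocked on a full buffer
        }
        return record;
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

Note that neither policy involves a clock: data is only ever dropped (or the 
caller blocked) at the moment space actually runs out, which is the property 
argued for above.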

As Ewen says, we can't reset the metadata, because the bootstrap servers may 
no longer exist, and if they do, they are by definition a subset of the 
current cluster metadata. I think Ewen's solution of just making sure 
leastLoadedNode eventually tries all nodes is the right way to go. We'll have 
to be careful, though, as that method is pretty constrained.
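
One way a least-loaded selection could be made to eventually try every node is 
to start each scan from a rotating offset, so that ties are not always broken 
in favor of the same node. The sketch below assumes that approach; the class 
NodeSelector, its signature, and the in-flight map are hypothetical and do not 
reflect the real leastLoadedNode implementation or its constraints.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Kafka client's real node selection.
class NodeSelector {
    private int nextOffset = 0; // where the next scan starts

    // Picks the node with the fewest in-flight requests. The scan starts at a
    // rotating offset and keeps the first minimum it sees, so nodes with equal
    // load are chosen in turn across successive calls.
    String leastLoadedNode(List<String> nodes, Map<String, Integer> inFlight) {
        if (nodes.isEmpty()) {
            return null;
        }
        int size = nodes.size();
        int start = nextOffset % size;
        nextOffset = (start + 1) % size;

        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (int i = 0; i < size; i++) {
            String node = nodes.get((start + i) % size);
            int load = inFlight.getOrDefault(node, 0);
            if (load < bestLoad) {
                best = node;
                bestLoad = load;
            }
        }
        return best;
    }
}
```

With three idle nodes, successive calls return each node once before 
repeating, so a single unreachable node cannot be selected indefinitely.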




> producer record can stay in RecordAccumulator forever if leader is no 
> available
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1788
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer 
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., 
> all replicas are down), the records for that partition will stay in the 
> RecordAccumulator until the leader is available. This may cause the 
> bufferpool to be full and the callback for the produced message to block for 
> a long time.



