[ https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101044#comment-16101044 ]
Sumant Tambe commented on KAFKA-5621:
-------------------------------------

I'm unsure about the suggestion to share the same bucket of retry tokens between the accumulator and the network client. I find it comforting that the Sender will retry a few times at the network level before bailing out on a record as not making progress. If the same bucket of tokens is shared, the network-level retry mechanism may not get enough chances. If/when the broker is overloaded and takes a long time to respond to a produce request, the producer might pessimistically bail out on the record. I may be OK with doubling the bound on failure notification rather than cutting the network client short in some cases.

I'm also not a fan of never expiring a batch (with or without the idempotent producer). KMM at LinkedIn today stops replication if a batch expires (or on any error, for that matter). While KMM does not care (much) about the latency of successful replication, it does care about how long to wait when there's a logjam of records in its accumulator. This usually happens in catch-up mode. KMM would want to commit suicide if/when partitions are drained too slowly. I.e., it has a bound on failure notification.

> The producer should retry expired batches when retries are enabled
> ------------------------------------------------------------------
>
>                 Key: KAFKA-5621
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5621
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>             Fix For: 1.0.0
>
>
> Today, when a batch is expired in the accumulator, a {{TimeoutException}} is
> raised to the user.
> It might be better for the producer to retry the expired batch up to the
> configured number of retries. This is more intuitive from the user's point of
> view.
> Further, the proposed behavior makes it easier for applications like mirror
> maker to provide ordering guarantees even when batches expire.
> Today, they would resend the expired batch and it would get added to the back
> of the queue, causing the output ordering to be different from the input
> ordering.
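
The ordering concern in the issue description can be illustrated with a toy model. This is only a sketch: the class `ExpiredBatchOrdering` and its `drain` method are hypothetical names invented for illustration, the queue is a plain `ArrayDeque` standing in for one partition's batch queue, and none of it is the actual `RecordAccumulator` or `Sender` code. It assumes exactly one batch expires once, and compares an application-level resend (the batch re-enters at the back) with a producer-level retry (the batch keeps its place at the front).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model of one partition's batch queue; not the real Kafka producer internals. */
final class ExpiredBatchOrdering {

    /**
     * Drains the queue in order. The batch with id expiredId fails (expires) on its
     * first attempt only. If retryInPlace is true the batch is put back at the front
     * (hypothetical producer-side retry); otherwise it is appended at the back, as
     * happens when the application resends it after seeing the failure.
     */
    static List<Integer> drain(List<Integer> input, int expiredId, boolean retryInPlace) {
        Deque<Integer> queue = new ArrayDeque<>(input);
        List<Integer> delivered = new ArrayList<>();
        boolean alreadyExpired = false;
        while (!queue.isEmpty()) {
            int batch = queue.pollFirst();
            if (batch == expiredId && !alreadyExpired) {
                alreadyExpired = true;
                if (retryInPlace) {
                    queue.addFirst(batch); // retried before anything behind it is sent
                } else {
                    queue.addLast(batch);  // resent by the application: goes to the back
                }
                continue;
            }
            delivered.add(batch);
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println("application resend: " + drain(input, 1, false)); // [2, 3, 1]
        System.out.println("producer retry:     " + drain(input, 1, true));  // [1, 2, 3]
    }
}
```

In this model the application-level resend delivers batches 2 and 3 ahead of batch 1, while the in-place retry preserves the input order. The sketch ignores in-flight requests and retry limits, which is where the bound on failure notification discussed in the comment comes in.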