[ https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102313#comment-16102313 ]
Apurva Mehta commented on KAFKA-5621:
-------------------------------------

I think the core dichotomy is that we have mirror-maker-like use cases and application use cases.

In the mirror maker use case, each partition is truly independent. If a subset of partitions is down, we still want to process the rest. So we want to expire batches and raise errors to the application (mirror maker in this case) as soon as possible.

On the other hand, for an application, partitions are not really independent (and especially so if you use transactions). If one partition is down, it makes sense to wait for it to be ready before continuing. So we would want to handle as many errors internally as possible. That would mean blocking sends once the queue is too large and not expiring batches in the queue. This simplifies the application programming model.

I think we should optimize the defaults for applications, but still enable tools like mirror maker to get the desired behavior by setting the right configs.

Assuming that we complete [KAFKA-5494], we could apply retries to expired batches only when the idempotent producer is enabled. This way the default behavior is the simplest one for the application. KMM and other such tools could continue to use the producer without idempotence enabled and keep the existing behavior. Of course, we get into the same quandary if KMM wants to enable idempotence, but this is the best compromise without introducing an additional config.

Another option is to introduce the 'queue.time.ms' config. The default would be infinite. When it is specified, we would not retry expired batches regardless of whether idempotence is enabled. So KMM-like tooling could specify a value and most application developers could ignore it.

I am not a fan of introducing new configs for a very narrow use case though, so I will continue to think of more alternatives.

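To make the two options above concrete, here is a minimal sketch of how the configs might look for each kind of client. It uses plain `java.util.Properties` with string keys, so it needs no Kafka dependency. `enable.idempotence` is an existing producer config; `queue.time.ms` exists only as a proposal in this comment, and the class name, method names, the 30000 value, and the two predicates are hypothetical illustrations of one reading of the proposal, not producer code.

```java
import java.util.Properties;

// Sketch only: models the two proposals from the comment as simple predicates.
class ProducerProfiles {

    // Application-style profile: let the producer handle errors internally.
    static Properties applicationProfile() {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "true");
        // queue.time.ms is deliberately left unset (proposed default: infinite).
        return p;
    }

    // Mirror-maker-style profile: surface errors to the caller quickly.
    static Properties mirrorMakerProfile() {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "false");
        // HYPOTHETICAL config: proposed in this comment, not an existing setting.
        p.setProperty("queue.time.ms", "30000");
        return p;
    }

    // Option 1 (assumed reading): retry expired batches only when the
    // idempotent producer is enabled. Unset is treated as false in this sketch.
    static boolean retriesExpiredBatchesOption1(Properties p) {
        return Boolean.parseBoolean(p.getProperty("enable.idempotence", "false"));
    }

    // Option 2 (assumed reading): expired batches are failed without retry
    // whenever queue.time.ms is specified, regardless of idempotence.
    static boolean failsExpiredBatchesOption2(Properties p) {
        return p.getProperty("queue.time.ms") != null;
    }

    public static void main(String[] args) {
        System.out.println("app, option 1 retries: "
                + retriesExpiredBatchesOption1(applicationProfile()));
        System.out.println("KMM, option 2 fails fast: "
                + failsExpiredBatchesOption2(mirrorMakerProfile()));
    }
}
```

Under option 1 the behavior is tied to a config that applications may want for other reasons, which is the quandary noted above; under option 2 the choice is explicit but costs a new config.
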
> The producer should retry expired batches when retries are enabled
> ------------------------------------------------------------------
>
>                 Key: KAFKA-5621
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5621
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>             Fix For: 1.0.0
>
>
> Today, when a batch is expired in the accumulator, a {{TimeoutException}} is
> raised to the user.
> It might be better for the producer to retry the expired batch up to the
> configured number of retries. This is more intuitive from the user's point of
> view.
> Further, the proposed behavior makes it easier for applications like mirror
> maker to provide ordering guarantees even when batches expire. Today, they
> would resend the expired batch and it would get added to the back of the
> queue, causing the output ordering to be different from the input ordering.
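
The ordering problem in the quoted description can be illustrated with a toy queue. This is a sketch under simplifying assumptions: batches are modeled as strings in a single `ArrayDeque`, and the class and method names are hypothetical. It does not reflect the producer's actual accumulator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model: compares an application-level resend with an in-producer retry.
class ExpiredBatchOrdering {

    // Today's behavior as described: the batch expires, the caller sees a
    // TimeoutException and resends, and the resend joins the back of the queue.
    static List<String> resendFromApplication(List<String> batches, String expired) {
        Deque<String> queue = new ArrayDeque<>(batches);
        queue.remove(expired);   // batch is expired out of the queue
        queue.addLast(expired);  // application resend lands at the back
        return new ArrayList<>(queue);
    }

    // Proposed behavior (assumed): the producer retries the expired batch
    // itself, so the batch keeps its original position.
    static List<String> retryInProducer(List<String> batches, String expired) {
        return new ArrayList<>(batches);
    }
}
```

With input order `b1, b2, b3` and `b1` expiring, the application-level resend produces `b2, b3, b1`, while the in-producer retry preserves `b1, b2, b3`.
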