[ 
https://issues.apache.org/jira/browse/KAFKA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-9131:
--------------------------------------

    Assignee: Gleb Komissarov

> failed producer metadata updates result in the unrelated error message
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-9131
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9131
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.3.0
>            Reporter: Gleb Komissarov
>            Assignee: Gleb Komissarov
>            Priority: Major
>
> A producer metadata {{TimeoutException}} is handled as a generic 
> RetriableException in RecordCollectorImpl.recordSendError, which results in an 
> irrelevant error message.
> We expected to see this:
> "Timeout exception caught when sending record to topic %s. " +
>  "This might happen if the producer cannot send data to the Kafka cluster and thus, " +
>  "its internal buffer fills up. " +
>  "This can also happen if the broker is slow to respond, if the network connection to " +
>  "the broker was interrupted, or if similar circumstances arise. " +
>  "You can increase producer parameter `max.block.ms` to increase this timeout."
> but got this:
> "You can increase the producer configs `delivery.timeout.ms` and/or " +
>  "`retries` to avoid this error. Note that `retries` is set to infinite by default."
> These params are not applicable to metadata updates.
> Technical details:
> (1) Lines 221 - 236 in 
> kafka/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java
> are dead code. They are never executed because {{producer.send}} never throws 
> TimeoutException, but returns a failed future. You can see it in lines 
> 948-955 in 
> kafka/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java
> (2) The exception is then processed in a callback function in the method 
> {{recordSendError}} on line 202. The DefaultProductionExceptionHandler is 
> used.
> (3) In {{recordSendError}} in the same class, the timeout exception is 
> processed as a RetriableException at lines 133-136. The error message is simply 
> wrong because tweaking {{delivery.timeout.ms}} 
> and {{retries}} has nothing to do with the issue in this case.
> Proposed solution:
> (1) Remove the unreachable catch (final TimeoutException e) in 
> RecordCollectorImpl.java, as Producer does not throw ApiExceptions.
> (2) Move the aforementioned catch clause to the recordSendError method.
> (3) Process TimeoutException separately from RetriableException.
> (4) Implement a unit test to cover this corner case.
>  
>  
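
A minimal sketch of the idea behind proposed steps (2) and (3), not the actual
Kafka Streams code. It assumes only what the report states: the failure reaches
the send callback instead of being thrown, and TimeoutException is a subclass of
RetriableException, so it has to be checked first. The class SendErrorSketch,
the helper errorHint, the SendCallback interface, and the local exception
classes are hypothetical stand-ins; the hint texts are shortened from the
messages quoted in the report.

```java
public class SendErrorSketch {

    // Local stand-ins for the Kafka exception types (assumption: same
    // subclass relationship, TimeoutException extends RetriableException).
    public static class RetriableException extends RuntimeException {
        public RetriableException(final String message) {
            super(message);
        }
    }

    public static class TimeoutException extends RetriableException {
        public TimeoutException(final String message) {
            super(message);
        }
    }

    // Hypothetical callback type, modeled loosely on a producer send callback.
    public interface SendCallback {
        void onCompletion(Exception exception);
    }

    // Hypothetical helper: chooses the hint appended to the error message.
    // The more specific TimeoutException must be tested before its parent,
    // otherwise the generic retriable branch would always win.
    public static String errorHint(final Exception exception) {
        if (exception instanceof TimeoutException) {
            return "You can increase producer parameter `max.block.ms` to increase this timeout.";
        } else if (exception instanceof RetriableException) {
            return "You can increase the producer configs `delivery.timeout.ms` and/or `retries` to avoid this error.";
        }
        return "No configuration hint available.";
    }

    // Stand-in for a send whose metadata update timed out: nothing is thrown
    // to the caller, the failure is only reported through the callback.
    public static void send(final SendCallback callback) {
        callback.onCompletion(new TimeoutException("metadata update timed out"));
    }

    public static void main(final String[] args) {
        try {
            send(exception -> System.out.println(errorHint(exception)));
        } catch (final TimeoutException e) {
            // Never reached in this sketch, which is why the report calls the
            // equivalent catch block dead code.
            System.out.println("caught synchronously");
        }
    }
}
```

A unit test for step (4) could then pass a TimeoutException and a plain
RetriableException to the handler and check that each produces its own hint.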



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
