[ https://issues.apache.org/jira/browse/KAFKA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax reassigned KAFKA-9131:
--------------------------------------

    Assignee: Gleb Komissarov

> failed producer metadata updates result in the unrelated error message
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-9131
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9131
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.3.0
>            Reporter: Gleb Komissarov
>            Assignee: Gleb Komissarov
>            Priority: Major
>
> {{Producer Metadata TimeoutException}} is processed as a generic RetriableException in RecordCollectorImpl.sendError. This results in an irrelevant error message.
> We were supposed to see this:
> "Timeout exception caught when sending record to topic %s. " +
> "This might happen if the producer cannot send data to the Kafka cluster and thus, " +
> "its internal buffer fills up. " +
> "This can also happen if the broker is slow to respond, if the network connection to " +
> "the broker was interrupted, or if similar circumstances arise. " +
> "You can increase producer parameter `max.block.ms` to increase this timeout."
> but got this:
> "You can increase the producer configs `delivery.timeout.ms` and/or " +
> "`retries` to avoid this error. Note that `retries` is set to infinite by default."
> These parameters are not applicable to metadata updates.
> Technical details:
> (1) Lines 221 - 236 in kafka/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java are dead code. They are never executed because {{producer.send}} never throws TimeoutException, but returns a failed future. You can see it in lines 948-955 in kafka/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java
> (2) The exception is then processed in a callback function in the method {{recordSendError}} on line 202. The DefaultProductionExceptionHandler is used.
> (3) In {{recordSendError}} in the same class, the timeout exception is processed as a RetriableException at lines 133-136. The error message is simply wrong because tweaking {{delivery.timeout.ms}} and {{retries}} has nothing to do with the issue in this case.
> Proposed solution:
> (1) Remove the unreachable catch (final TimeoutException e) in RecordCollectorImpl.java, as the Producer does not throw ApiExceptions.
> (2) Move the aforementioned catch clause to the recordSendError method.
> (3) Process TimeoutException separately from RetriableException.
> (4) Implement a unit test to cover this corner case.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
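
Steps (2) and (3) of the proposed solution might look roughly like the sketch below. It is not the actual RecordCollectorImpl code: the class `SendErrorMessages` and its `describe` method are hypothetical names, the two exception classes are local stand-ins that only mirror the Kafka hierarchy (where TimeoutException extends RetriableException) so the example compiles without kafka-clients, and the message texts are abbreviated from the ones quoted in the issue.

```java
// Local stand-ins for org.apache.kafka.common.errors.RetriableException and
// TimeoutException; they only reproduce the parent/child relationship.
class RetriableException extends RuntimeException {
    RetriableException(String message) { super(message); }
}

class TimeoutException extends RetriableException {
    TimeoutException(String message) { super(message); }
}

// Hypothetical helper that picks the hint for an exception delivered to the
// send callback (producer.send returns a failed future instead of throwing).
final class SendErrorMessages {
    // Abbreviated from the messages quoted in the issue description.
    static final String TIMEOUT_MESSAGE =
        "Timeout exception caught when sending record to topic %s. "
        + "You can increase producer parameter `max.block.ms` to increase this timeout.";
    static final String RETRIABLE_MESSAGE =
        "You can increase the producer configs `delivery.timeout.ms` and/or "
        + "`retries` to avoid this error. Note that `retries` is set to infinite by default.";

    private SendErrorMessages() { }

    static String describe(String topic, Exception exception) {
        // Check the subclass first: a TimeoutException is also a
        // RetriableException, so the reverse order would hide this branch.
        if (exception instanceof TimeoutException) {
            return String.format(TIMEOUT_MESSAGE, topic);
        }
        if (exception instanceof RetriableException) {
            return RETRIABLE_MESSAGE;
        }
        // Assumed fallback text, for illustration only.
        return "Error sending record to topic " + topic + ": " + exception;
    }
}

class SendErrorDemo {
    public static void main(String[] args) {
        // Simulates the callback receiving a metadata timeout.
        System.out.println(
            SendErrorMessages.describe("orders", new TimeoutException("metadata update failed")));
    }
}
```

With this ordering, a metadata timeout yields the `max.block.ms` hint, while other retriable errors still get the `delivery.timeout.ms`/`retries` hint. A unit test for step (4) could assert exactly that distinction.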