Ask Zuckerberg created KAFKA-9131:
-------------------------------------

             Summary: failed producer metadata updates result in the unrelated 
error message
                 Key: KAFKA-9131
                 URL: https://issues.apache.org/jira/browse/KAFKA-9131
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.3.0
            Reporter: Ask Zuckerberg


A producer metadata {{TimeoutException}} is processed as a generic 
RetriableException in RecordCollectorImpl.recordSendError. As a result, the 
logged error message is irrelevant to the actual failure.

We expected to see this message:

"Timeout exception caught when sending record to topic %s. " +
 "This might happen if the producer cannot send data to the Kafka cluster and thus, " +
 "its internal buffer fills up. " +
 "This can also happen if the broker is slow to respond, if the network connection to " +
 "the broker was interrupted, or if similar circumstances arise. " +
 "You can increase producer parameter `max.block.ms` to increase this timeout."

but got this:

"You can increase the producer configs `delivery.timeout.ms` and/or " +
 "`retries` to avoid this error. Note that `retries` is set to infinite by default."

These parameters do not apply to metadata updates.

Technical details:

(1) Lines 221-236 in 
kafka/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java
are dead code. They are never executed because {{producer.send}} never throws 
TimeoutException; it returns a failed future instead. See lines 948-955 in 
kafka/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java

(2) The exception is then processed in a callback function in the method 
{{recordSendError}} on line 202. The DefaultProductionExceptionHandler is used.

(3) In {{recordSendError}} in the same class, the timeout exception is processed 
as a RetriableException at lines 133-136. The error message is simply wrong 
because tweaking {{delivery.timeout.ms}} and {{retries}} has nothing to do with 
the issue in this case.
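The behaviour described in (1) and (2) can be illustrated with a minimal, self-contained sketch. It does not use the Kafka client library: the exception classes and the {{send}} method below are hypothetical stand-ins that only mimic the reported behaviour (the error is delivered through the callback and a failed future and is never thrown), so the names and the message text are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

public class DeadCatchSketch {
    // Stand-ins for the Kafka exception types (assumption: only the
    // "TimeoutException is a RetriableException" relationship matters here).
    static class RetriableException extends RuntimeException {
        RetriableException(String message) { super(message); }
    }

    static class TimeoutException extends RetriableException {
        TimeoutException(String message) { super(message); }
    }

    // Hypothetical producer stand-in: a metadata timeout is reported via the
    // callback and a failed future; nothing is thrown to the caller.
    static CompletableFuture<Void> send(String topic, BiConsumer<Void, Exception> callback) {
        Exception error = new TimeoutException("metadata update timed out for " + topic);
        callback.accept(null, error);
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(error);
        return failed;
    }

    // Reports which code path observed the error: "catch", "callback", or "none".
    static String whereIsTheErrorSeen(String topic) {
        String[] seenIn = {"none"};
        try {
            send(topic, (metadata, exception) -> {
                if (exception != null) {
                    seenIn[0] = "callback";
                }
            });
        } catch (TimeoutException e) {
            // Unreachable with this send(): the exception is never thrown.
            seenIn[0] = "catch";
        }
        return seenIn[0];
    }

    public static void main(String[] args) {
        System.out.println(whereIsTheErrorSeen("demo-topic")); // prints "callback"
    }
}
```

With a producer that behaves this way, any message attached to the catch block can never be logged, which matches the observation above.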

Proposed solution:

(1) Remove the unreachable catch (final TimeoutException e) in 
RecordCollectorImpl.java, as the Producer does not throw ApiExceptions.

(2) Move the handling from the aforementioned catch clause to the recordSendError method.

(3) Process TimeoutException separately from RetriableException.

(4) Implement a unit test to cover this corner case
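A rough sketch of steps (2) and (3), under stated assumptions: the exception classes are local stand-ins rather than the Kafka types, {{hintFor}} is a hypothetical helper and not an existing method of RecordCollectorImpl, and the hint strings are shortened from the messages quoted above. The only point it makes is ordering: the more specific TimeoutException has to be checked before the generic RetriableException.

```java
public class SendErrorMessageSketch {
    // Stand-ins for the Kafka exception types (assumption).
    static class RetriableException extends RuntimeException {
        RetriableException(String message) { super(message); }
    }

    static class TimeoutException extends RetriableException {
        TimeoutException(String message) { super(message); }
    }

    // Hypothetical helper: picks the hint appended to the logged error.
    // TimeoutException is tested first because it is also a RetriableException.
    static String hintFor(Exception exception) {
        if (exception instanceof TimeoutException) {
            return "You can increase producer parameter `max.block.ms` to increase this timeout.";
        } else if (exception instanceof RetriableException) {
            return "You can increase the producer configs `delivery.timeout.ms` and/or "
                + "`retries` to avoid this error.";
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(hintFor(new TimeoutException("metadata update timed out")));
        System.out.println(hintFor(new RetriableException("some other retriable error")));
    }
}
```

The sketch does not try to tell a metadata timeout apart from other timeouts that may surface as a TimeoutException; whether that distinction is needed is left open here. The two calls in {{main}} are also the shape a unit test for step (4) could take.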

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
