Alexander Bagiev created KAFKA-8623:
---------------------------------------

             Summary: KafkaProducer possible deadlock when sending to different topics
                 Key: KAFKA-8623
                 URL: https://issues.apache.org/jira/browse/KAFKA-8623
             Project: Kafka
          Issue Type: Bug
          Components: producer 
    Affects Versions: 2.2.1
            Reporter: Alexander Bagiev


Project with bug reproduction: [https://github.com/abagiev/kafka-producer-bug]

Sending two messages in a row to two different topics causes KafkaProducer to 
hang for 60 s and then fail with the following exception:
{noformat}
org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
        at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$0(KafkaTemplate.java:405) ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:877) ~[kafka-clients-2.0.1.jar:na]
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:803) ~[kafka-clients-2.0.1.jar:na]
        at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:444) ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
        at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:381) ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
        at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:193) ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
...
{noformat}
It looks like KafkaProducer requests metadata twice for each topic and hangs 
just before the second request because of some deadlock. After 60 s the 
TimeoutException is thrown, and the metadata is then requested and received 
immediately (but only after the exception has already been thrown).
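
For illustration only, here is a toy model (plain JDK, no Kafka classes) of one 
way such symptoms can arise: a blocking wait with a timeout runs on the same 
single thread that would deliver the awaited update. Whether this is what 
happens inside KafkaProducer is not established by this report; the class name 
MetadataWaitModel, the method run and the event strings are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: NOT Kafka code. All names here are hypothetical.
class MetadataWaitModel {

    /** Runs the scenario and returns the events in the order they happened. */
    static List<String> run(long maxBlockMs) {
        List<String> events = new CopyOnWriteArrayList<>();
        // One worker thread stands in for the thread that would deliver metadata.
        ExecutorService io = Executors.newSingleThreadExecutor();
        CountDownLatch metadataReady = new CountDownLatch(1);

        // Task 1: a "send" that blocks on the worker thread until metadata
        // arrives or the timeout expires.
        io.submit(() -> {
            try {
                if (metadataReady.await(maxBlockMs, TimeUnit.MILLISECONDS)) {
                    events.add("sent");
                } else {
                    events.add("timeout after " + maxBlockMs + " ms");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Task 2: the metadata update, queued behind task 1 on the same thread,
        // so it cannot run until task 1 gives up.
        io.submit(() -> {
            metadataReady.countDown();
            events.add("metadata received");
        });

        io.shutdown();
        try {
            io.awaitTermination(maxBlockMs + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [timeout after 200 ms, metadata received]
        System.out.println(run(200));
    }
}
```

In this model the wait always times out first and the update completes right 
after it, which matches the order of events described above. It says nothing 
about which thread or lock is actually involved in the real producer.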

The issue reproduces every time in the example project, and the use case is 
trivial. This is a critical bug for us.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)