radha created KAFKA-4939:
----------------------------

             Summary: Kafka does not log NoRouteToHostException in ERROR log level
                 Key: KAFKA-4939
                 URL: https://issues.apache.org/jira/browse/KAFKA-4939
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 0.10.1.1
            Reporter: radha
            Priority: Minor

If you have many brokers and a specific Kafka client cannot reach some of them for whatever reason (for example, the host cannot be pinged), the client does not log this at ERROR level. Publishing then fails with other errors that can never be resolved, because they do not mention the unreachable broker:

ERROR pool-3-thread-3 [ProducerDroppedMessageExceptionLogger ] - Exception occured while producing message: Failed to update metadata after 1000 ms.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 1000 ms.

ERROR kafka-producer-network-thread | producer-1 [ProducerDroppedMessageExceptionLogger ] - Exception occured while producing message: Expiring 1 record(s) for Q.REST.TOPIC-18 due to 5048 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for Q.REST.TOPIC-18 due to 5048 ms has passed since batch creation plus linger time

netstat still shows established connections to Kafka, even though these messages fail to be published.

The logs shown below should be at ERROR level, and the attempt should also be retried. We wasted several hours before raising the log level to TRACE, seeing these messages, and confirming that we could not even ping that specific Kafka broker.

[org.apache.kafka.common.network.Selector] - Connection with some-prd-kafk02/*.*.*.* disconnected
java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
	at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
	at java.lang.Thread.run(Thread.java:745)
[org.apache.kafka.clients.NetworkClient ] - Node 206 disconnected.
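
Until the client reports this condition more loudly, one possible workaround is to probe each broker address from the client host before producing. The sketch below is not part of the Kafka client API: the class name BrokerReachabilityCheck, its parse and probe helpers, and the localhost:9092 default are all hypothetical. It assumes the caller supplies the full list of broker host:port pairs as advertised to clients, since an unreachable broker may be one learned from metadata rather than one in bootstrap.servers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check; not part of the Kafka client API. */
final class BrokerReachabilityCheck {

    /** Splits a "host1:port1,host2:port2" list into unresolved socket addresses. */
    public static List<InetSocketAddress> parse(String brokerList) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : brokerList.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /**
     * Attempts a plain TCP connect. Returns null on success, otherwise the
     * exception class and message (for example java.net.NoRouteToHostException).
     */
    public static String probe(InetSocketAddress broker, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Re-create the address so that the host name is resolved now.
            socket.connect(new InetSocketAddress(broker.getHostString(), broker.getPort()), timeoutMs);
            return null;
        } catch (IOException e) {
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass the real broker list as the first argument.
        String brokerList = args.length > 0 ? args[0] : "localhost:9092";
        for (InetSocketAddress broker : parse(brokerList)) {
            String failure = probe(broker, 1000);
            if (failure != null) {
                System.err.println("ERROR broker " + broker.getHostString() + ":"
                        + broker.getPort() + " unreachable - " + failure);
            }
        }
    }
}
```

A successful probe only shows that a TCP connection can be opened from this host. It says nothing about whether the broker is healthy or whether produce requests will succeed.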