Hi,

We are seeing high producer send error rates whenever one of the nodes in the cluster is taken down for maintenance. When this happens, we also see many exceptions related to the java.nio libraries on the server. Any idea what could be causing this?

We use min.insync.replicas=2 with at-least-once delivery semantics. Moreover, the cluster has one extra node, so a single node going down should not affect it.
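In case it helps with reproducing this, here is a minimal sketch of the producer settings listed below, written as plain Java properties. The class name, the `localhost:9092` bootstrap address, and the `minRetryWindowMs` helper are placeholders for illustration, not our real code. The helper is only a back-of-envelope estimate: it assumes retry.backoff.ms is left at its default of 100 ms and that every attempt fails immediately, as the NETWORK_EXCEPTION disconnects seem to.

```java
import java.util.Properties;

// Illustrative sketch only: mirrors the producer settings quoted in this message.
final class ProducerSettingsSketch {

    // Builds the settings as string properties. bootstrapServers is supplied
    // by the caller and is a placeholder, not our real broker address.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");
        props.setProperty("retries", "5");
        props.setProperty("request.timeout.ms", "10000");
        props.setProperty("linger.ms", "500");
        props.setProperty("batch.size", "32768");
        props.setProperty("buffer.memory", "67108864");
        return props;
    }

    // Rough estimate (an assumption, not measured): if each attempt fails
    // immediately, the retries are spread over about retries * backoff ms.
    static long minRetryWindowMs(int retries, long retryBackoffMs) {
        return retries * retryBackoffMs;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092");
        int retries = Integer.parseInt(props.getProperty("retries"));
        // 100 ms is the assumed default for retry.backoff.ms.
        long windowMs = minRetryWindowMs(retries, 100L);
        System.out.println("settings: " + props);
        System.out.println("approximate retry window in ms: " + windowMs);
    }
}
```

If that estimate is in the right range, the five retries would cover only a fraction of a second, which may be shorter than the time the producer needs to learn about the new partition leaders. We have not verified this.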
KafkaProducerConfigs

acks=all
retries=5
request.timeout.ms=10000
linger.ms=500
batch.size=32768
buffer.memory=67108864

Rest of the settings are default.

*Errors we are seeing on the producer clients are as follows*

2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=xxxxxxx] Got error produce response with correlation id 9224633 on topic-partition device_telemetry-29, retrying (4 attempts left). Error: NETWORK_EXCEPTION
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=xxxxxxx] Received invalid metadata error in produce request on partition device_telemetry-29 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=networking-monitoring-01002.node.ad1.r2] Got error produce response with correlation id 9224633 on topic-partition device_telemetry-1, retrying (4 attempts left). Error: NETWORK_EXCEPTION
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=xxxxxxx] Received invalid metadata error in produce request on partition device_telemetry-1 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=xxxxxxx] Got error produce response with correlation id 9224633 on topic-partition device_telemetry-33, retrying (4 attempts left). Error: NETWORK_EXCEPTION
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=xxxxxxx] Received invalid metadata error in produce request on partition device_telemetry-33 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx] o.a.k.c.p.i.Sender [Producer clientId=xxxxxxx] Got error produce response with correlation id 9224633 on topic-partition device_telemetry-30, retrying (4 attempts left). Error: NETWORK_EXCEPTION

*Errors seen on the Kafka Server*

[2021-03-19 16:12:53,075] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=2] Error sending fetch request (sessionId=1651726962, epoch=293641949) to node 2: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,629] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=3] Error sending fetch request (sessionId=527382701, epoch=5538701) to node 3: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,934] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=2] Error sending fetch request (sessionId=745744629, epoch=409235096) to node 3: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,935] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=2] Error sending fetch request (sessionId=270968958, epoch=4871069) to node 4: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,937] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=0] Error sending fetch request (sessionId=248799819, epoch=208504323) to node 4: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,938] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=1] Error sending fetch request (sessionId=624148312, epoch=212419334) to node 3: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,940] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=3] Error sending fetch request (sessionId=289201163, epoch=1264570088) to node 4: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,941] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=1] Error sending fetch request (sessionId=2006778606, epoch=412437276) to node 4: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,942] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=192606775, epoch=1246960140) to node 3: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:48:19,988] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=2022443912, epoch=25872) to node 2: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:48:19,990] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=2] Error sending fetch request (sessionId=1499198229, epoch=110928) to node 2: java.nio.channels.ClosedSelectorException. (org.apache.kafka.clients.FetchSessionHandler)

--
*Regards,*
*Dhruv*