Hello Kafka Experts,

We are currently facing an issue on our 3-node Kafka cluster: one of the brokers is not responding to any requests. I've checked the logs but found nothing related to this problem.
Kafka Version: 1.1.0

Server.conf:

## Timeout properties to check prod outage
replica.fetch.wait.max.ms = 1000
replica.socket.timeout.ms = 60000
request.timeout.ms = 60000
zookeeper.connection.timeout.ms = 10000
zookeeper.session.timeout.ms = 10000
controller.socket.timeout.ms = 60000
group.max.session.timeout.ms = 400000
group.min.session.timeout.ms = 7000
##
socket.send.buffer.bytes = 102400
reserved.broker.max.id = 2147483647
num.partitions = 30
ssl.secure.random.implementation = SHA1PRNG
ssl.key.password = *******
log.cleaner.delete.retention.ms = 180000
log.retention.ms = 600000
listeners = SSL://IP1:9093
broker.id = 1
socket.receive.buffer.bytes = 102400
message.max.bytes = 67108864
ssl.truststore.password = *******
ssl.enabled.protocols = TLSv1.2
auto.create.topics.enable = true
log.roll.ms = 3600000
auto.leader.rebalance.enable = true
ssl.keystore.location = /tmp/keyStore.jks
zookeeper.connect = IP1:2181/kafka.cluster1
log.retention.check.interval.ms = 300000
replica.fetch.max.bytes = 67108864
socket.request.max.bytes = 104857600
default.replication.factor = 2
offsets.topic.replication.factor = 2
log.dirs = /kafka_logs/,/kafka_logs/
ssl.keystore.password = ****
min.insync.replicas = 2
security.inter.broker.protocol = SSL
compression.codec = 3
ssl.truststore.location = /tmp/trustStore.jks

A console consumer reports the following error:

*ERROR [Consumer clientId=consumer-1, groupId=console-consumer-21688] Offset commit failed on partition logs-28 at offset 0: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)*

Partition logs-28 actually resides on the broker that is not responding. If I restart the broker it starts responding again, but I would like to figure out the exact cause of why one of the 3 brokers is failing.
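Since only one of the three brokers is affected, one thing that could be ruled out is configuration drift between a healthy broker and the failing one. Below is a minimal sketch in Python, assuming each broker's settings are available as text in the `key = value` form shown above. The function names `parse_properties` and `diff_properties` and the set of keys that are expected to differ are illustrative choices for this sketch, not part of Kafka.

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


# Settings that normally differ per broker (an assumption for this sketch).
EXPECTED_TO_DIFFER = frozenset({"broker.id", "listeners"})


def diff_properties(healthy, failing, ignore=EXPECTED_TO_DIFFER):
    """Return {key: (healthy_value, failing_value)} for every differing setting.

    A key that is missing on one side is reported with None for that side.
    """
    diffs = {}
    for key in sorted(set(healthy) | set(failing)):
        if key in ignore:
            continue
        a, b = healthy.get(key), failing.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs
```

Feeding the config of broker 1 and the config of one of the working brokers through `parse_properties` and then `diff_properties` would list only the settings that differ, which is easier to review than two full files.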
WARN [ReplicaFetcher replicaId=72804, leaderId=72802, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=72804, maxWait=1000, minBytes=1, maxBytes=10485760, fetchData={logs-22=(offset=0, logStartOffset=0, maxBytes=67108864), logs-23=(offset=48199929, logStartOffset=48199929, maxBytes=67108864), logs-5=(offset=0, logStartOffset=0, maxBytes=67108864), logs-17=(offset=48225630, logStartOffset=48225630, maxBytes=67108864), logs-24=(offset=48913184, logStartOffset=48913184, maxBytes=67108864), logs-5=(offset=48256727, logStartOffset=48256727, maxBytes=67108864), logs-4=(offset=0, logStartOffset=0, maxBytes=67108864), logs-17=(offset=0, logStartOffset=0, maxBytes=67108864), logs-6=(offset=48916295, logStartOffset=48916295, maxBytes=67108864), logs-17=(offset=48210909, logStartOffset=48210909, maxBytes=67108864), logs-29=(offset=48193290, logStartOffset=48193290, maxBytes=67108864), logs-16=(offset=0, logStartOffset=0, maxBytes=67108864), __consumer_offsets-11=(offset=499512, logStartOffset=0, maxBytes=67108864), __consumer_offsets-41=(offset=0, logStartOffset=0, maxBytes=67108864), logs-11=(offset=50644515, logStartOffset=50644515, maxBytes=67108864), logs-18=(offset=48881988, logStartOffset=48881988, maxBytes=67108864), logs-28=(offset=0, logStartOffset=0, maxBytes=67108864), logs-29=(offset=50603359, logStartOffset=50603359, maxBytes=67108864), logs-0=(offset=51682300, logStartOffset=51682300, maxBytes=67108864), logs-11=(offset=50638782, logStartOffset=50638782, maxBytes=67108864), logs-12=(offset=51674247, logStartOffset=51674247, maxBytes=67108864), logs-23=(offset=50625272, logStartOffset=50625272, maxBytes=67108864), __consumer_offsets-5=(offset=0, logStartOffset=0, maxBytes=67108864), __consumer_offsets-35=(offset=0, logStartOffset=0, maxBytes=67108864), logs-5=(offset=50676068, logStartOffset=50676068, maxBytes=67108864), logs-23=(offset=0, logStartOffset=0, maxBytes=67108864)}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=26393661, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
*java.net.SocketTimeoutException: Failed to connect within 60000 ms*

Please advise.

--Senthil
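
P.S. Because the replica fetcher reports "Failed to connect within 60000 ms", a basic reachability check of the leader's listener from the other brokers may help separate a network-level problem from a broker that accepts connections but does not answer. Below is a minimal sketch in Python using only the standard library. The helper name `can_connect` and the 5-second default are assumptions for illustration, and the check covers only the TCP connect, not the TLS handshake or any Kafka request.

```python
import socket


def can_connect(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running `can_connect("IP1", 9093)` on each of the other two brokers while the problem is occurring (with `IP1` replaced by the real address of the unresponsive broker) would show whether the SSL listener port is still accepting connections at that moment.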