Hi,

As Artem mentioned, I ran some tests with replication factor 1 and replication factor 3 on two different topics.
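For reference, the two topics can be created like this (a sketch: "testtopicreplica1" is a stand-in name for the RF=1 topic, since only its group name "grp1" appears below, and that topic's partition count is an assumption):

  # survives a single broker failure: every partition has 3 replicas
  ./bin/kafka-topics.sh --create --bootstrap-server 192.168.20.223:9092 \
      --topic testtopicreplica3 --partitions 3 --replication-factor 3

  # each partition lives on exactly one broker
  ./bin/kafka-topics.sh --create --bootstrap-server 192.168.20.223:9092 \
      --topic testtopicreplica1 --partitions 3 --replication-factor 1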
One of the Kafka brokers is down. The --describe command works if the replication factor is 3 (testtopicreplica3 was created with RF 3):

[root@node-223 kafka_2.12-2.8.2]# ./bin/kafka-consumer-groups.sh --bootstrap-server 192.168.20.223:9092,192.168.20.224:9092,192.168.20.225:9092 --group grp3partition --describe

GROUP          TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                                    HOST             CLIENT-ID
grp3partition  testtopicreplica3  1          0               0               0    consumer-grp3partition-1-c97a90e3-9c7d-4ad6-958a-229d6995a5e7  /192.168.20.225  consumer-grp3partition-1
grp3partition  testtopicreplica3  2          0               0               0    consumer-grp3partition-1-e0ea81e9-702d-4cae-9b3c-75a13e7f42c8  /192.168.20.223  consumer-grp3partition-1
grp3partition  testtopicreplica3  0          0               0               0    consumer-grp3partition-1-7e303536-6f26-49fe-8436-07ed7ddc303a  /192.168.20.224  consumer-grp3partition-1

It fails if the replication factor is 1:

[root@node-223 kafka_2.12-2.8.2]# ./bin/kafka-consumer-groups.sh --bootstrap-server 192.168.20.223:9092,192.168.20.224:9092,192.168.20.225:9092 --group grp1 --describe

Error: Executing consumer group command failed due to org.apache.kafka.common.errors.TimeoutException: Call(callName=metadata, deadlineMs=1705910117087, tries=50, nextAllowedTryMs=1705910117188) timed out at 1705910117088 after 50 attempt(s)
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=metadata, deadlineMs=1705910117087, tries=50, nextAllowedTryMs=1705910117188) timed out at 1705910117088 after 50 attempt(s)
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:646)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectConsumerAssignment(ConsumerGroupCommand.scala:412)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$5(ConsumerGroupCommand.scala:581)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$5$adapted(ConsumerGroupCommand.scala:572)
        at scala.collection.immutable.List.flatMap(List.scala:366)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$2(ConsumerGroupCommand.scala:572)
        at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:935)
        at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
        at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
        at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
        at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:934)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsOffsets(ConsumerGroupCommand.scala:567)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:368)
        at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:73)
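A sketch of how to verify what exactly becomes unavailable, using kafka-topics.sh from the same distribution ("testtopicreplica1" is again a stand-in name for the RF=1 topic):

  # partitions whose only replica was on the stopped broker have no
  # leader (shown as "none"/-1); the admin client's metadata call for
  # them times out, which matches the trace above
  ./bin/kafka-topics.sh --describe --topic testtopicreplica1 \
      --bootstrap-server 192.168.20.223:9092,192.168.20.224:9092

  # with offsets.topic.replication.factor=1, some __consumer_offsets
  # partitions go offline too, which breaks describe/commit for the
  # groups hashed onto them (Artem's theory below)
  ./bin/kafka-topics.sh --describe --topic __consumer_offsets \
      --bootstrap-server 192.168.20.223:9092,192.168.20.224:9092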
        at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:60)
        at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=metadata, deadlineMs=1705910117087, tries=50, nextAllowedTryMs=1705910117188) timed out at 1705910117088 after 50 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: metadata

Looking at it from the point of view of a Java client that processes data: will a client consuming a topic with replication factor 1 become unable to process data as soon as a single Kafka broker in the cluster stops?

regards,

On Sat, Jan 20, 2024 at 1:50 PM Artem Timchenko <artem.timche...@bolt.eu.invalid> wrote:

> Hi,
>
> Just a long shot, but I might be wrong. You have
> offsets.topic.replication.factor=1 in your config, so when one broker is
> down, some partitions of the __consumer_offsets topic will be down as
> well, and kafka-consumer-groups can't get offsets from them. Maybe it's
> just a slightly misleading error message.
>
> On Sat, Jan 20, 2024 at 11:38 AM Yavuz Sert <yavuz.s...@netsia.com> wrote:
>
> > Hi, sorry for the confusion, here are the details:
> >
> > I have 3 broker nodes: 192.168.20.223 / 224 / 225
> >
> > When all Kafka services are UP:
> >
> > [image: image.png]
> >
> > I stopped the Kafka service on node 225:
> >
> > [image: image.png]
> >
> > Then I tried the command on node 223 with --bootstrap-server
> > 192.168.20.223:9092,192.168.20.224:9092,192.168.20.225:9092:
> >
> > [image: image.png]
> >
> > Caused by: org.apache.kafka.common.errors.TimeoutException:
> > Call(callName=findCoordinator, deadlineMs=1705743236910, tries=47,
> > nextAllowedTryMs=1705743237011) timed out at 1705743236911 after 47
> > attempt(s)
> > Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out
> > waiting for a node assignment. Call: findCoordinator
> >
> > Even after minutes, I got the same error.
> >
> > That's my problem.
> >
> > br,
> >
> > yavuz
> >
> > On Sat, Jan 20, 2024 at 4:11 AM Haruki Okada <ocadar...@gmail.com> wrote:
> >
> >> Hi.
> >>
> >> Which server did you shut down in testing?
> >> If it was 192.168.20.223, it is natural that the kafka-consumer-groups
> >> script fails, because you passed only 192.168.20.223 to the
> >> bootstrap-server arg.
> >>
> >> In an HA setup, you have to pass multiple brokers (as a comma-separated
> >> string) to bootstrap-server so that the client can fetch the initial
> >> metadata from the other servers even when one fails.
> >>
> >> On Sat, Jan 20, 2024 at 0:30, Yavuz Sert <yavuz.s...@netsia.com> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I'm trying to do some tests about high availability on Kafka v2.8.2.
> >> > I have 3 Kafka brokers and 3 ZooKeeper instances.
> >> > When I shut down the Kafka service on just one of the servers, I got
> >> > this error:
> >> >
> >> > [root@node-223 ~]# /root/kafka_2.12-2.8.2/bin/kafka-consumer-groups.sh
> >> > --bootstrap-server 192.168.20.223:9092 --group app2 --describe
> >> >
> >> > Error: Executing consumer group command failed due to
> >> > org.apache.kafka.common.errors.TimeoutException:
> >> > Call(callName=findCoordinator, deadlineMs=1705677946526, tries=47,
> >> > nextAllowedTryMs=1705677946627) timed out at 1705677946527 after 47
> >> > attempt(s)
> >> > java.util.concurrent.ExecutionException:
> >> > org.apache.kafka.common.errors.TimeoutException:
> >> > Call(callName=findCoordinator, deadlineMs=1705677946526, tries=47,
> >> > nextAllowedTryMs=1705677946627) timed out at 1705677946527 after 47
> >> > attempt(s)
> >> >   at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
> >> >   at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
> >> >   at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
> >> >   at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
> >> >   at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$describeConsumerGroups$1(ConsumerGroupCommand.scala:550)
> >> >   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> >> >   at scala.collection.Iterator.foreach(Iterator.scala:943)
> >> >   at scala.collection.Iterator.foreach$(Iterator.scala:943)
> >> >   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> >> >   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> >> >   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> >> >   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> >> >   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> >> >   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> >> >   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> >> >   at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeConsumerGroups(ConsumerGroupCommand.scala:549)
> >> >   at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsOffsets(ConsumerGroupCommand.scala:565)
> >> >   at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:368)
> >> >   at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:73)
> >> >   at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:60)
> >> >   at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
> >> > Caused by: org.apache.kafka.common.errors.TimeoutException:
> >> > Call(callName=findCoordinator, deadlineMs=1705677946526, tries=47,
> >> > nextAllowedTryMs=1705677946627) timed out at 1705677946527 after 47
> >> > attempt(s)
> >> > Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out
> >> > waiting for a node assignment. Call: findCoordinator
> >> >
> >> > Kafka conf (for one server):
> >> >
> >> > broker.id=0
> >> > listeners=PLAINTEXT://0.0.0.0:9092
> >> > advertised.listeners=PLAINTEXT://192.168.20.223:9092
> >> > num.network.threads=3
> >> > num.io.threads=8
> >> > socket.send.buffer.bytes=102400
> >> > socket.receive.buffer.bytes=102400
> >> > socket.request.max.bytes=104857600
> >> > log.dirs=/root/kafkadir
> >> > num.partitions=1
> >> > num.recovery.threads.per.data.dir=1
> >> > offsets.topic.replication.factor=1
> >> > transaction.state.log.replication.factor=1
> >> > transaction.state.log.min.isr=1
> >> > log.retention.hours=1
> >> > log.segment.bytes=104857600
> >> > log.retention.check.interval.ms=300000
> >> > delete.topic.enable=true
> >> > zookeeper.connection.timeout.ms=18000
> >> > zookeeper.connect=192.168.20.223:2181,192.168.20.224:2181,192.168.20.225:2181
> >> > group.initial.rebalance.delay.ms=0
> >> > max.request.size=104857600
> >> > message.max.bytes=104857600
> >> >
> >> > How can I fix or troubleshoot the error?
> >> >
> >> > Thanks,
> >> >
> >> > Yavuz
> >>
> >> --
> >> ========================
> >> Okada Haruki
> >> ocadar...@gmail.com
> >> ========================
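P.S. For the archives, the internal-topic settings Artem flagged, raised for a 3-node cluster, would look roughly like this (a sketch of server.properties overrides; note that changing the config does not retroactively change an existing __consumer_offsets topic that was already created with RF 1, which would instead need a partition reassignment, e.g. via kafka-reassign-partitions.sh):

  # replicate the internal topics across all 3 brokers
  offsets.topic.replication.factor=3
  transaction.state.log.replication.factor=3
  transaction.state.log.min.isr=2
  # optional: make new topics fault tolerant by default
  default.replication.factor=3
  min.insync.replicas=2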