Did you see the warning "Error connecting to node" in the consumer log?

Best,
Lisheng

Hrishikesh Mishra <sd.hri...@gmail.com> wrote on Thu, Aug 29, 2019 at 2:45 PM:

> Please find my reply in blue colour:
>
> On Thu, Aug 29, 2019 at 11:32 AM Lisheng Wang <wanglishen...@gmail.com>
> wrote:
>
> > Hi
> >
> > About question 1, it doesn't matter how many consumers are in the same
> > consumer group.
> >
> > So you mean the broker that is the coordinator did not crash at all
> > before?
>
> We didn't see any shutdown error on the brokers, and we faced a similar
> problem with multiple coordinators.
>
> > May I know if exactly one broker (coordinator) is unavailable, or many
> > are? If exactly one, you can try to transfer leadership of the
> > __consumer_offsets partitions on that broker to another broker and see
> > whether the problem goes away.
>
> It happened with multiple consumer groups.
>
> > I found the following issue, which seems similar to yours, FYR:
> >
> > https://stackoverflow.com/questions/51952398/kafka-connect-distributed-mode-the-group-coordinator-is-not-available
>
> We have gone through this link, but in our case it is not always feasible
> to clean data from the offsets topic and restart (our cluster is huge).
>
> > Best,
> > Lisheng
> >
> > Hrishikesh Mishra <sd.hri...@gmail.com> wrote on Thu, Aug 29, 2019 at 12:19 PM:
> >
> > > Hi,
> > >
> > > We are facing the following issues with our Kafka cluster.
> > >
> > > - Kafka version: 2.0.0
> > > - Cluster configuration:
> > >   - Number of brokers: 14
> > >   - Per broker: 37 GB memory and 14 cores
> > >   - Topics: 40 - 50
> > >   - Partitions per topic: 32
> > >   - Replicas: 3
> > >   - Min in-sync replicas: 2
> > >   - __consumer_offsets partitions: 50
> > >   - offsets.topic.replication.factor=3
> > >   - default.replication.factor=3
> > >   - Consumers: ~4000 (will grow to ~7K)
> > >   - Consumer groups: ~4000 (will grow to ~7K)
> > >
> > > Important: each consumer consumes from one topic, and each consumer
> > > group has only one consumer, due to some architectural constraints.
> > >
> > > Two major problems we are facing with consumer groups:
> > >
> > > - The first time we start a consumer with a new group name, it works
> > >   very well. But a subsequent restart (with the previous / older group
> > >   name) causes problems for some consumers. We get the following
> > >   errors:
> > >
> > >   INFO [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]: [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2, groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Discovered group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null)
> > >   INFO [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]: [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2, groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null) is unavailable or invalid, will attempt rediscovery
> > >   INFO [2019-08-28 19:05:34,582] [main] [AbstractCoordinator]: [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2, groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Discovered group coordinator 10.32.197.112:9092 (id: 2147483631 rack: null)
> > >
> > >   These messages keep coming and the consumer is not able to start /
> > >   poll. But if we change the group name, it works the first time
> > >   without any issue (and fails on a subsequent restart). So it also
> > >   means that there is no issue with the broker. Could it be because of
> > >   having a single consumer per consumer group? If so, what would be
> > >   the workaround?
> > >
> > > - The second error occurs when the consumer is up and running.
> > >   Then after a couple of hours, it starts failing and throws the
> > >   following error:
> > >
> > >   [Consumer clientId=banneXXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX, groupId=bannerXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX] Offset commit failed on partition banneXXXX-7 at offset 13711176: This is not the correct coordinator
> > >   [Consumer clientId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2, groupId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2] Offset commit failed on partition banXXerGrXXMXX-8 at offset 14741: This is not the correct coordinator.
> > >
> > > I wanted to know the following things:
> > >
> > > - What is the maximum number of consumer groups in a Kafka cluster? I
> > >   didn't find any documented limit online; everywhere it is mentioned
> > >   that it is limited by the OS.
> > > - Is there a problem if a consumer group has only one consumer?
> > > - Is there some problem with my Kafka configuration?
> > >
> > > Regards
> > > Hrishikesh
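
For reference, two pieces of arithmetic help when reading the logs quoted above. The coordinator id 2147483631 is not a broker id: the Java client derives the coordinator's connection id as Integer.MAX_VALUE minus the broker id, so the broker can be recovered by subtraction. And to see which __consumer_offsets partition (and hence which partition leader) serves a given group, the broker, as far as I know, takes the absolute value of the group id's Java String.hashCode() modulo the offsets topic partition count (50 in this thread). What follows is a minimal Python sketch under those assumptions, not the Kafka source itself; the function names are hypothetical, and the group id in the usage lines is a made-up placeholder, not one taken from the logs.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE
INT_MIN = -(2**31)   # Java Integer.MIN_VALUE


def broker_id_from_coordinator_id(coordinator_node_id: int) -> int:
    """Recover the broker id from the coordinator id printed by the client.

    Assumes the client computed the id as Integer.MAX_VALUE - broker_id.
    """
    return INT_MAX - coordinator_node_id


def java_string_hashcode(s: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Map a group id to a __consumer_offsets partition.

    Assumption: partition = abs(hashCode(group_id)) % num_partitions,
    with Integer.MIN_VALUE treated as 0. The default of 50 matches the
    partition count stated in this thread.
    """
    h = java_string_hashcode(group_id)
    non_negative = 0 if h == INT_MIN else abs(h)
    return non_negative % num_partitions


if __name__ == "__main__":
    # Coordinator id taken from the log lines in the original message.
    print("broker id:", broker_id_from_coordinator_id(2147483631))
    # Placeholder group id, for illustration only.
    print("offsets partition:", offsets_partition_for("example-group-ingestion-v2"))
```

Once the partition is known, describing the __consumer_offsets topic with kafka-topics.sh shows which broker leads it, which makes it easier to check whether the failing groups all map to partitions led by the same few brokers.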