It seems that one of the brokers had unusually high CPU utilization: five of the brokers were at about 15%, while one was at 100%. After I added more CPUs to the broker that was at 100%, the issue resolved itself.
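
A rebalance loop like the one in the quoted broker log can be spotted by counting the "Preparing to rebalance group" lines per consumer group. The following is only a minimal Python sketch: the names `count_rebalances` and `SAMPLE_LOG` are made up for illustration, and it assumes the log lines are unwrapped (one entry per line, as in the broker's server log) and use the message format shown in this thread.

```python
import re
from collections import Counter

# Pattern for the GroupCoordinator message seen in this thread, e.g.
# "... Preparing to rebalance group self-service with old generation 192346 ..."
# Assumption: the message wording matches the Kafka 1.1.1 log quoted here.
REBALANCE_RE = re.compile(
    r"Preparing to rebalance group (?P<group>\S+) with old generation (?P<gen>\d+)"
)


def count_rebalances(lines):
    """Return a Counter mapping group id -> number of rebalance starts seen."""
    counts = Counter()
    for line in lines:
        match = REBALANCE_RE.search(line)
        if match:
            counts[match.group("group")] += 1
    return counts


# Three entries copied from the quoted log, joined back into single lines.
SAMPLE_LOG = [
    "[2020-02-20 09:49:32,395] INFO [GroupCoordinator 1006]: Stabilized group "
    "self-service generation 192346 (__consumer_offsets-32) "
    "(kafka.coordinator.group.GroupCoordinator)",
    "[2020-02-20 09:49:32,406] INFO [GroupCoordinator 1006]: Preparing to "
    "rebalance group self-service with old generation 192346 "
    "(__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)",
    "[2020-02-20 09:49:33,722] INFO [GroupCoordinator 1006]: Preparing to "
    "rebalance group self-service with old generation 192347 "
    "(__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)",
]

if __name__ == "__main__":
    for group, n in count_rebalances(SAMPLE_LOG).items():
        print(f"{group}: {n} rebalance(s) started")
```

A group whose count keeps climbing every few seconds is stuck in the same loop. The count only shows the symptom; it says nothing about the cause, which in this case was an overloaded broker.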

Peter

On Thu, 20 Feb 2020 at 10:54, Péter Sinóros-Szabó <peter.sinoros-sz...@transferwise.com> wrote:

> Hi,
>
> we use Kafka 1.1.1, and recently I ran into an issue/bug that I can't see
> how to solve.
> We have a service running as two instances, both using the same consumer
> group id to access some topics. When the service starts and tries to join
> the consumer group, the join does not succeed.
>
> The application gets error messages like:
>
> Accepting Kafka message from topic 'myTopic', partition 0, offset 383554 failed.
> Attempt to heartbeat failed since group is rebalancing
>
> On the broker, I see:
>
> ./kafka-consumer-groups.sh ... --group self-service --describe --state
> COORDINATOR (ID)          ASSIGNMENT-STRATEGY  STATE               #MEMBERS
> 172.3.xx.yy:9092 (1006)                        PreparingRebalance  1
>
> And it gets stuck there.
>
> In the server logs I see the same entries repeating continuously:
>
> [2020-02-20 09:49:32,395] INFO [GroupCoordinator 1006]: Stabilized group self-service generation 192346 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:32,396] INFO [GroupCoordinator 1006]: Assignment received from leader for group self-service for generation 192346 (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:32,406] INFO [GroupCoordinator 1006]: Preparing to rebalance group self-service with old generation 192346 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:32,406] INFO [GroupCoordinator 1006]: Group self-service with generation 192347 is now empty (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:33,722] INFO [GroupCoordinator 1006]: Preparing to rebalance group self-service with old generation 192347 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:36,723] INFO [GroupCoordinator 1006]: Stabilized group self-service generation 192348 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:36,724] INFO [GroupCoordinator 1006]: Assignment received from leader for group self-service for generation 192348 (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:36,734] INFO [GroupCoordinator 1006]: Preparing to rebalance group self-service with old generation 192348 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:36,734] INFO [GroupCoordinator 1006]: Group self-service with generation 192349 is now empty (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:37,419] INFO [GroupCoordinator 1006]: Preparing to rebalance group self-service with old generation 192349 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
> [2020-02-20 09:49:40,419] INFO [GroupCoordinator 1006]: Stabilized group self-service generation 192350 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
>
> What should I do to fix it? I tried restarting all brokers and the service
> several times, but it always ends up in this state.
> The same setup works fine in another environment.
>
> Thanks,
> Peter