Hi Boyang,

Thanks for your reply. We looked into this direction, but since we didn't change max.poll.interval.ms from its default value, we're not sure that this is the cause.
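For reference, here is a minimal sketch of how we could pin the rebalance-related timeouts explicitly in our Streams config instead of relying on defaults, so the effective values are obvious. The application id and bootstrap servers below are placeholders, not our real values, and the numbers are just the plain consumer defaults as we understand them:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class TimeoutConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders, not our real application id / brokers.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

        // Heartbeat cadence and session timeout used by the group coordinator;
        // 3000 ms / 10000 ms are the stock consumer defaults.
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10000);

        // Upper bound on the gap between poll() calls before the member is
        // considered failed; 300000 ms (5 min) is the plain consumer default,
        // though Streams may override it, which is part of why setting it
        // explicitly helps.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
    }
}

The effective values also show up in the "ConsumerConfig values:" block that the client logs at startup, so that is one place to verify what is actually in force.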
On Fri, 6 Dec 2019 at 17:42, Boyang Chen <reluctanthero...@gmail.com> wrote:

> Hey Avshalom,
>
> The consumer instance is initiated per stream thread, and you will not be
> creating new consumers, so the root cause is definitely a member timeout.
> Have you changed max.poll.interval by any chance? That config controls how
> long an interval between poll calls you tolerate, to make sure progress is
> being made. If it's very tight, the consumer could stop sending heartbeats
> once progress is slow.
>
> Best,
> Boyang
>
> On Fri, Dec 6, 2019 at 7:12 AM Avshalom Manevich <avshalom...@gmail.com>
> wrote:
>
> > We have a Kafka Streams consumer group that keeps moving to the
> > PreparingRebalance state and stops consuming. The pattern is as follows:
> >
> > 1. The consumer group runs and is stable for around 20 minutes.
> > 2. New consumers (members) start to appear in the group state without
> >    any clear reason. These new members only originate from a small
> >    number of VMs (not the same VMs each time), and they keep joining.
> > 3. The group state changes to PreparingRebalance.
> > 4. All consumers stop consuming, showing these logs: "Group coordinator
> >    ... is unavailable or invalid, will attempt rediscovery"
> > 5. The consumers on the VMs that generated extra members show these logs:
> >
> >    Offset commit failed on partition X at offset Y: The coordinator is
> >    not aware of this member.
> >
> >    Failed to commit stream task X since it got migrated to another thread
> >    already. Closing it as zombie before triggering a new rebalance.
> >
> >    Detected task Z that got migrated to another thread. This implies that
> >    this thread missed a rebalance and dropped out of the consumer group.
> >    Will try to rejoin the consumer group.
> >
> > 6. We kill all consumer processes on all VMs, the group moves to Empty
> >    with 0 members, we start the processes, and we're back to step 1.
> >
> > Kafka version is 1.1.0, Streams version is 2.0.0.
> >
> > We took thread dumps from the misbehaving consumers and didn't see more
> > consumer threads than configured.
> >
> > We tried restarting the Kafka brokers and cleaning the ZooKeeper cache.
> >
> > We suspect that the issue has to do with missing heartbeats, but the
> > default heartbeat is 3 seconds and message handling times are nowhere
> > near that.
> >
> > Has anyone encountered similar behaviour?
> >
> -- *Avshalom Manevich*