Hi Boyang,

Thanks for your reply.
We looked into this, but since we didn't change max.poll.interval.ms from
its default value, we're not sure this is the cause.
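
For completeness, here's a minimal sketch of how we could pin the
timeout-related consumer settings explicitly in our Streams config instead
of relying on the defaults. The application id and broker address are
hypothetical and the values are only illustrative, not what we actually run:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class TimeoutConfigSketch {
        public static Properties streamsProps() {
            Properties props = new Properties();
            // Hypothetical application id and broker address, for illustration only.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "our-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092");
            // Plain consumer configs are forwarded by Streams to its consumers.
            // Setting them explicitly would let us rule the poll-timeout theory
            // in or out instead of depending on whatever the defaults are.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);  // 5 minutes
            props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);     // 30 seconds
            props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);   // 3 seconds
            return props;
        }
    }

If the rebalances still happen with these pinned, at least the poll-interval
angle would be ruled out.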


On Fri, 6 Dec 2019 at 17:42, Boyang Chen <reluctanthero...@gmail.com> wrote:

> Hey Avshalom,
>
> the consumer instance is initiated per stream thread. You will not be
> creating new consumers, so the root cause is definitely member timeout.
> Have you changed max.poll.interval.ms by any chance? That config controls
> how long an interval between poll() calls is tolerated, to make sure
> progress is being made. If it's very tight, the consumer could stop
> sending heartbeats once progress is slow.
>
> Best,
> Boyang
>
> On Fri, Dec 6, 2019 at 7:12 AM Avshalom Manevich <avshalom...@gmail.com>
> wrote:
>
> > We have a Kafka Streams consumer group that keeps moving to the
> > PreparingRebalance state and stops consuming. The pattern is as follows:
> >
> >    1. Consumer group is running and stable for around 20 minutes
> >    2. New consumers (members) start to appear in the group state without
> >    any clear reason. These new members only originate from a small number
> >    of VMs (not the same VMs each time), and they keep joining
> >    3. Group state changes to PreparingRebalance
> >    4. All consumers stop consuming, showing these logs: "Group coordinator
> >    ... is unavailable or invalid, will attempt rediscovery"
> >    5. The consumers on the VMs that generated the extra members show these
> >    logs:
> >
> > Offset commit failed on partition X at offset Y: The coordinator is not
> > aware of this member.
> >
> > Failed to commit stream task X since it got migrated to another thread
> > already. Closing it as zombie before triggering a new rebalance.
> >
> > Detected task Z that got migrated to another thread. This implies that
> > this thread missed a rebalance and dropped out of the consumer group.
> > Will try to rejoin the consumer group.
> >
> >
> >    6. We kill all consumer processes on all VMs, the group moves to Empty
> >    with 0 members, we start the processes and we're back to step 1
> >
> > Kafka version is 1.1.0, Streams version is 2.0.0
> >
> > We took thread dumps from the misbehaving consumers, and didn't see more
> > consumer threads than configured.
> >
> > We tried restarting the Kafka brokers and cleaning the ZooKeeper cache.
> >
> > We suspect that the issue has to do with missing heartbeats, but the
> > default heartbeat interval is 3 seconds and message handling times are
> > nowhere near that.
> >
> > Anyone encountered a similar behaviour?
> >
>


-- 
*Avshalom Manevich*
