Robert,

To check whether a rebalance happened, look in the server-side logs for
entries starting with "Preparing to restabilize group %s with old
generation..". If the rebalance was triggered by a detected consumer
failure, there will be entries like "Member XX in group YY has failed"
before the "preparing" line. Note that you need to turn on TRACE-level
logging for this kind of fine-grained debugging.
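
As a rough illustration of that check, here is a minimal Python sketch that
scans broker log lines and pairs each "Preparing to restabilize" entry with
any "has failed" entries seen for the same group before it. The two patterns
are taken from the messages quoted above; the function name find_rebalances,
the record layout, and the assumption that group and member ids contain no
whitespace are hypothetical, and the exact log wording may differ between
broker versions.

```python
import re

# Patterns based on the two messages quoted in this mail; the surrounding
# log layout (timestamps, logger names) is not assumed.
PREPARING = re.compile(r"Preparing to restabilize group (\S+) with old generation")
FAILED = re.compile(r"Member (\S+) in group (\S+) has failed")


def find_rebalances(lines):
    """Return one record per "Preparing to restabilize" line.

    Each record lists the members logged as failed for that group since
    the group's previous rebalance line (empty if none were seen).
    """
    failed_by_group = {}
    rebalances = []
    for line in lines:
        m = FAILED.search(line)
        if m:
            member, group = m.groups()
            failed_by_group.setdefault(group, []).append(member)
            continue
        m = PREPARING.search(line)
        if m:
            group = m.group(1)
            rebalances.append({
                "group": group,
                # pop() resets the list so later rebalances start clean
                "failed_members": failed_by_group.pop(group, []),
                "line": line.rstrip("\n"),
            })
    return rebalances
```

Passing an open log file to find_rebalances() works, since file objects
iterate line by line. A record with an empty "failed_members" list suggests
the rebalance was not preceded by a logged member failure, provided TRACE
logging was on at the time.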

Guozhang


On Mon, Dec 12, 2016 at 9:50 AM, Jay Kreps <j...@confluent.io> wrote:

> I think the most common cause of rebalancing is still GC pauses that exceed
> the consumer liveness timeout you've configured. Might be worth enabling GC
> logging in Java and then checking the pause times. If they exceed the
> timeout you have for liveness, the pause will be detected as a process
> failure and trigger a rebalance.
>
> -Jay
>
> On Sun, Dec 11, 2016 at 11:39 AM, Robert Conrad <rob...@crunchbase.com>
> wrote:
>
> > Hi All,
> >
> > I have a relatively complex streaming application that seems to struggle
> > terribly with rebalance issues while under load. Does anyone have any tips
> > for investigating what is triggering these frequent rebalances or
> > particular settings I could experiment with to try to eliminate them?
> >
> > Originally I thought it had to do with exceeding the heartbeat timeout with
> > heavy work threads, but the 0.10.1 release solved that by adding the
> > background heartbeat thread
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread>.
> > Now rebalance just seems to strike randomly and provide no insight into
> > what triggered it (all nodes are happy, everything seems to be running
> > smoothly).
> >
> > Any help or insight is greatly appreciated!
> >
> > Rob
> >
>



-- 
-- Guozhang
