I think the most common cause of rebalancing is still a GC pause that exceeds the consumer liveness timeout you've configured. It might be worth enabling GC logging in Java and then checking the pause times. If a pause exceeds your liveness timeout, the consumer will be treated as a failed process and the group will rebalance.
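
As a rough sketch of that check, the snippet below scans a GC log for stop-the-world pauses longer than a given timeout. It assumes a Java 8 JVM started with -XX:+PrintGCApplicationStoppedTime and -Xloggc:<file>, which writes "Total time for which application threads were stopped" lines; newer JVMs use a different log format. The class name GcPauseCheck and its methods are hypothetical, and taking session.timeout.ms as the liveness timeout to compare against is an assumption, not something stated above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags GC/safepoint pauses that exceed a liveness timeout.
public class GcPauseCheck {
    // Line format written by -XX:+PrintGCApplicationStoppedTime on Java 8 (assumed).
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9]+\\.?[0-9]*) seconds");

    // Returns the pause in milliseconds, or -1 if the line is not a stopped-time line.
    public static long pauseMillis(String logLine) {
        Matcher m = STOPPED.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        return Math.round(Double.parseDouble(m.group(1)) * 1000.0);
    }

    // Collects every pause (in ms) that is strictly longer than timeoutMs.
    public static List<Long> pausesOver(List<String> logLines, long timeoutMs) {
        List<Long> result = new ArrayList<>();
        for (String line : logLines) {
            long ms = pauseMillis(line);
            if (ms > timeoutMs) {
                result.add(ms);
            }
        }
        return result;
    }

    // Usage: java GcPauseCheck <gc-log-file> <timeout-ms>
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        long timeoutMs = Long.parseLong(args[1]);
        for (long ms : pausesOver(lines, timeoutMs)) {
            System.out.println("Pause of " + ms + " ms exceeds timeout of " + timeoutMs + " ms");
        }
    }
}
```

Run it with the timeout value taken from your own consumer configuration. If it prints nothing, GC pauses are probably not what is triggering the rebalances.
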

-Jay

On Sun, Dec 11, 2016 at 11:39 AM, Robert Conrad <rob...@crunchbase.com> wrote:

> Hi All,
>
> I have a relatively complex streaming application that seems to struggle
> terribly with rebalance issues while under load. Does anyone have any tips
> for investigating what is triggering these frequent rebalances or
> particular settings I could experiment with to try to eliminate them?
>
> Originally I thought it had to do with exceeding the heartbeat timeout with
> heavy work threads, but the 0.10.1 release solved that by adding the
> background heartbeat thread
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread>.
> Now rebalance just seems to strike randomly and provide no insight into
> what triggered it (all nodes are happy, everything seems to be running
> smoothly).
>
> Any help or insight is greatly appreciated!
>
> Rob