Hi Marcos,

I think what you need is static membership, which reduces the number of rebalances required. There is active discussion and ongoing work on this KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances
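
To make the idea concrete, here is a rough sketch of what the consumer side might look like. It assumes the `group.instance.id` setting described in the KIP, whose name and semantics could still change while the KIP is under discussion. The helper `static_member_config`, the group name, and the timeout value are hypothetical and only for illustration; it just builds a plain settings dict and does not call any Kafka client.

```python
import os


def static_member_config(group_id: str, pod_name: str,
                         session_timeout_ms: int = 120_000) -> dict:
    """Build consumer settings that give each pod a stable member identity.

    Hypothetical helper: it only assembles a dict of settings.
    """
    if not pod_name:
        raise ValueError("pod_name must be non-empty")
    return {
        "group.id": group_id,
        # Assumption: setting name as proposed in KIP-345. A stable id lets
        # the coordinator recognise a restarted consumer as the same member.
        "group.instance.id": pod_name,
        # Illustrative value: should be long enough to cover a pod restart,
        # and is bounded by the broker's allowed session timeout range.
        "session.timeout.ms": session_timeout_ms,
    }


# Kubernetes sets HOSTNAME to the pod name by default; "pod-0" is a fallback.
config = static_member_config("my-service", os.environ.get("HOSTNAME", "pod-0"))
```

This would mainly help when a pod comes back under the same name before the session timeout expires (for example, pods managed by a StatefulSet during a rolling deploy). Pods that get a new name on every restart would still look like new members.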
-Harsha

On Jan 24, 2019, 9:51 AM -0800, Marcos Juarez <mjua...@gmail.com>, wrote:
> One of our internal customers is working on a service that spans around 120
> kubernetes pods. Due to design constraints, every one of these pods has a
> single kafka consumer, and they're all using the same consumer group id.
> Since it's kubernetes, and the service is sized according to volume
> throughout the day, pods are added/removed constantly, at least a few times
> per hour.
>
> What we are seeing with initial testing is that, whenever a single pod
> joins or leaves the consumer group, it triggers a rebalance that sometimes
> takes up to 60+ seconds to resolve. Consumption resumes after the
> rebalance event, but of course now there's 60+ second lag in consumption
> for that topic. Whenever there's a code deploy to these pods, and we need
> to re-create all 120 pods, the problem seems to be exacerbated, and we run
> into rebalances taking 200+ seconds. This particular service is somewhat
> sensitive to lag, so we'd like to keep the rebalance time to a minimum.
>
> With that context, what kafka configs should we focus on on the consumer
> side (and maybe the broker side?) that would enable us to minimize the time
> spent on the rebalance?
>
> Thanks,
>
> Marcos Juarez