Hi Onur,

Thanks for the update. I misunderstood what you said before. I believe what you are suggesting sounds OK, though I don't think it addresses the point Becket made earlier in the discussion thread. See below.

Thanks,
Damian

============================================================

1. Better rebalance timing. We will try to rebalance only when all the consumers in a group have joined. The challenge is that someone has to define what ALL consumers means; it could be either a time or a number of consumers, etc.

2. Avoid frequent rebalance. For example, if there are 100 consumers in a group, today, in the worst case, we may end up with 100 rebalances even if all the consumers joined the group in a reasonably small amount of time. Frequent rebalance is also a bad thing for brokers.

Having a client side configuration may solve problem 1 better because each consumer group can potentially configure their own timing. However, it does not really prevent frequent rebalance in general because some of the consumers can be misconfigured. (This may have something to do with KIP-124 as well. But if quota is applied on the JoinGroup/SyncGroup request it may cause some unwanted cascading effects.)

Having a broker side configuration may result in less flexibility for each consumer group, but it can prevent frequent rebalance better. I think with some reasonable design, the rebalance timing issue can be resolved on the broker side as well. Matthias had a good point on extending the delay when a new consumer joins a group (we actually did something similar to batch ISR change propagation). For example, let's say on the broker side, we will always delay 2 seconds each time we see a new consumer joining a consumer group. This would probably work for most of the consumer groups and will also limit the rebalance frequency to protect the brokers.

I am not sure about the streams use case here, but if something like 2 seconds of delay is acceptable for streams, I would prefer adding the configuration to the broker so that we can address both problems.

On Mon, 3 Apr 2017 at 21:41 Onur Karaman <onurkaraman.apa...@gmail.com> wrote:

Delaying the SyncGroupRequest is not what I had in mind.

What I was thinking was essentially a client-side stabilization window where the client does nothing other than participate in the group membership protocol and wait a bit for the group to stabilize. During this window, several rounds of rebalance can take place, and clients would participate in these rebalances (they'd get notified of the rebalance from the heartbeats they've been sending during this stabilization window), but they would not run any ConsumerRebalanceListener.onPartitionsAssigned or process messages until the window has closed or, if the window ends during a rebalance, until that rebalance finishes.

So something like:

T0: client A is processing messages.
T1: new client B joins.
T2: client A gets notified and rejoins the group.
T3: rebalance finishes with the group consisting of A and B. This is where the stabilization window begins for both A and B. Stabilization window duration is W.
T4: new client C joins.
T5: clients A and B get notified and they rejoin the group.
T6: rebalance finishes with the group consisting of A, B, and C.
T3+W: clients A, B, and C finally run their ConsumerRebalanceListener.onPartitionsAssigned and begin processing messages.

If T3+W falls in the middle of a rebalance, then we wait until that rebalance round finishes. Otherwise, we just run the ConsumerRebalanceListener.onPartitionsAssigned and begin processing messages.

On Mon, Apr 3, 2017 at 11:40 AM, Becket Qin <becket....@gmail.com> wrote:

> Hey Onur,
>
> Are you suggesting letting the consumers hold back on sending
> SyncGroupRequest on the first rebalance? I am not sure how exactly that
> works. But it looks like having the group coordinator control the
> rebalance progress would be clearer and probably safer than letting the
> group members guess the state of a group. Can you elaborate a little bit
> on your idea?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Apr 3, 2017 at 8:16 AM, Onur Karaman <onurkaraman.apa...@gmail.com>
> wrote:
>
> > Hi Damian.
> >
> > After reading the discussion thread again, it still doesn't seem like the
> > thread discussed the option I mentioned earlier.
> >
> > From what I had understood from the broker-side vs. client-side config
> > debate was that the client-side config from the discussion would cause a
> > wire format change, while the client-side config change that I had
> > suggested would not.
> >
> > I just want to make sure we don't accidentally skip past it due to a
> > potential misunderstanding.
> >
> > On Mon, Apr 3, 2017 at 8:10 AM, Bill Bejeck <bbej...@gmail.com> wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Mon, Apr 3, 2017 at 9:53 AM, Mathieu Fenniak <
> > > mathieu.fenn...@replicon.com> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > This will be very helpful for me, looking forward to it! :-)
> > > >
> > > > On Thu, Mar 30, 2017 at 4:46 AM, Damian Guy <damian....@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to start the voting thread on KIP-134:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > >
> > > > > Thanks,
> > > > > Damian
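
For readers comparing the two timing ideas in this thread, here is a minimal sketch of each as a pure function. It is not Kafka code: the function names, parameters, and the use of plain numbers for timestamps are all assumptions made for illustration. The first function models the broker-side idea of pushing the rebalance back by a fixed delay each time a new consumer joins; the second models the client-side stabilization window of duration W that starts when the first rebalance finishes.

```python
def broker_rebalance_time(join_times, delay):
    """Broker-side idea (hypothetical model): each new join that arrives
    before the pending deadline pushes the rebalance back by `delay`.
    Returns (time the rebalance fires, members included in it)."""
    joins = sorted(join_times)
    deadline = joins[0] + delay
    members = 1
    for t in joins[1:]:
        if t >= deadline:
            break  # this consumer arrived after the rebalance already fired
        deadline = t + delay
        members += 1
    return deadline, members


def client_processing_start(window_start, window, rebalances):
    """Client-side idea (hypothetical model): processing starts when the
    stabilization window closes, unless a rebalance is in progress at that
    moment, in which case it starts when that rebalance finishes.
    `rebalances` is a list of (start, end) intervals."""
    window_end = window_start + window
    for start, end in rebalances:
        if start <= window_end < end:
            return end  # window closed mid-rebalance: wait for it to finish
    return window_end


# Three consumers join within the delay; a fourth arrives much later.
print(broker_rebalance_time([0, 0.5, 1.5, 10], delay=2))

# Timeline from the message: window opens at T3=3, second rebalance runs T5..T6.
print(client_processing_start(3, 5, [(5, 6)]))
print(client_processing_start(3, 2.5, [(5, 6)]))
```

In the broker-side model a single rebalance covers every consumer that joins before the extended deadline, which is the frequency-limiting property argued for above. In the client-side model rebalances still happen during the window; only the assignment callback and message processing are deferred.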