I feel we can change `AwaitSync` to `completeRebalance` while keeping the other as is.
cc Jason? Guozhang On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <cmcc...@apache.org> wrote: > Thanks for the explanation. I guess maybe we should just keep the group > names as they are, then? > > best, > Colin > > > On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote: > > To me `PreparingRebalance` sounds better than `StartingRebalance` since > > only by the end of that stage we have formed a new group. More > > specifically, this this the workflow from the coordinator's point of > > view: > > > > 1. decided to trigger a rebalance, enter PreparingRebalance phase; > > | > > | send out error code for all heartbeat reponses > > \|/ > > | > > | waiting for join group requests from members > > \|/ > > 2. formed a new group, increment the generation number, now start > > rebalancing, entering AwaitSync phase: > > | > > | send out the join group responses for whoever > > requested join > > \|/ > > | > > | waiting for the sync group request from the leader > > \|/ > > 3. received assignment from the leader; the rebalance has ended, start > > ticking for all members, entering Stable phase. > > | > > | for whoever else sending the sync group request, > > reply with the assignment > > \|/ > > > > So from the coordinator's point of view the rebalance starts at beginning > > of step 2 and ends at beginning of step 3. Maybe we can rename > > `AwaitSync` > > itself to `CompletingRebalance`. > > > > Guozhang > > > > > > > > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk> wrote: > > > > > Hi Guozhang, > > > > > > Thanks for the clarification. The naming does seem a bit unclear. Maybe > > > `PreparingRebalance` could be `StartingRebalance` or something that > makes > > > it clear that it is part of the rebalance instead of a step before the > > > actual rebalance. `AwaitingSync` could also be `CompletingRebalance`, > but > > > not sure if that's better. > > > > > > Ismael > > > > > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <wangg...@gmail.com> > wrote: > > > > > > > Actually Rebalancing includes two steps, and we name them > > > PrepareRebalance > > > > and WaitSync (arguably they may not be the best names). But these two > > > steps > > > > together should be treated as the complete rebalance cycle. > > > > > > > > > > > > Guozhang > > > > > > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <cmcc...@apache.org> > > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > I think maybe it makes sense to rename the "PreparingRebalance" > > > consumer > > > > > group state to "Rebalancing." To me, "Preparing" implies that > there > > > > > will be a later "rebalance" state that follows-- but there is not. > > > > > Since we're now exposing this state name publicly in these metrics, > > > > > perhaps it makes sense to do this rename now. Thoughts? > > > > > > > > > > best, > > > > > Colin > > > > > > > > > > > > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote: > > > > > > That's a good point. I revised the KIP to add metrics for all > the > > > > group > > > > > > states. > > > > > > > > > > > > best, > > > > > > Colin > > > > > > > > > > > > > > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote: > > > > > > > Ah, that's right Jason. > > > > > > > > > > > > > > With that I can be convinced to add one metric per each state. > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson < > > > > ja...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > "Dead" and "Empty" states are transient: groups usually > only > > > > > leaves in > > > > > > > > this > > > > > > > > > state for a short while and then being deleted or > transited to > > > > > other > > > > > > > > > states. > > > > > > > > > > > > > > > > > > > > > > > > This is not strictly true for the "Empty" state which we > also use > > > > to > > > > > > > > represent simple groups which only use the coordinator to > store > > > > > offsets. I > > > > > > > > think we may as well cover all the states if we're going to > cover > > > > > any of > > > > > > > > them specifically. > > > > > > > > > > > > > > > > -Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang < > > > wangg...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > My two cents: > > > > > > > > > > > > > > > > > > "Dead" and "Empty" states are transient: groups usually > only > > > > > leaves in > > > > > > > > this > > > > > > > > > state for a short while and then being deleted or > transited to > > > > > other > > > > > > > > > states. > > > > > > > > > > > > > > > > > > Since we have the existing "*NumGroups*" metric, > `*NumGroups - > > > > > > > > > **NumGroupsRebalancing > > > > > > > > > - **NumGroupsAwaitingSync`* should cover the above three, > where > > > > > "Stable" > > > > > > > > > should be contributing most of the counts: If we have a bug > > > that > > > > > causes > > > > > > > > the > > > > > > > > > num.Dead / Empty to keep increasing, then we would observe > > > > > `NumGroups` > > > > > > > > keep > > > > > > > > > increasing which should be sufficient for alerting. And > trouble > > > > > shooting > > > > > > > > of > > > > > > > > > the issue could be relying on the log4j. > > > > > > > > > > > > > > > > > > *Guozhang* > > > > > > > > > > > > > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma < > > > ism...@juma.me.uk> > > > > > wrote: > > > > > > > > > > > > > > > > > > > Thanks for the KIP, Colin. This will definitely be > useful. > > > One > > > > > > > > question: > > > > > > > > > > would it be useful to have a metric for for the number of > > > > groups > > > > > in > > > > > > > > each > > > > > > > > > > possible state? The KIP suggests "PreparingRebalance" and > > > > > > > > "AwaitingSync". > > > > > > > > > > That leaves "Stable", "Dead" and "Empty". Are those not > > > useful? > > > > > > > > > > > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe < > > > > > cmcc...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > I posted "KIP-180: Add a broker metric specifying the > > > number > > > > of > > > > > > > > > consumer > > > > > > > > > > > group rebalances in progress" for discussion: > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+ > > > > > > > > > > > consumer+group+rebalances+in+progress > > > > > > > > > > > > > > > > > > > > > > Check it out. > > > > > > > > > > > > > > > > > > > > > > cheers, > > > > > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > > -- > > > > -- Guozhang > > > > > > > > > > > > > > > -- > > -- Guozhang > -- -- Guozhang