>From an ops person's view who are mostly likely watching the metrics these names may not be very clear as people may not know the internals well. I'd prefer PrepareRebalance and CompleteRebalance since they may be easier to understand thought not 100 percent accurately match to internal implementation.
Guozhang On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson <ja...@confluent.io> wrote: > Hey Colin, Guozhang, > > I agree the current state names are not ideal for end users. I tend to see > the rebalance as joining the group and receiving the assignment. Maybe the > states could be named in those terms? For example: RebalanceJoin and > RebalanceAssignment. What do you think? > > -Jason > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <wangg...@gmail.com> > wrote: > > > I feel we can change `AwaitSync` to `completeRebalance` while keeping the > > other as is. > > > > cc Jason? > > > > Guozhang > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <cmcc...@apache.org> > wrote: > > > >> Thanks for the explanation. I guess maybe we should just keep the group > >> names as they are, then? > >> > >> best, > >> Colin > >> > >> > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote: > >> > To me `PreparingRebalance` sounds better than `StartingRebalance` > since > >> > only by the end of that stage we have formed a new group. More > >> > specifically, this this the workflow from the coordinator's point of > >> > view: > >> > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase; > >> > | > >> > | send out error code for all heartbeat reponses > >> > \|/ > >> > | > >> > | waiting for join group requests from members > >> > \|/ > >> > 2. formed a new group, increment the generation number, now start > >> > rebalancing, entering AwaitSync phase: > >> > | > >> > | send out the join group responses for whoever > >> > requested join > >> > \|/ > >> > | > >> > | waiting for the sync group request from the > leader > >> > \|/ > >> > 3. received assignment from the leader; the rebalance has ended, start > >> > ticking for all members, entering Stable phase. > >> > | > >> > | for whoever else sending the sync group request, > >> > reply with the assignment > >> > \|/ > >> > > >> > So from the coordinator's point of view the rebalance starts at > >> beginning > >> > of step 2 and ends at beginning of step 3. Maybe we can rename > >> > `AwaitSync` > >> > itself to `CompletingRebalance`. > >> > > >> > Guozhang > >> > > >> > > >> > > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk> > wrote: > >> > > >> > > Hi Guozhang, > >> > > > >> > > Thanks for the clarification. The naming does seem a bit unclear. > >> Maybe > >> > > `PreparingRebalance` could be `StartingRebalance` or something that > >> makes > >> > > it clear that it is part of the rebalance instead of a step before > the > >> > > actual rebalance. `AwaitingSync` could also be > `CompletingRebalance`, > >> but > >> > > not sure if that's better. > >> > > > >> > > Ismael > >> > > > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <wangg...@gmail.com> > >> wrote: > >> > > > >> > > > Actually Rebalancing includes two steps, and we name them > >> > > PrepareRebalance > >> > > > and WaitSync (arguably they may not be the best names). But these > >> two > >> > > steps > >> > > > together should be treated as the complete rebalance cycle. > >> > > > > >> > > > > >> > > > Guozhang > >> > > > > >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe < > cmcc...@apache.org> > >> > > wrote: > >> > > > > >> > > > > Hi all, > >> > > > > > >> > > > > I think maybe it makes sense to rename the "PreparingRebalance" > >> > > consumer > >> > > > > group state to "Rebalancing." To me, "Preparing" implies that > >> there > >> > > > > will be a later "rebalance" state that follows-- but there is > not. > >> > > > > Since we're now exposing this state name publicly in these > >> metrics, > >> > > > > perhaps it makes sense to do this rename now. Thoughts? > >> > > > > > >> > > > > best, > >> > > > > Colin > >> > > > > > >> > > > > > >> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote: > >> > > > > > That's a good point. I revised the KIP to add metrics for all > >> the > >> > > > group > >> > > > > > states. > >> > > > > > > >> > > > > > best, > >> > > > > > Colin > >> > > > > > > >> > > > > > > >> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote: > >> > > > > > > Ah, that's right Jason. > >> > > > > > > > >> > > > > > > With that I can be convinced to add one metric per each > state. > >> > > > > > > > >> > > > > > > Guozhang > >> > > > > > > > >> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson < > >> > > > ja...@confluent.io> > >> > > > > > > wrote: > >> > > > > > > > >> > > > > > > > > > >> > > > > > > > > "Dead" and "Empty" states are transient: groups usually > >> only > >> > > > > leaves in > >> > > > > > > > this > >> > > > > > > > > state for a short while and then being deleted or > >> transited to > >> > > > > other > >> > > > > > > > > states. > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > This is not strictly true for the "Empty" state which we > >> also use > >> > > > to > >> > > > > > > > represent simple groups which only use the coordinator to > >> store > >> > > > > offsets. I > >> > > > > > > > think we may as well cover all the states if we're going > to > >> cover > >> > > > > any of > >> > > > > > > > them specifically. > >> > > > > > > > > >> > > > > > > > -Jason > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang < > >> > > wangg...@gmail.com > >> > > > > > >> > > > > wrote: > >> > > > > > > > > >> > > > > > > > > My two cents: > >> > > > > > > > > > >> > > > > > > > > "Dead" and "Empty" states are transient: groups usually > >> only > >> > > > > leaves in > >> > > > > > > > this > >> > > > > > > > > state for a short while and then being deleted or > >> transited to > >> > > > > other > >> > > > > > > > > states. > >> > > > > > > > > > >> > > > > > > > > Since we have the existing "*NumGroups*" metric, > >> `*NumGroups - > >> > > > > > > > > **NumGroupsRebalancing > >> > > > > > > > > - **NumGroupsAwaitingSync`* should cover the above > three, > >> where > >> > > > > "Stable" > >> > > > > > > > > should be contributing most of the counts: If we have a > >> bug > >> > > that > >> > > > > causes > >> > > > > > > > the > >> > > > > > > > > num.Dead / Empty to keep increasing, then we would > observe > >> > > > > `NumGroups` > >> > > > > > > > keep > >> > > > > > > > > increasing which should be sufficient for alerting. And > >> trouble > >> > > > > shooting > >> > > > > > > > of > >> > > > > > > > > the issue could be relying on the log4j. > >> > > > > > > > > > >> > > > > > > > > *Guozhang* > >> > > > > > > > > > >> > > > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma < > >> > > ism...@juma.me.uk> > >> > > > > wrote: > >> > > > > > > > > > >> > > > > > > > > > Thanks for the KIP, Colin. This will definitely be > >> useful. > >> > > One > >> > > > > > > > question: > >> > > > > > > > > > would it be useful to have a metric for for the number > >> of > >> > > > groups > >> > > > > in > >> > > > > > > > each > >> > > > > > > > > > possible state? The KIP suggests "PreparingRebalance" > >> and > >> > > > > > > > "AwaitingSync". > >> > > > > > > > > > That leaves "Stable", "Dead" and "Empty". Are those > not > >> > > useful? > >> > > > > > > > > > > >> > > > > > > > > > Ismael > >> > > > > > > > > > > >> > > > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe < > >> > > > > cmcc...@apache.org> > >> > > > > > > > > wrote: > >> > > > > > > > > > > >> > > > > > > > > > > Hi all, > >> > > > > > > > > > > > >> > > > > > > > > > > I posted "KIP-180: Add a broker metric specifying > the > >> > > number > >> > > > of > >> > > > > > > > > consumer > >> > > > > > > > > > > group rebalances in progress" for discussion: > >> > > > > > > > > > > > >> > > > > > > > > > > https://cwiki.apache.org/confl > >> uence/display/KAFKA/KIP- > >> > > > > > > > > > > 180%3A+Add+a+broker+metric+ > specifying+the+number+of+ > >> > > > > > > > > > > consumer+group+rebalances+in+progress > >> > > > > > > > > > > > >> > > > > > > > > > > Check it out. > >> > > > > > > > > > > > >> > > > > > > > > > > cheers, > >> > > > > > > > > > > Colin > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > -- > >> > > > > > > > > -- Guozhang > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > -- > >> > > > > > > -- Guozhang > >> > > > > > >> > > > > >> > > > > >> > > > > >> > > > -- > >> > > > -- Guozhang > >> > > > > >> > > > >> > > >> > > >> > > >> > -- > >> > -- Guozhang > >> > > > > > > > > -- > > -- Guozhang > > > -- -- Guozhang