To me `PreparingRebalance` sounds better than `StartingRebalance` since
only by the end of that stage we have formed a new group. More
specifically, this this the workflow from the coordinator's point of view:

1. decided to trigger a rebalance, enter PreparingRebalance phase;
                  |
                  |   send out error code for all heartbeat reponses
                 \|/
                  |
                  |   waiting for join group requests from members
                 \|/
2. formed a new group, increment the generation number, now start
rebalancing, entering AwaitSync phase:
                  |
                  |   send out the join group responses for whoever
requested join
                 \|/
                  |
                  |   waiting for the sync group request from the leader
                 \|/
3. received assignment from the leader; the rebalance has ended, start
ticking for all members, entering Stable phase.
                  |
                  |   for whoever else sending the sync group request,
reply with the assignment
                 \|/

So from the coordinator's point of view the rebalance starts at beginning
of step 2 and ends at beginning of step 3. Maybe we can rename `AwaitSync`
itself to `CompletingRebalance`.

Guozhang



On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Guozhang,
>
> Thanks for the clarification. The naming does seem a bit unclear. Maybe
> `PreparingRebalance` could be `StartingRebalance` or something that makes
> it clear that it is part of the rebalance instead of a step before the
> actual rebalance. `AwaitingSync` could also be `CompletingRebalance`, but
> not sure if that's better.
>
> Ismael
>
> On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Actually Rebalancing includes two steps, and we name them
> PrepareRebalance
> > and WaitSync (arguably they may not be the best names). But these two
> steps
> > together should be treated as the complete rebalance cycle.
> >
> >
> > Guozhang
> >
> > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > Hi all,
> > >
> > > I think maybe it makes sense to rename the "PreparingRebalance"
> consumer
> > > group state to "Rebalancing."  To me, "Preparing" implies that there
> > > will be a later "rebalance" state that follows-- but there is not.
> > > Since we're now exposing this state name publicly in these metrics,
> > > perhaps it makes sense to do this rename now.  Thoughts?
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > > That's a good point.  I revised the KIP to add metrics for all the
> > group
> > > > states.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > > > Ah, that's right Jason.
> > > > >
> > > > > With that I can be convinced to add one metric per each state.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> > ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > >
> > > > > > > "Dead" and "Empty" states are transient: groups usually only
> > > leaves in
> > > > > > this
> > > > > > > state for a short while and then being deleted or transited to
> > > other
> > > > > > > states.
> > > > > >
> > > > > >
> > > > > > This is not strictly true for the "Empty" state which we also use
> > to
> > > > > > represent simple groups which only use the coordinator to store
> > > offsets. I
> > > > > > think we may as well cover all the states if we're going to cover
> > > any of
> > > > > > them specifically.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
> wangg...@gmail.com
> > >
> > > wrote:
> > > > > >
> > > > > > > My two cents:
> > > > > > >
> > > > > > > "Dead" and "Empty" states are transient: groups usually only
> > > leaves in
> > > > > > this
> > > > > > > state for a short while and then being deleted or transited to
> > > other
> > > > > > > states.
> > > > > > >
> > > > > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > > > > **NumGroupsRebalancing
> > > > > > > - **NumGroupsAwaitingSync`* should cover the above three, where
> > > "Stable"
> > > > > > > should be contributing most of the counts: If we have a bug
> that
> > > causes
> > > > > > the
> > > > > > > num.Dead / Empty to keep increasing, then we would observe
> > > `NumGroups`
> > > > > > keep
> > > > > > > increasing which should be sufficient for alerting. And trouble
> > > shooting
> > > > > > of
> > > > > > > the issue could be relying on the log4j.
> > > > > > >
> > > > > > > *Guozhang*
> > > > > > >
> > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma <
> ism...@juma.me.uk>
> > > wrote:
> > > > > > >
> > > > > > > > Thanks for the KIP, Colin. This will definitely be useful.
> One
> > > > > > question:
> > > > > > > > would it be useful to have a metric for for the number of
> > groups
> > > in
> > > > > > each
> > > > > > > > possible state? The KIP suggests "PreparingRebalance" and
> > > > > > "AwaitingSync".
> > > > > > > > That leaves "Stable", "Dead" and "Empty". Are those not
> useful?
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <
> > > cmcc...@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I posted "KIP-180: Add a broker metric specifying the
> number
> > of
> > > > > > > consumer
> > > > > > > > > group rebalances in progress" for discussion:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > > > > > > consumer+group+rebalances+in+progress
> > > > > > > > >
> > > > > > > > > Check it out.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Reply via email to