Hey Colin, Guozhang,

I agree the current state names are not ideal for end users. I tend to see
the rebalance as joining the group and receiving the assignment. Maybe the
states could be named in those terms? For example: RebalanceJoin and
RebalanceAssignment. What do you think?

-Jason

On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> I feel we can change `AwaitSync` to `completeRebalance` while keeping the
> other as is.
>
> cc Jason?
>
> Guozhang
>
> On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <cmcc...@apache.org> wrote:
>
>> Thanks for the explanation.  I guess maybe we should just keep the group
>> names as they are, then?
>>
>> best,
>> Colin
>>
>>
>> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
>> > To me `PreparingRebalance` sounds better than `StartingRebalance` since
>> > only by the end of that stage we have formed a new group. More
>> > specifically, this this the workflow from the coordinator's point of
>> > view:
>> >
>> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
>> >                   |
>> >                   |   send out error code for all heartbeat reponses
>> >                  \|/
>> >                   |
>> >                   |   waiting for join group requests from members
>> >                  \|/
>> > 2. formed a new group, increment the generation number, now start
>> > rebalancing, entering AwaitSync phase:
>> >                   |
>> >                   |   send out the join group responses for whoever
>> > requested join
>> >                  \|/
>> >                   |
>> >                   |   waiting for the sync group request from the leader
>> >                  \|/
>> > 3. received assignment from the leader; the rebalance has ended, start
>> > ticking for all members, entering Stable phase.
>> >                   |
>> >                   |   for whoever else sending the sync group request,
>> > reply with the assignment
>> >                  \|/
>> >
>> > So from the coordinator's point of view the rebalance starts at
>> beginning
>> > of step 2 and ends at beginning of step 3. Maybe we can rename
>> > `AwaitSync`
>> > itself to `CompletingRebalance`.
>> >
>> > Guozhang
>> >
>> >
>> >
>> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>> >
>> > > Hi Guozhang,
>> > >
>> > > Thanks for the clarification. The naming does seem a bit unclear.
>> Maybe
>> > > `PreparingRebalance` could be `StartingRebalance` or something that
>> makes
>> > > it clear that it is part of the rebalance instead of a step before the
>> > > actual rebalance. `AwaitingSync` could also be `CompletingRebalance`,
>> but
>> > > not sure if that's better.
>> > >
>> > > Ismael
>> > >
>> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <wangg...@gmail.com>
>> wrote:
>> > >
>> > > > Actually Rebalancing includes two steps, and we name them
>> > > PrepareRebalance
>> > > > and WaitSync (arguably they may not be the best names). But these
>> two
>> > > steps
>> > > > together should be treated as the complete rebalance cycle.
>> > > >
>> > > >
>> > > > Guozhang
>> > > >
>> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <cmcc...@apache.org>
>> > > wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > I think maybe it makes sense to rename the "PreparingRebalance"
>> > > consumer
>> > > > > group state to "Rebalancing."  To me, "Preparing" implies that
>> there
>> > > > > will be a later "rebalance" state that follows-- but there is not.
>> > > > > Since we're now exposing this state name publicly in these
>> metrics,
>> > > > > perhaps it makes sense to do this rename now.  Thoughts?
>> > > > >
>> > > > > best,
>> > > > > Colin
>> > > > >
>> > > > >
>> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
>> > > > > > That's a good point.  I revised the KIP to add metrics for all
>> the
>> > > > group
>> > > > > > states.
>> > > > > >
>> > > > > > best,
>> > > > > > Colin
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
>> > > > > > > Ah, that's right Jason.
>> > > > > > >
>> > > > > > > With that I can be convinced to add one metric per each state.
>> > > > > > >
>> > > > > > > Guozhang
>> > > > > > >
>> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
>> > > > ja...@confluent.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > >
>> > > > > > > > > "Dead" and "Empty" states are transient: groups usually
>> only
>> > > > > leaves in
>> > > > > > > > this
>> > > > > > > > > state for a short while and then being deleted or
>> transited to
>> > > > > other
>> > > > > > > > > states.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > This is not strictly true for the "Empty" state which we
>> also use
>> > > > to
>> > > > > > > > represent simple groups which only use the coordinator to
>> store
>> > > > > offsets. I
>> > > > > > > > think we may as well cover all the states if we're going to
>> cover
>> > > > > any of
>> > > > > > > > them specifically.
>> > > > > > > >
>> > > > > > > > -Jason
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
>> > > wangg...@gmail.com
>> > > > >
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > > > My two cents:
>> > > > > > > > >
>> > > > > > > > > "Dead" and "Empty" states are transient: groups usually
>> only
>> > > > > leaves in
>> > > > > > > > this
>> > > > > > > > > state for a short while and then being deleted or
>> transited to
>> > > > > other
>> > > > > > > > > states.
>> > > > > > > > >
>> > > > > > > > > Since we have the existing "*NumGroups*" metric,
>> `*NumGroups -
>> > > > > > > > > **NumGroupsRebalancing
>> > > > > > > > > - **NumGroupsAwaitingSync`* should cover the above three,
>> where
>> > > > > "Stable"
>> > > > > > > > > should be contributing most of the counts: If we have a
>> bug
>> > > that
>> > > > > causes
>> > > > > > > > the
>> > > > > > > > > num.Dead / Empty to keep increasing, then we would observe
>> > > > > `NumGroups`
>> > > > > > > > keep
>> > > > > > > > > increasing which should be sufficient for alerting. And
>> trouble
>> > > > > shooting
>> > > > > > > > of
>> > > > > > > > > the issue could be relying on the log4j.
>> > > > > > > > >
>> > > > > > > > > *Guozhang*
>> > > > > > > > >
>> > > > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma <
>> > > ism...@juma.me.uk>
>> > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks for the KIP, Colin. This will definitely be
>> useful.
>> > > One
>> > > > > > > > question:
>> > > > > > > > > > would it be useful to have a metric for for the number
>> of
>> > > > groups
>> > > > > in
>> > > > > > > > each
>> > > > > > > > > > possible state? The KIP suggests "PreparingRebalance"
>> and
>> > > > > > > > "AwaitingSync".
>> > > > > > > > > > That leaves "Stable", "Dead" and "Empty". Are those not
>> > > useful?
>> > > > > > > > > >
>> > > > > > > > > > Ismael
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <
>> > > > > cmcc...@apache.org>
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi all,
>> > > > > > > > > > >
>> > > > > > > > > > > I posted "KIP-180: Add a broker metric specifying the
>> > > number
>> > > > of
>> > > > > > > > > consumer
>> > > > > > > > > > > group rebalances in progress" for discussion:
>> > > > > > > > > > >
>> > > > > > > > > > > https://cwiki.apache.org/confl
>> uence/display/KAFKA/KIP-
>> > > > > > > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
>> > > > > > > > > > > consumer+group+rebalances+in+progress
>> > > > > > > > > > >
>> > > > > > > > > > > Check it out.
>> > > > > > > > > > >
>> > > > > > > > > > > cheers,
>> > > > > > > > > > > Colin
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > -- Guozhang
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > -- Guozhang
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -- Guozhang
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > -- Guozhang
>>
>
>
>
> --
> -- Guozhang
>

Reply via email to