Hi Apurva,

Thanks for taking another look.  Responses below:

On Mon, Aug 7, 2017, at 17:47, Apurva Mehta wrote:
> The KIP looks good to me. In your latest proposal, the change of state
> would be captured as followed in the metrics for groups using Kafka for
> membership management:
> 
> PreparingRebalance -> CompletingRebalance -> Stable -> Dead?

Right.  As described in
core/src/main/scala/kafka/coordinator/group/GroupMetadata.scala

> 
> If a group is just being used to store offsets, then it is always Empty?

I believe so.  Groups can also be Empty if they have no more members,
but the offsets have not yet expired.

Of course, this KIP is just exposing the metrics, not changing how
groups work in any way.

Maybe we should start a vote, if this looks good to everyone?

best,
Colin


> 
> If so, this makes sense to me.
> 
> Thanks,
> Apurva
> 
> On Mon, Aug 7, 2017 at 5:09 PM, Colin McCabe <co...@cmccabe.xyz> wrote:
> 
> > How about PreparingRebalance / CompletingRebalance?
> >
> > cheers,
> > Colin
> >
> >
> > On Fri, Aug 4, 2017, at 09:03, Ismael Juma wrote:
> > > I agree that we should make them consistent. I think RebalanceJoin and
> > > RebalanceAssignment are reasonable names. I think they are a bit more
> > > descriptive than `PreparingRebalance` and `CompletingRebalance`. If we
> > > need
> > > to add more states, it seems a little easier to do if the states are a
> > > bit
> > > more descriptive. I am OK with either of the 2 options as I think they
> > > are
> > > both better than the status quo.
> > >
> > > Ismael
> > >
> > > On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Guozhang,
> > > >
> > > > Usually I think such naming inconsistencies are best avoided. It adds
> > > > another level of confusion for people who have to dip into the code,
> > figure
> > > > out a problem, and ultimately explain it. Since we already have the
> > > > PreparingRebalance state, maybe we could just rename the AwaitingSync
> > state
> > > > to CompletingRebalance?
> > > >
> > > > -Jason
> > > >
> > > > On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > > >
> > > > > From an ops person's view who are mostly likely watching the metrics
> > > > these
> > > > > names may not be very clear as people may not know the internals
> > well.
> > > > I'd
> > > > > prefer PrepareRebalance and CompleteRebalance since they may be
> > easier to
> > > > > understand thought not 100 percent accurately match to internal
> > > > > implementation.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson <ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Colin, Guozhang,
> > > > > >
> > > > > > I agree the current state names are not ideal for end users. I
> > tend to
> > > > > see
> > > > > > the rebalance as joining the group and receiving the assignment.
> > Maybe
> > > > > the
> > > > > > states could be named in those terms? For example: RebalanceJoin
> > and
> > > > > > RebalanceAssignment. What do you think?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <
> > wangg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I feel we can change `AwaitSync` to `completeRebalance` while
> > keeping
> > > > > the
> > > > > > > other as is.
> > > > > > >
> > > > > > > cc Jason?
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <
> > cmcc...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks for the explanation.  I guess maybe we should just keep
> > the
> > > > > group
> > > > > > >> names as they are, then?
> > > > > > >>
> > > > > > >> best,
> > > > > > >> Colin
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > > > > >> > To me `PreparingRebalance` sounds better than
> > `StartingRebalance`
> > > > > > since
> > > > > > >> > only by the end of that stage we have formed a new group. More
> > > > > > >> > specifically, this this the workflow from the coordinator's
> > point
> > > > of
> > > > > > >> > view:
> > > > > > >> >
> > > > > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance
> > phase;
> > > > > > >> >                   |
> > > > > > >> >                   |   send out error code for all heartbeat
> > > > reponses
> > > > > > >> >                  \|/
> > > > > > >> >                   |
> > > > > > >> >                   |   waiting for join group requests from
> > members
> > > > > > >> >                  \|/
> > > > > > >> > 2. formed a new group, increment the generation number, now
> > start
> > > > > > >> > rebalancing, entering AwaitSync phase:
> > > > > > >> >                   |
> > > > > > >> >                   |   send out the join group responses for
> > > > whoever
> > > > > > >> > requested join
> > > > > > >> >                  \|/
> > > > > > >> >                   |
> > > > > > >> >                   |   waiting for the sync group request from
> > the
> > > > > > leader
> > > > > > >> >                  \|/
> > > > > > >> > 3. received assignment from the leader; the rebalance has
> > ended,
> > > > > start
> > > > > > >> > ticking for all members, entering Stable phase.
> > > > > > >> >                   |
> > > > > > >> >                   |   for whoever else sending the sync group
> > > > > request,
> > > > > > >> > reply with the assignment
> > > > > > >> >                  \|/
> > > > > > >> >
> > > > > > >> > So from the coordinator's point of view the rebalance starts
> > at
> > > > > > >> beginning
> > > > > > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > > > > > >> > `AwaitSync`
> > > > > > >> > itself to `CompletingRebalance`.
> > > > > > >> >
> > > > > > >> > Guozhang
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <
> > ism...@juma.me.uk>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > > Hi Guozhang,
> > > > > > >> > >
> > > > > > >> > > Thanks for the clarification. The naming does seem a bit
> > > > unclear.
> > > > > > >> Maybe
> > > > > > >> > > `PreparingRebalance` could be `StartingRebalance` or
> > something
> > > > > that
> > > > > > >> makes
> > > > > > >> > > it clear that it is part of the rebalance instead of a step
> > > > before
> > > > > > the
> > > > > > >> > > actual rebalance. `AwaitingSync` could also be
> > > > > > `CompletingRebalance`,
> > > > > > >> but
> > > > > > >> > > not sure if that's better.
> > > > > > >> > >
> > > > > > >> > > Ismael
> > > > > > >> > >
> > > > > > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <
> > > > > wangg...@gmail.com>
> > > > > > >> wrote:
> > > > > > >> > >
> > > > > > >> > > > Actually Rebalancing includes two steps, and we name them
> > > > > > >> > > PrepareRebalance
> > > > > > >> > > > and WaitSync (arguably they may not be the best names).
> > But
> > > > > these
> > > > > > >> two
> > > > > > >> > > steps
> > > > > > >> > > > together should be treated as the complete rebalance
> > cycle.
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Guozhang
> > > > > > >> > > >
> > > > > > >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <
> > > > > > cmcc...@apache.org>
> > > > > > >> > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi all,
> > > > > > >> > > > >
> > > > > > >> > > > > I think maybe it makes sense to rename the
> > > > > "PreparingRebalance"
> > > > > > >> > > consumer
> > > > > > >> > > > > group state to "Rebalancing."  To me, "Preparing"
> > implies
> > > > that
> > > > > > >> there
> > > > > > >> > > > > will be a later "rebalance" state that follows-- but
> > there
> > > > is
> > > > > > not.
> > > > > > >> > > > > Since we're now exposing this state name publicly in
> > these
> > > > > > >> metrics,
> > > > > > >> > > > > perhaps it makes sense to do this rename now.  Thoughts?
> > > > > > >> > > > >
> > > > > > >> > > > > best,
> > > > > > >> > > > > Colin
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > > > > >> > > > > > That's a good point.  I revised the KIP to add
> > metrics for
> > > > > all
> > > > > > >> the
> > > > > > >> > > > group
> > > > > > >> > > > > > states.
> > > > > > >> > > > > >
> > > > > > >> > > > > > best,
> > > > > > >> > > > > > Colin
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > > > > >> > > > > > > Ah, that's right Jason.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > With that I can be convinced to add one metric per
> > each
> > > > > > state.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Guozhang
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> > > > > > >> > > > ja...@confluent.io>
> > > > > > >> > > > > > > wrote:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > "Dead" and "Empty" states are transient: groups
> > > > > usually
> > > > > > >> only
> > > > > > >> > > > > leaves in
> > > > > > >> > > > > > > > this
> > > > > > >> > > > > > > > > state for a short while and then being deleted
> > or
> > > > > > >> transited to
> > > > > > >> > > > > other
> > > > > > >> > > > > > > > > states.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > This is not strictly true for the "Empty" state
> > which
> > > > we
> > > > > > >> also use
> > > > > > >> > > > to
> > > > > > >> > > > > > > > represent simple groups which only use the
> > coordinator
> > > > > to
> > > > > > >> store
> > > > > > >> > > > > offsets. I
> > > > > > >> > > > > > > > think we may as well cover all the states if we're
> > > > going
> > > > > > to
> > > > > > >> cover
> > > > > > >> > > > > any of
> > > > > > >> > > > > > > > them specifically.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > -Jason
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
> > > > > > >> > > wangg...@gmail.com
> > > > > > >> > > > >
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > My two cents:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > "Dead" and "Empty" states are transient: groups
> > > > > usually
> > > > > > >> only
> > > > > > >> > > > > leaves in
> > > > > > >> > > > > > > > this
> > > > > > >> > > > > > > > > state for a short while and then being deleted
> > or
> > > > > > >> transited to
> > > > > > >> > > > > other
> > > > > > >> > > > > > > > > states.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Since we have the existing "*NumGroups*" metric,
> > > > > > >> `*NumGroups -
> > > > > > >> > > > > > > > > **NumGroupsRebalancing
> > > > > > >> > > > > > > > > - **NumGroupsAwaitingSync`* should cover the
> > above
> > > > > > three,
> > > > > > >> where
> > > > > > >> > > > > "Stable"
> > > > > > >> > > > > > > > > should be contributing most of the counts: If we
> > > > have
> > > > > a
> > > > > > >> bug
> > > > > > >> > > that
> > > > > > >> > > > > causes
> > > > > > >> > > > > > > > the
> > > > > > >> > > > > > > > > num.Dead / Empty to keep increasing, then we
> > would
> > > > > > observe
> > > > > > >> > > > > `NumGroups`
> > > > > > >> > > > > > > > keep
> > > > > > >> > > > > > > > > increasing which should be sufficient for
> > alerting.
> > > > > And
> > > > > > >> trouble
> > > > > > >> > > > > shooting
> > > > > > >> > > > > > > > of
> > > > > > >> > > > > > > > > the issue could be relying on the log4j.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > *Guozhang*
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma <
> > > > > > >> > > ism...@juma.me.uk>
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > Thanks for the KIP, Colin. This will
> > definitely be
> > > > > > >> useful.
> > > > > > >> > > One
> > > > > > >> > > > > > > > question:
> > > > > > >> > > > > > > > > > would it be useful to have a metric for for
> > the
> > > > > number
> > > > > > >> of
> > > > > > >> > > > groups
> > > > > > >> > > > > in
> > > > > > >> > > > > > > > each
> > > > > > >> > > > > > > > > > possible state? The KIP suggests
> > > > > "PreparingRebalance"
> > > > > > >> and
> > > > > > >> > > > > > > > "AwaitingSync".
> > > > > > >> > > > > > > > > > That leaves "Stable", "Dead" and "Empty". Are
> > > > those
> > > > > > not
> > > > > > >> > > useful?
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Ismael
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe
> > <
> > > > > > >> > > > > cmcc...@apache.org>
> > > > > > >> > > > > > > > > wrote:
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > > Hi all,
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > I posted "KIP-180: Add a broker metric
> > > > specifying
> > > > > > the
> > > > > > >> > > number
> > > > > > >> > > > of
> > > > > > >> > > > > > > > > consumer
> > > > > > >> > > > > > > > > > > group rebalances in progress" for
> > discussion:
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > https://cwiki.apache.org/confl
> > > > > > >> uence/display/KAFKA/KIP-
> > > > > > >> > > > > > > > > > > 180%3A+Add+a+broker+metric+
> > > > > > specifying+the+number+of+
> > > > > > >> > > > > > > > > > > consumer+group+rebalances+in+progress
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > Check it out.
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > cheers,
> > > > > > >> > > > > > > > > > > Colin
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > --
> > > > > > >> > > > > > > > > -- Guozhang
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > --
> > > > > > >> > > > > > > -- Guozhang
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > --
> > > > > > >> > > > -- Guozhang
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > --
> > > > > > >> > -- Guozhang
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> >

Reply via email to