Sounds good to me.

On Wed, Aug 9, 2017 at 11:32 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Hi Apurva,
>
> Thanks for taking another look.  Responses below:
>
> On Mon, Aug 7, 2017, at 17:47, Apurva Mehta wrote:
> > The KIP looks good to me. In your latest proposal, the change of state
> > would be captured as followed in the metrics for groups using Kafka for
> > membership management:
> >
> > PreparingRebalance -> CompletingRebalance -> Stable -> Dead?
>
> Right.  As described in
> core/src/main/scala/kafka/coordinator/group/GroupMetadata.scala
>
> >
> > If a group is just being used to store offsets, then it is always Empty?
>
> I believe so.  Groups can also be Empty if they have no more members,
> but the offsets have not yet expired.
>
> Of course, this KIP is just exposing the metrics, not changing how
> groups work in any way.
>
> Maybe we should start a vote, if this looks good to everyone?
>
> best,
> Colin
>
>
> >
> > If so, this makes sense to me.
> >
> > Thanks,
> > Apurva
> >
> > On Mon, Aug 7, 2017 at 5:09 PM, Colin McCabe <co...@cmccabe.xyz> wrote:
> >
> > > How about PreparingRebalance / CompletingRebalance?
> > >
> > > cheers,
> > > Colin
> > >
> > >
> > > On Fri, Aug 4, 2017, at 09:03, Ismael Juma wrote:
> > > > I agree that we should make them consistent. I think RebalanceJoin
> and
> > > > RebalanceAssignment are reasonable names. I think they are a bit more
> > > > descriptive than `PreparingRebalance` and `CompletingRebalance`. If
> we
> > > > need
> > > > to add more states, it seems a little easier to do if the states are
> a
> > > > bit
> > > > more descriptive. I am OK with either of the 2 options as I think
> they
> > > > are
> > > > both better than the status quo.
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Guozhang,
> > > > >
> > > > > Usually I think such naming inconsistencies are best avoided. It
> adds
> > > > > another level of confusion for people who have to dip into the
> code,
> > > figure
> > > > > out a problem, and ultimately explain it. Since we already have the
> > > > > PreparingRebalance state, maybe we could just rename the
> AwaitingSync
> > > state
> > > > > to CompletingRebalance?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang <wangg...@gmail.com>
> > > wrote:
> > > > >
> > > > > > From an ops person's view who are mostly likely watching the
> metrics
> > > > > these
> > > > > > names may not be very clear as people may not know the internals
> > > well.
> > > > > I'd
> > > > > > prefer PrepareRebalance and CompleteRebalance since they may be
> > > easier to
> > > > > > understand thought not 100 percent accurately match to internal
> > > > > > implementation.
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Colin, Guozhang,
> > > > > > >
> > > > > > > I agree the current state names are not ideal for end users. I
> > > tend to
> > > > > > see
> > > > > > > the rebalance as joining the group and receiving the
> assignment.
> > > Maybe
> > > > > > the
> > > > > > > states could be named in those terms? For example:
> RebalanceJoin
> > > and
> > > > > > > RebalanceAssignment. What do you think?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <
> > > wangg...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I feel we can change `AwaitSync` to `completeRebalance` while
> > > keeping
> > > > > > the
> > > > > > > > other as is.
> > > > > > > >
> > > > > > > > cc Jason?
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <
> > > cmcc...@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thanks for the explanation.  I guess maybe we should just
> keep
> > > the
> > > > > > group
> > > > > > > >> names as they are, then?
> > > > > > > >>
> > > > > > > >> best,
> > > > > > > >> Colin
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > > > > > >> > To me `PreparingRebalance` sounds better than
> > > `StartingRebalance`
> > > > > > > since
> > > > > > > >> > only by the end of that stage we have formed a new group.
> More
> > > > > > > >> > specifically, this this the workflow from the
> coordinator's
> > > point
> > > > > of
> > > > > > > >> > view:
> > > > > > > >> >
> > > > > > > >> > 1. decided to trigger a rebalance, enter
> PreparingRebalance
> > > phase;
> > > > > > > >> >                   |
> > > > > > > >> >                   |   send out error code for all
> heartbeat
> > > > > reponses
> > > > > > > >> >                  \|/
> > > > > > > >> >                   |
> > > > > > > >> >                   |   waiting for join group requests from
> > > members
> > > > > > > >> >                  \|/
> > > > > > > >> > 2. formed a new group, increment the generation number,
> now
> > > start
> > > > > > > >> > rebalancing, entering AwaitSync phase:
> > > > > > > >> >                   |
> > > > > > > >> >                   |   send out the join group responses
> for
> > > > > whoever
> > > > > > > >> > requested join
> > > > > > > >> >                  \|/
> > > > > > > >> >                   |
> > > > > > > >> >                   |   waiting for the sync group request
> from
> > > the
> > > > > > > leader
> > > > > > > >> >                  \|/
> > > > > > > >> > 3. received assignment from the leader; the rebalance has
> > > ended,
> > > > > > start
> > > > > > > >> > ticking for all members, entering Stable phase.
> > > > > > > >> >                   |
> > > > > > > >> >                   |   for whoever else sending the sync
> group
> > > > > > request,
> > > > > > > >> > reply with the assignment
> > > > > > > >> >                  \|/
> > > > > > > >> >
> > > > > > > >> > So from the coordinator's point of view the rebalance
> starts
> > > at
> > > > > > > >> beginning
> > > > > > > >> > of step 2 and ends at beginning of step 3. Maybe we can
> rename
> > > > > > > >> > `AwaitSync`
> > > > > > > >> > itself to `CompletingRebalance`.
> > > > > > > >> >
> > > > > > > >> > Guozhang
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <
> > > ism...@juma.me.uk>
> > > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi Guozhang,
> > > > > > > >> > >
> > > > > > > >> > > Thanks for the clarification. The naming does seem a bit
> > > > > unclear.
> > > > > > > >> Maybe
> > > > > > > >> > > `PreparingRebalance` could be `StartingRebalance` or
> > > something
> > > > > > that
> > > > > > > >> makes
> > > > > > > >> > > it clear that it is part of the rebalance instead of a
> step
> > > > > before
> > > > > > > the
> > > > > > > >> > > actual rebalance. `AwaitingSync` could also be
> > > > > > > `CompletingRebalance`,
> > > > > > > >> but
> > > > > > > >> > > not sure if that's better.
> > > > > > > >> > >
> > > > > > > >> > > Ismael
> > > > > > > >> > >
> > > > > > > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <
> > > > > > wangg...@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Actually Rebalancing includes two steps, and we name
> them
> > > > > > > >> > > PrepareRebalance
> > > > > > > >> > > > and WaitSync (arguably they may not be the best
> names).
> > > But
> > > > > > these
> > > > > > > >> two
> > > > > > > >> > > steps
> > > > > > > >> > > > together should be treated as the complete rebalance
> > > cycle.
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > Guozhang
> > > > > > > >> > > >
> > > > > > > >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <
> > > > > > > cmcc...@apache.org>
> > > > > > > >> > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi all,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I think maybe it makes sense to rename the
> > > > > > "PreparingRebalance"
> > > > > > > >> > > consumer
> > > > > > > >> > > > > group state to "Rebalancing."  To me, "Preparing"
> > > implies
> > > > > that
> > > > > > > >> there
> > > > > > > >> > > > > will be a later "rebalance" state that follows-- but
> > > there
> > > > > is
> > > > > > > not.
> > > > > > > >> > > > > Since we're now exposing this state name publicly in
> > > these
> > > > > > > >> metrics,
> > > > > > > >> > > > > perhaps it makes sense to do this rename now.
> Thoughts?
> > > > > > > >> > > > >
> > > > > > > >> > > > > best,
> > > > > > > >> > > > > Colin
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > > > > > >> > > > > > That's a good point.  I revised the KIP to add
> > > metrics for
> > > > > > all
> > > > > > > >> the
> > > > > > > >> > > > group
> > > > > > > >> > > > > > states.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > best,
> > > > > > > >> > > > > > Colin
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang
> wrote:
> > > > > > > >> > > > > > > Ah, that's right Jason.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > With that I can be convinced to add one metric
> per
> > > each
> > > > > > > state.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Guozhang
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason
> Gustafson <
> > > > > > > >> > > > ja...@confluent.io>
> > > > > > > >> > > > > > > wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > "Dead" and "Empty" states are transient:
> groups
> > > > > > usually
> > > > > > > >> only
> > > > > > > >> > > > > leaves in
> > > > > > > >> > > > > > > > this
> > > > > > > >> > > > > > > > > state for a short while and then being
> deleted
> > > or
> > > > > > > >> transited to
> > > > > > > >> > > > > other
> > > > > > > >> > > > > > > > > states.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > This is not strictly true for the "Empty"
> state
> > > which
> > > > > we
> > > > > > > >> also use
> > > > > > > >> > > > to
> > > > > > > >> > > > > > > > represent simple groups which only use the
> > > coordinator
> > > > > > to
> > > > > > > >> store
> > > > > > > >> > > > > offsets. I
> > > > > > > >> > > > > > > > think we may as well cover all the states if
> we're
> > > > > going
> > > > > > > to
> > > > > > > >> cover
> > > > > > > >> > > > > any of
> > > > > > > >> > > > > > > > them specifically.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > -Jason
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang
> Wang <
> > > > > > > >> > > wangg...@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > My two cents:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > "Dead" and "Empty" states are transient:
> groups
> > > > > > usually
> > > > > > > >> only
> > > > > > > >> > > > > leaves in
> > > > > > > >> > > > > > > > this
> > > > > > > >> > > > > > > > > state for a short while and then being
> deleted
> > > or
> > > > > > > >> transited to
> > > > > > > >> > > > > other
> > > > > > > >> > > > > > > > > states.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Since we have the existing "*NumGroups*"
> metric,
> > > > > > > >> `*NumGroups -
> > > > > > > >> > > > > > > > > **NumGroupsRebalancing
> > > > > > > >> > > > > > > > > - **NumGroupsAwaitingSync`* should cover the
> > > above
> > > > > > > three,
> > > > > > > >> where
> > > > > > > >> > > > > "Stable"
> > > > > > > >> > > > > > > > > should be contributing most of the counts:
> If we
> > > > > have
> > > > > > a
> > > > > > > >> bug
> > > > > > > >> > > that
> > > > > > > >> > > > > causes
> > > > > > > >> > > > > > > > the
> > > > > > > >> > > > > > > > > num.Dead / Empty to keep increasing, then we
> > > would
> > > > > > > observe
> > > > > > > >> > > > > `NumGroups`
> > > > > > > >> > > > > > > > keep
> > > > > > > >> > > > > > > > > increasing which should be sufficient for
> > > alerting.
> > > > > > And
> > > > > > > >> trouble
> > > > > > > >> > > > > shooting
> > > > > > > >> > > > > > > > of
> > > > > > > >> > > > > > > > > the issue could be relying on the log4j.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > *Guozhang*
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael
> Juma <
> > > > > > > >> > > ism...@juma.me.uk>
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > Thanks for the KIP, Colin. This will
> > > definitely be
> > > > > > > >> useful.
> > > > > > > >> > > One
> > > > > > > >> > > > > > > > question:
> > > > > > > >> > > > > > > > > > would it be useful to have a metric for
> for
> > > the
> > > > > > number
> > > > > > > >> of
> > > > > > > >> > > > groups
> > > > > > > >> > > > > in
> > > > > > > >> > > > > > > > each
> > > > > > > >> > > > > > > > > > possible state? The KIP suggests
> > > > > > "PreparingRebalance"
> > > > > > > >> and
> > > > > > > >> > > > > > > > "AwaitingSync".
> > > > > > > >> > > > > > > > > > That leaves "Stable", "Dead" and "Empty".
> Are
> > > > > those
> > > > > > > not
> > > > > > > >> > > useful?
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > Ismael
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin
> McCabe
> > > <
> > > > > > > >> > > > > cmcc...@apache.org>
> > > > > > > >> > > > > > > > > wrote:
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > > Hi all,
> > > > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > > > > > > > I posted "KIP-180: Add a broker metric
> > > > > specifying
> > > > > > > the
> > > > > > > >> > > number
> > > > > > > >> > > > of
> > > > > > > >> > > > > > > > > consumer
> > > > > > > >> > > > > > > > > > > group rebalances in progress" for
> > > discussion:
> > > > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > > > > > > > https://cwiki.apache.org/confl
> > > > > > > >> uence/display/KAFKA/KIP-
> > > > > > > >> > > > > > > > > > > 180%3A+Add+a+broker+metric+
> > > > > > > specifying+the+number+of+
> > > > > > > >> > > > > > > > > > > consumer+group+rebalances+in+progress
> > > > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > > > > > > > Check it out.
> > > > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > > > > > > > cheers,
> > > > > > > >> > > > > > > > > > > Colin
> > > > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > --
> > > > > > > >> > > > > > > > > -- Guozhang
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > --
> > > > > > > >> > > > > > > -- Guozhang
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > --
> > > > > > > >> > > > -- Guozhang
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> > -- Guozhang
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > >
>

Reply via email to