My two cents: "Dead" and "Empty" are transient states: a group usually stays in one of them only for a short while before it is deleted or transitions to another state.
Since we have the existing "NumGroups" metric, `NumGroups - NumGroupsRebalancing - NumGroupsAwaitingSync` should cover the remaining three states ("Stable", "Dead" and "Empty"), with "Stable" contributing most of the count. If we have a bug that causes the number of Dead / Empty groups to keep increasing, then we would observe `NumGroups` keep increasing, which should be sufficient for alerting. Troubleshooting the issue could then rely on the log4j logs.

Guozhang

On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks for the KIP, Colin. This will definitely be useful. One question:
> would it be useful to have a metric for the number of groups in each
> possible state? The KIP suggests "PreparingRebalance" and "AwaitingSync".
> That leaves "Stable", "Dead" and "Empty". Are those not useful?
>
> Ismael
>
> On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi all,
> >
> > I posted "KIP-180: Add a broker metric specifying the number of consumer
> > group rebalances in progress" for discussion:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-180%3A+Add+a+broker+metric+specifying+the+number+of+consumer+group+rebalances+in+progress
> >
> > Check it out.
> >
> > cheers,
> > Colin
> >

--
-- Guozhang
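
P.S. A minimal sketch of the arithmetic suggested in this reply, assuming the three gauge values have already been read (for example over JMX) into plain integers. The function names are made up for illustration and are not part of Kafka or the KIP; the metric names are the ones used in this thread.

```python
def stable_dead_empty_count(num_groups, num_rebalancing, num_awaiting_sync):
    """Groups in "Stable", "Dead" or "Empty", derived from the three gauges.

    Computes NumGroups - NumGroupsRebalancing - NumGroupsAwaitingSync.
    """
    # Assumption: the gauges are sampled separately, so they can be briefly
    # inconsistent with each other; clamp at zero instead of going negative.
    return max(0, num_groups - num_rebalancing - num_awaiting_sync)


def keeps_increasing(samples):
    """True when every NumGroups sample is strictly larger than the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Hypothetical readings, not real measurements.
    print(stable_dead_empty_count(100, 7, 3))
    print(keeps_increasing([100, 140, 210]))
```

A real alert would probably compare growth against a threshold over a time window rather than require strictly increasing samples; the check above is only the simplest form of "NumGroups keeps increasing".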