Joel, thanks for the clarifications.

Maybe I misunderstood the intent of that metric. Yes, we are looking to
alert if some partitions aren't owned by any consumer in the group
(just in case this ever happens).
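
To make the idea concrete, here is a rough sketch of the check I have in
mind. It is only an illustration: the function names and the sample data are
made up, and it assumes the full partition list and the currently owned
partitions have already been fetched from somewhere (for the high-level
consumer, the ownership registry in ZooKeeper would be the likely source).

```python
from typing import Iterable, Optional, Set


def no_owner_partitions(all_partitions: Iterable[int],
                        owned_partitions: Iterable[int]) -> Set[int]:
    """Return the partitions of a topic that no consumer in the group owns."""
    return set(all_partitions) - set(owned_partitions)


def should_alert(no_owner_count: Optional[int]) -> bool:
    """Alert only on an explicit non-zero count.

    A missing sample (None) or a zero is treated as "no alert", so a gap in
    the metrics pipeline cannot raise an alarm by itself.
    """
    return bool(no_owner_count)


# Hypothetical snapshot: a 9-partition topic where partitions 3 and 7
# currently have no owner registered.
unowned = no_owner_partitions(range(9), [0, 1, 2, 4, 5, 6, 8])
print(sorted(unowned), should_alert(len(unowned)))
```

The alert rule here never mentions the partition count, so it would not need
to change if the topic grows from 9 to 18 partitions.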

Yes, the MaxLag mbeans only apply to partitions owned by a consumer.

Looking forward to the new consumer :)

On Tue, Jan 27, 2015 at 8:55 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> I'm not sure if I'm misunderstanding the suggestion, but this metric
> was never intended for alerts. Some metrics are more for informational
> purposes than for setting up alerts. In fact it is possible for some
> consumers to have zero owned partitions if there are fewer partitions
> than consumers in the group.
>
> I think you are looking for some mechanism to determine if a
> particular partition has not been owned by an instance in the group.
> I think it is a bit difficult to do that directly in the current high
> level consumer. Instead, you can monitor the consumer lag using the
> consumer offset checker - which is not ideal since it is not
> integrated in the consumer. The consumer does have lag mbeans but
> those apply only for partitions that are owned. This concern can be
> addressed with the new consumer.
>
> On Tue, Jan 27, 2015 at 03:20:55PM -0800, Steven Wu wrote:
> > To illustrate my point, I will use the "allTopicsOwnedPartitionsCount" gauge
> > from ZookeeperConsumerConnector as an example. It captures the number of
> > partitions of a topic that have been assigned an owner in the consumer
> > group. Let's say that I have a topic with 9 partitions. This metric should
> > normally report the value 9. I can set up an alert
> > if allTopicsOwnedPartitionsCount <9.
> >
> > Here are the drawbacks of this kind of metric.
> > 1) If our metrics reporting/aggregation system has data loss that causes the
> > value to be reported as zero, we can't really distinguish whether it's a real
> > error or data loss. So we can get false positives/alarms from data loss.
> > 2) If we change the number of partitions (e.g. from 9 to 18), we need to
> > remember to change the alert rule to "allTopicsOwnedPartitionsCount <18".
> > This kind of coupling is a maintenance nightmare.
> >
> > A more explicit metric is "NoOwnerPartitionsCount". It should normally be
> > zero. If it is not zero, we should be alerted. This way, we won't get
> > false alarms from data loss.
> >
> > We don't have to change/fix this particular example since a new consumer is
> > being worked on. But in the new consumer, please consider more explicit error
> > signals.
> >
> > Thanks,
> > Steven
>
>
