Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

Rajini Sivaram Thu, 17 Aug 2017 06:54:46 -0700

Hi Ismael,

Thank you for the review.


1. Agree on keeping it simple with dynamic registration and no expiry. Will
wait for Jun's feedback before updating KIP.
2. I have switched to two metrics for commit-id and version (not sure if it
matches what you meant). I also added the client-id tag which is used in
all metrics from clients.

Regards,

Rajini


On Thu, Aug 17, 2017 at 10:55 AM, Ismael Juma <[email protected]> wrote:

> Thanks for the KIP, Rajini. I think this is helpful too. A few minor
> comments.
>
> 1. About the number of metrics and expiration, if we dynamically register
> metrics for the error codes, the number is likely to be much lower than
> 30*30, probably less than 100. If we were using Kafka Metrics for this, we
> could easily add a long expiration period to be conservative, but I am not
> sure this is supported by Yammer Metrics. If it is not, there's an argument
> for keeping it simple.
>
> 2. Would it make sense to use 2 gauges for the version and commit id? It
> seems more intuitive than having those values as tags.
>
> Ismael
>
> On Thu, Aug 17, 2017 at 10:19 AM, Rajini Sivaram <[email protected]>
> wrote:
>
> > Hi Jun,
> >
> > Thank you for the review.
> >
> > 1. Makes sense. I have updated the KIP.
> > 2. Moved to a new group ZooKeeperClient
> > 3. It is a gauge, so it will have a single attribute called Value with a
> > constant value of 1.
> >
> > Regards,
> >
> > Rajini
> >
> > On Thu, Aug 17, 2017 at 3:16 AM, Jun Rao <[email protected]> wrote:
> >
> > > Hi, Rajini,
> > >
> > > Thanks for the KIP. A few comments.
> > >
> > > 1. We have 30+ requests and 30+ error code and growing. So, the
> > combination
> > > can be large. Perhaps it's useful to expire an error metric if it's no
> > > longer updated after some time? We did something similar for the quota
> > > metric.
> > >
> > > 2. It's a bit weird to put the ZK latency metric under
> > > type=SessionExpireListener.
> > > Perhaps it's more intuitive to put it in a separate type.
> > >
> > > 3. For the client version metric, since we representing commit_id and
> > > version as tags in the metric name. So the mbean will have no
> attributes?
> > >
> > > Jun
> > >
> > >
> > >
> > > On Wed, Aug 16, 2017 at 4:05 PM, Roger Hoover <[email protected]>
> > > wrote:
> > >
> > > > I think it would useful to make clear somewhere for each metric, the
> > > level
> > > > at which it's counted.  I don't know all the details of the Kafka
> > > protocol
> > > > but it might be something like
> > > >
> > > > ProduceRequest, Fetch Request - counted at per-partition level
> > > > All other requests are 1:1 with client requests?
> > > >
> > > > Cheers,
> > > >
> > > > Roger
> > > >
> > > > On Wed, Aug 16, 2017 at 4:02 PM, Roger Hoover <
> [email protected]>
> > > > wrote:
> > > >
> > > > > Rajini,
> > > > >
> > > > > Thank you for the KIP.  These are very helpful additions.  One
> > question
> > > > on
> > > > > the error code metrics:
> > > > >
> > > > > Will the total error counting happen at the the level of topic
> > > partition?
> > > > > For example, if a single ProduceRequest contains messages to append
> > to
> > > 3
> > > > > partitions and say all 3 appends are successful, the counter
> > > > > for kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=
> > > > ProduceRequest,error=0
> > > > > will be incremented by 3?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Roger
> > > > >
> > > > > On Wed, Aug 16, 2017 at 12:10 PM, Rajini Sivaram <
> > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > >> I have created a KIP to add some additional metrics to support
> > health
> > > > >> checks:
> > > > >>
> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+
> > > > >> Add+new+metrics+to+support+health+checks
> > > > >>
> > > > >> Feedback and suggestions are welcome.
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Rajini
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

Reply via email to