Thanks for the KIP, Rajini. I think this is helpful too. A few minor comments.
1. About the number of metrics and expiration, if we dynamically register metrics for the error codes, the number is likely to be much lower than 30*30, probably less than 100. If we were using Kafka Metrics for this, we could easily add a long expiration period to be conservative, but I am not sure this is supported by Yammer Metrics. If it is not, there's an argument for keeping it simple. 2. Would it make sense to use 2 gauges for the version and commit id? It seems more intuitive than having those values as tags. Ismael On Thu, Aug 17, 2017 at 10:19 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote: > Hi Jun, > > Thank you for the review. > > 1. Makes sense. I have updated the KIP. > 2. Moved to a new group ZooKeeperClient > 3. It is a gauge, so it will have a single attribute called Value with a > constant value of 1. > > Regards, > > Rajini > > On Thu, Aug 17, 2017 at 3:16 AM, Jun Rao <j...@confluent.io> wrote: > > > Hi, Rajini, > > > > Thanks for the KIP. A few comments. > > > > 1. We have 30+ requests and 30+ error code and growing. So, the > combination > > can be large. Perhaps it's useful to expire an error metric if it's no > > longer updated after some time? We did something similar for the quota > > metric. > > > > 2. It's a bit weird to put the ZK latency metric under > > type=SessionExpireListener. > > Perhaps it's more intuitive to put it in a separate type. > > > > 3. For the client version metric, since we representing commit_id and > > version as tags in the metric name. So the mbean will have no attributes? > > > > Jun > > > > > > > > On Wed, Aug 16, 2017 at 4:05 PM, Roger Hoover <roger.hoo...@gmail.com> > > wrote: > > > > > I think it would useful to make clear somewhere for each metric, the > > level > > > at which it's counted. I don't know all the details of the Kafka > > protocol > > > but it might be something like > > > > > > ProduceRequest, Fetch Request - counted at per-partition level > > > All other requests are 1:1 with client requests? > > > > > > Cheers, > > > > > > Roger > > > > > > On Wed, Aug 16, 2017 at 4:02 PM, Roger Hoover <roger.hoo...@gmail.com> > > > wrote: > > > > > > > Rajini, > > > > > > > > Thank you for the KIP. These are very helpful additions. One > question > > > on > > > > the error code metrics: > > > > > > > > Will the total error counting happen at the the level of topic > > partition? > > > > For example, if a single ProduceRequest contains messages to append > to > > 3 > > > > partitions and say all 3 appends are successful, the counter > > > > for kafka.network:type=RequestMetrics,name=ErrorsPerSec,request= > > > ProduceRequest,error=0 > > > > will be incremented by 3? > > > > > > > > Thanks, > > > > > > > > Roger > > > > > > > > On Wed, Aug 16, 2017 at 12:10 PM, Rajini Sivaram < > > > rajinisiva...@gmail.com> > > > > wrote: > > > > > > > >> I have created a KIP to add some additional metrics to support > health > > > >> checks: > > > >> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+ > > > >> Add+new+metrics+to+support+health+checks > > > >> > > > >> Feedback and suggestions are welcome. > > > >> > > > >> Regards, > > > >> > > > >> Rajini > > > >> > > > > > > > > > > > > > >