Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-31 Thread Rajini Sivaram
Hi Jun, Thank you, that makes sense. Updated the KIP. Regards, Rajini On Thu, Aug 31, 2017 at 12:35 PM, Jun Rao wrote: > Hi, Rajini, > > Thanks for the updated KIP. Should be also track the conversion time for > the producer? If so, the name can just be MessageConversionTimeMs and only > the

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-31 Thread Jun Rao
Hi, Rajini, Thanks for the updated KIP. Should be also track the conversion time for the producer? If so, the name can just be MessageConversionTimeMs and only the produce and the fetch request may have such a component. Jun On Wed, Aug 30, 2017 at 1:37 PM, Rajini Sivaram wrote: > Jun, > > Tha

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-31 Thread Manikumar
Hi Rajini, Yes, It provides the failure rate for all errors. I got confused with old metric names. Thank you for the clarification. Thanks On Thu, Aug 31, 2017 at 6:34 PM, Rajini Sivaram wrote: > Hi Manikumar, > > We currently have topic-level metrics for FailedProduceRequestsPerSec, > Failed

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-31 Thread Rajini Sivaram
Hi Roger, Yes, the rate metrics `successful-authentication-rate` and ` failed-authentication-rate` will also have corresponding failed-authentication-count and successful-authentication-count to match KIP 187. Thank you, Rajini On Tue, Aug 29, 2017 at 1:07 PM, Roger Hoover wrote: > Great sugg

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-31 Thread Rajini Sivaram
Hi Manikumar, We currently have topic-level metrics for FailedProduceRequestsPerSec, FailedFetchRequestsPerSec. For all requests including produce/fetch, this KIP adds a new error rate metric: MBean: kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=api_key_name,error=error_code_name F

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-31 Thread Manikumar
Hi, Currently, we have FailedProduceRequestsPerSec, FailedFetchRequestsPerSec metrics to indicate un-expected failures on a broker while handling producer/fetch requests. Will it be useful to add these metrics for other requests also? I don't want to include these metrics to this KIP. Just want to

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-30 Thread Rajini Sivaram
Jun, Thank you, your suggestions sound good. I have updated the KIP. Regards, Rajini On Tue, Aug 29, 2017 at 9:12 PM, Jun Rao wrote: > Hi, Rajini, > > Thanks for the updated KIP. I agree that those additional metrics can be > useful. I was thinking what would an admin do if the value of one o

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Jun Rao
Hi, Rajini, Thanks for the updated KIP. I agree that those additional metrics can be useful. I was thinking what would an admin do if the value of one of those metrics is abnormal. An admin probably want to determine which client causes the abnormaly. So, a couple of more comments below. 10. Abo

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Roger Hoover
Great suggestions, Ismael and thanks for incorporating them, Rajini. Tracking authentication success and failures (#3) across listeners seems very useful for cluster administrators to identify misconfigured client or bad actors, especially until all clients implement KIP-152 which will add an expl

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Rajini Sivaram
Hi Ismael, Thank you for the suggestions. The additional metrics sound very useful and I have added them to the KIP. Regards, Rajini On Tue, Aug 29, 2017 at 5:34 AM, Ismael Juma wrote: > Hi Rajini, > > There are a few other metrics that could potentially be useful. I'd be > interested in what

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Ismael Juma
Hi Rajini, There are a few other metrics that could potentially be useful. I'd be interested in what you and the community thinks: 1. The KIP currently includes `FetchDownConversionsPerSec`, which is useful. In the common case, one would want to avoid down conversion by using the lower message fo

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-22 Thread Jun Rao
Hi, Rajini, Yes, if those error metrics are registered dynamically, we could worry about expiration later. Thanks, Jun On Fri, Aug 18, 2017 at 1:55 AM, Rajini Sivaram wrote: > Perhaps we could register dynamically for now. If we find that the cost of > retaining these is high, we can add the

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-22 Thread Rajini Sivaram
Hi all, We can discuss the expiry implementation as part of the PR review. If there are no further comments, I will start vote on this KIP tomorrow. Please let me know if there are any other concerns. On Fri, Aug 18, 2017 at 4:55 AM, Rajini Sivaram wrote: > Perhaps we could register dynamically

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-18 Thread Rajini Sivaram
Perhaps we could register dynamically for now. If we find that the cost of retaining these is high, we can add the code to expire them later. Is that ok? Regards, Rajini On Fri, Aug 18, 2017 at 9:41 AM, Ismael Juma wrote: > Can we quantify the cost of having these metrics around if they are

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-18 Thread Ismael Juma
Can we quantify the cost of having these metrics around if they are dynamically registered? Given that the maximum is bounded at development time, is it really worth all the extra code? Ismael On Fri, Aug 18, 2017 at 9:34 AM, Rajini Sivaram wrote: > Jun, > > It feels more consistent to add erro

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-18 Thread Rajini Sivaram
Jun, It feels more consistent to add errors as yammer metrics similar to other request metrics. Perhaps we could add some code to track and remove these if unused? It is a bit more work, but it would keep the externals consistent. Ismael/Manikumar, Agree that version as a String attribute makes

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Manikumar
I agree it will be good if we can add "commit id/version" as an attribute value. It will be easy to parse. But as of now, KafkaMetric supports only numerical values. On Fri, Aug 18, 2017 at 5:49 AM, Ismael Juma wrote: > Hi Rajini, > > About the gauges, I was thinking that the attribute would be

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Ismael Juma
Hi Rajini, About the gauges, I was thinking that the attribute would be the value (i.e. commit id or version). I understand that Kafka Metrics doesn't support this (unlike Yammer Metrics), but would it make sense to add? Ismael On Thu, Aug 17, 2017 at 2:54 PM, Rajini Sivaram wrote: > Hi Ismael

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Jun Rao
Hi, Rajini, 1. Yammer doesn't seem to support expiration. I guess we can use the Kafka metric in this case. Thanks, Jun On Thu, Aug 17, 2017 at 4:53 PM, Roger Hoover wrote: > Rajini, > > The table is super helpful. Thank you. > > On Thu, Aug 17, 2017 at 2:16 AM, Rajini Sivaram > wrote: > >

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Roger Hoover
Rajini, The table is super helpful. Thank you. On Thu, Aug 17, 2017 at 2:16 AM, Rajini Sivaram wrote: > Hi Roger, > > Thank you for the review. I have added a table with the scope of errors > counted for each request. > > Regards, > > Rajini > > On Thu, Aug 17, 2017 at 12:05 AM, Roger Hoover

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Rajini Sivaram
Hi Ismael, Thank you for the review. 1. Agree on keeping it simple with dynamic registration and no expiry. Will wait for Jun's feedback before updating KIP. 2. I have switched to two metrics for commit-id and version (not sure if it matches what you meant). I also added the client-id tag which i

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Ismael Juma
Thanks for the KIP, Rajini. I think this is helpful too. A few minor comments. 1. About the number of metrics and expiration, if we dynamically register metrics for the error codes, the number is likely to be much lower than 30*30, probably less than 100. If we were using Kafka Metrics for this, w

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Rajini Sivaram
Hi Jun, Thank you for the review. 1. Makes sense. I have updated the KIP. 2. Moved to a new group ZooKeeperClient 3. It is a gauge, so it will have a single attribute called Value with a constant value of 1. Regards, Rajini On Thu, Aug 17, 2017 at 3:16 AM, Jun Rao wrote: > Hi, Rajini, > > Th

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Rajini Sivaram
Hi Roger, Thank you for the review. I have added a table with the scope of errors counted for each request. Regards, Rajini On Thu, Aug 17, 2017 at 12:05 AM, Roger Hoover wrote: > I think it would useful to make clear somewhere for each metric, the level > at which it's counted. I don't know

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-16 Thread Jun Rao
Hi, Rajini, Thanks for the KIP. A few comments. 1. We have 30+ requests and 30+ error code and growing. So, the combination can be large. Perhaps it's useful to expire an error metric if it's no longer updated after some time? We did something similar for the quota metric. 2. It's a bit weird to

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-16 Thread Roger Hoover
I think it would useful to make clear somewhere for each metric, the level at which it's counted. I don't know all the details of the Kafka protocol but it might be something like ProduceRequest, Fetch Request - counted at per-partition level All other requests are 1:1 with client requests? Chee

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-16 Thread Roger Hoover
Rajini, Thank you for the KIP. These are very helpful additions. One question on the error code metrics: Will the total error counting happen at the the level of topic partition? For example, if a single ProduceRequest contains messages to append to 3 partitions and say all 3 appends are succes

[DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-16 Thread Rajini Sivaram
I have created a KIP to add some additional metrics to support health checks: https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+Add+new+metrics+to+support+health+checks Feedback and suggestions are welcome. Regards, Rajini